Software engineer interested in systems, infrastructure, and ML inference. Currently at Meta working on Core Ads Delivery, previously at Uber building payments infrastructure and real-time data pipelines. Co-founder of Monosemantic, an inference-first open-source speech serving platform. IIT Madras, Electrical Engineering.
Profiling a Production Streaming ASR Pipeline
We were optimizing our streaming RNN-T server for latency. Profiling revealed where the time actually went, where custom Triton kernels helped, and where they didn't.
Open Source Speech Models Need a Specialist
On the gap between open-source ASR model releases and production-ready inference. Why a dedicated optimization and serving layer is needed — and what we learned deploying Nemotron on GPU infrastructure.
Learnings from a 2M QPS Ads Endpoint
Optimized the ads retrieval phase for a ~2M QPS endpoint on Audience Network. Built a better throttling model based on expected serving cost and moved ad filtering before ranking, saving 7% CPU.
Horizon Events — VR Experiences at Meta
Developed core features for the Horizon VR game engine powering immersive concert experiences for up to 1,000 concurrent users. Worked on the C++ rendering engine and TypeScript orchestration layer.
Real-Time Streaming Pipelines — Uber
Built real-time data pipelines using Apache Flink processing ~300K messages/sec across 15 Kafka pipelines. Migrated JSON to Protobuf and merged redundant streams, saving ~$900K/year.
Building a Payment Platform — Uber
Designed and built a retain-and-remit payment system handling tax withholding and remittance across multiple countries. Generic, extensible architecture that reduced operations workload from 3 days to hours.