Jashwanth Pedapudi
LinkedIn

Software engineer interested in systems, infrastructure, and ML inference. Currently at Meta working on Core Ads Delivery, previously at Uber building payments infrastructure and real-time data pipelines. Co-founder of Monosemantic, an inference-first open-source speech serving platform. IIT Madras, Electrical Engineering.

Apr 2026

Building a Voice AI Agent From Scratch

What it takes to go from an open-source speech model to a production voice agent. Deployed Nemotron on Baseten, wired it into LiveKit, then built 7,000 lines of streaming runtime to make it actually work.

Read more →

Apr 2026

Profiling a Production Streaming ASR Pipeline

We were optimizing our streaming RNN-T server for latency. Profiling revealed where the time actually went, where custom Triton kernels helped, and where they didn't.

Read more →

Mar 2026

Open Source Speech Models Need a Specialist

On the gap between open-source ASR model releases and production-ready inference. Why a dedicated optimization and serving layer is needed, and what we learned deploying Nemotron on GPU infrastructure.

Read more →

Feb 2026

Learnings from a 2M QPS Ads Endpoint

Optimized the ads retrieval phase for a ~2M QPS endpoint on Audience Network. Built a better throttling model based on expected serving cost and moved ad filtering before ranking, saving 7% CPU.

Read more →

Dec 2025

Horizon Events in Meta Quest

Built the Events Arena from scratch, a 0-to-1 immersive VR venue launched by Zuckerberg at Connect 2025. First-of-its-kind live 3D event experience for up to 1,000 people on Quest, with stereo video, spatial audio, and stadium-scale presence.

Read more →

Aug 2023

Real-Time Streaming Pipelines — Uber

Built real-time data pipelines on Apache Flink, processing ~300K messages/sec across 15 Kafka pipelines. Migrated message encoding from JSON to Protobuf and merged redundant streams, saving ~$900K/year.

Read more →

Feb 2023

Building a Payment Platform — Uber

Designed and built a retain-and-remit payment system handling tax withholding and remittance across multiple countries. Generic, extensible architecture that reduced operations workload from 3 days to hours.

Read more →