About Me

2009–2013 · Gujarat Technological University · B.S. Computer Science
2014–2016 · Illinois Institute of Technology · M.S. Computer Science
2016 · eContext.ai · Python Engineer · Internship
2017–2018 · eContext.ai · Machine Learning Engineer
2019–2023 · eContext.ai · Senior Machine Learning Engineer
2023–2024 · 84.51° · Senior Data Scientist
2024–present · 84.51° · Lead Research Scientist

ML engineer and technical lead with 10+ years building production GenAI and ML systems at scale. Owns the full GenAI stack, from fine-tuning and alignment through embedding infrastructure and production serving. Shipped systems that handle 50M+ monthly queries, drive $108M+ in attributable revenue, and operate inside a $150B+ retail ecosystem. Takes projects from research artifact to production system end-to-end.


Core Expertise

Fine-tuning & Alignment: PEFT/LoRA fine-tuning on domain-specific corpora, quantization (GPTQ, AWQ, GGUF), safety evaluation, red-teaming, and multi-intent SLM development. Hands-on from dataset curation through deployment.
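The core idea behind LoRA fits in a few lines: freeze the pretrained weight and learn a scaled low-rank update alongside it. A minimal numpy sketch with illustrative dimensions (not tied to any particular framework or production system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: hidden width 64, adapter rank 8, scaling alpha 16.
d_out, d_in, r, alpha = 64, 64, 8, 16
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init
                                       # so training starts from the base model

def lora_forward(x):
    # Base path plus scaled low-rank update: W x + (alpha / r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B still zero, the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (r·(d_in + d_out) parameters) are trained, which is what makes domain-specific fine-tuning cheap relative to full-parameter updates.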

Semantic Search & Embeddings: Bi-encoder and cross-encoder architectures, ANN indexing, hybrid sparse-dense retrieval pipelines, and embedding evaluation frameworks. Experience taking dense retrieval from prototype to 50M+ query/month production traffic.
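One common way to fuse sparse and dense result lists in a hybrid pipeline is Reciprocal Rank Fusion; the sketch below is a generic illustration with hypothetical document ids, not a description of any specific production system:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from several retrievers
    (e.g. BM25 for sparse, a bi-encoder ANN index for dense) into one
    ranking without needing comparable scores across systems."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d7"]  # hypothetical BM25 top results
dense  = ["d1", "d9", "d3"]  # hypothetical dense-retrieval top results
fused = rrf_fuse([sparse, dense])
# Documents ranked well by both retrievers rise to the top: d1, then d3.
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.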

RAG Systems: Chunking strategies, retrieval reranking, structured output generation, and hallucination evaluation. Built RAG pipelines backed by Azure OpenAI with custom retrieval layers and output validation.
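The simplest chunking strategy in that family is a sliding window with overlap; a minimal sketch (character-based for brevity, where a production variant would typically split on tokens or sentence boundaries):

```python
def chunk_text(text, size=400, overlap=80):
    """Sliding-window chunking: fixed-size chunks that overlap so content
    cut at one boundary still appears intact in a neighboring chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final window already covers the end of the text
    return chunks

doc = "x" * 1000
chunks = chunk_text(doc, size=400, overlap=80)
# 400-char windows advancing 320 chars: starts at 0, 320, 640 -> 3 chunks.
```

The overlap trades index size for recall: each boundary region is retrievable from two chunks instead of one.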

Production Serving: vLLM, Triton Inference Server, OpenAI-compatible API deployment, latency optimization, model compression, and controlled A/B rollouts. Optimized GPU inference to hit P99 <200ms at scale.
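A basic building block of model compression is symmetric per-tensor int8 quantization; the numpy sketch below is a generic illustration of the round-to-scale idea, not GPTQ or AWQ specifically:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: choose a scale so the
    largest magnitude maps to 127, then round. Returns (q, scale)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding bounds the per-weight error by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

Storing int8 weights cuts memory 4x versus float32, which is a large part of why quantized models fit more traffic per GPU.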

Foundation Models: Pre-training transformer-based models from scratch on large-scale corpora: custom tokenization, training objective design, distributed training with DeepSpeed/PyTorch FSDP, and data curriculum scheduling. Applied to behavioral sequence modeling and representation learning, with fine-tuning for downstream personalization, ranking, and prediction tasks.
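The training objective at the heart of such pre-training is next-token cross-entropy; a numpy sketch with toy dimensions (illustrative only, assuming targets are already shifted one position ahead of the logits):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Causal LM objective: mean cross-entropy over a sequence.
    logits: (T, V) scores per position; targets: (T,) next-token ids."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Pick out the log-probability assigned to each true next token.
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
T, V = 8, 50  # toy sequence length and vocabulary size
logits = rng.normal(size=(T, V))
targets = rng.integers(0, V, size=T)
loss = next_token_loss(logits, targets)  # average negative log-likelihood, in nats
```

Everything else in a pre-training stack (tokenization, data curriculum, DeepSpeed/FSDP sharding) exists to minimize this quantity at scale.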


Production Systems

  • Large-scale semantic search: End-to-end dense retrieval system on a top-10 US e-commerce platform. Bi-encoder + cross-encoder pipeline with ANN indexing and GPU inference. 50M+ queries/month at P99 <200ms; +5% conversion lift over keyword baselines; $108M+ attributable revenue.

  • Domain-adapted SLM: PEFT/LoRA fine-tuning and quantization for multi-intent query understanding. Includes safety eval and red-teaming. Deployed via Triton + vLLM as an OpenAI-compatible API.

  • RAG pipeline: LLM-driven system using Azure OpenAI for natural language → structured output generation. Custom retrieval layer with output validation for structured task domains.

  • Hierarchical taxonomy classifier: Hybrid LSTM/CNN predicting across 500K+ categories, deployed via TensorFlow Serving for high-throughput production traffic.


Technical Skills

LLMs & GenAI: Fine-tuning, PEFT/LoRA, quantization (GPTQ/AWQ/GGUF), RAG systems, semantic search, embedding models, vLLM, Triton Inference Server, vector databases, Transformers

ML Systems: PyTorch, TensorFlow, DeepSpeed, distributed training, model serving at scale, A/B testing, pipeline automation

Infrastructure: Python, SQL, Docker, AWS, Azure, REST/gRPC, Elasticsearch, Redis, Git, CI/CD


Certifications

AWS Certified Machine Learning Specialty · TensorFlow Developer Certificate

