ML engineer and technical lead with 10+ years building production GenAI and ML systems at scale. Own the full GenAI stack, from fine-tuning and alignment to embedding infrastructure and production serving. Shipped systems handling 50M+ monthly queries, driving $108M+ in attributable revenue, and operating inside a $150B+ retail ecosystem. Take work from research artifact to production system end-to-end.
Core Expertise
Fine-tuning & Alignment: PEFT/LoRA fine-tuning on domain-specific corpora, quantization (GPTQ, AWQ, GGUF), safety evaluation, red-teaming, and multi-intent SLM development. Hands-on from dataset curation through deployment.
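The core LoRA idea behind this kind of fine-tuning can be sketched in a few lines: the base weight matrix W stays frozen while two small low-rank factors A and B are trained, and the effective weight is W + (alpha/r)·B·A. A minimal numpy sketch, with all names and shapes illustrative rather than any specific library's API:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=8):
    """Forward pass with a LoRA adapter: y = x @ (W + (alpha/r) * B @ A).T
    W: frozen base weight (d_out, d_in); A: (r, d_in); B: (d_out, r) are the
    only trainable parameters."""
    delta = (alpha / r) * (B @ A)  # rank-r update to the frozen weight
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 8
W = rng.standard_normal((d_out, d_in))   # frozen
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                 # zero init: adapter starts as a no-op
x = rng.standard_normal((4, d_in))
```

Because B starts at zero, the adapted model initially matches the frozen base model exactly; training only has to learn the low-rank delta, which is what keeps PEFT cheap.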
Semantic Search & Embeddings: Bi-encoder and cross-encoder architectures, ANN indexing, hybrid sparse-dense retrieval pipelines, and embedding evaluation frameworks. Experience taking dense retrieval from prototype to 50M+ query/month production traffic.
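The retrieve-then-rerank pattern above can be sketched with cosine similarity standing in for both stages. In production the bi-encoder and cross-encoder are learned models and the top-k search runs on an ANN index; everything below is an illustrative stand-in:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    """Bi-encoder stage: cosine similarity of the query against precomputed
    document embeddings; cheap, so it can scan a large candidate pool."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:k]  # indices of top-k by similarity

def rerank(candidates, score_fn):
    """Cross-encoder stage: re-score only the small candidate set with an
    expensive pairwise scorer (stubbed here by score_fn), sorted descending."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy corpus of 4 doc embeddings; doc 2 points the same way as the query.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]])
query = np.array([0.6, 0.8])
top = retrieve(query, docs, k=2)
```

The split matters for cost: the bi-encoder lets document vectors be embedded offline and indexed, while the cross-encoder's per-pair scoring only ever sees the short candidate list.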
RAG Systems: Chunking strategies, retrieval reranking, structured output generation, and hallucination evaluation. Built RAG pipelines backed by Azure OpenAI with custom retrieval layers and output validation.
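One common chunking strategy is a fixed-size sliding window with overlap, so content that straddles a boundary stays retrievable from both neighboring chunks. A minimal sketch; the sizes are arbitrary and in practice are tuned per corpus:

```python
def chunk(tokens, size=200, overlap=50):
    """Split a token sequence into overlapping windows of `size` tokens.
    Consecutive windows share `overlap` tokens, so a sentence cut by one
    chunk boundary still appears whole in the adjacent chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]
```

Larger overlap improves recall at retrieval time at the cost of index size and duplicated context in the prompt.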
Production Serving: vLLM, Triton Inference Server, OpenAI-compatible API deployment, latency optimization, model compression, and controlled A/B rollouts. Optimized GPU inference to hit P99 <200ms at scale.
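P99 here means the 99th-percentile request latency, i.e. 99% of requests complete within the target. A minimal sketch of how such a tail-latency figure is computed (nearest-rank method; the sample data is made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest observed value such that at
    least p% of samples are at or below it."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

latencies_ms = list(range(1, 101))  # stand-in for per-request timings
p99 = percentile(latencies_ms, 99)
```

Tail percentiles, not averages, are what serving SLOs target: a handful of slow GPU batches can leave the mean looking healthy while P99 blows past budget.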
Foundation Models: Pre-training transformer-based models from scratch on large-scale corpora: custom tokenization, training objective design, distributed training with DeepSpeed/PyTorch FSDP, and data curriculum scheduling. Applied to behavioral sequence modeling and representation learning, with fine-tuning for downstream personalization, ranking, and prediction tasks.
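Data curriculum scheduling, mentioned above, mixes data sources in proportions that change over training. A toy scheduler to illustrate the idea; the source names, mixing ratios, and linear ramp are all hypothetical:

```python
def curriculum_weights(step, total_steps):
    """Linearly shift sampling weight from a generic source ("web") to a
    harder in-domain corpus ("domain") as training progresses."""
    frac = min(step / total_steps, 1.0)
    w_domain = 0.1 + 0.6 * frac  # ramp the domain share from 10% to 70%
    return {"web": 1.0 - w_domain, "domain": w_domain}
```

The data loader would sample each batch's sources according to these weights, easing the model into the harder distribution instead of presenting it uniformly from step zero.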
Production Systems
- Large-scale semantic search: End-to-end dense retrieval system on a top-10 US e-commerce platform. Bi-encoder + cross-encoder pipeline with ANN indexing and GPU inference. 50M+ queries/month at P99 <200ms; +5% conversion lift over keyword baselines; $108M+ attributable revenue.
- Domain-adapted SLM: PEFT/LoRA fine-tuning and quantization for multi-intent query understanding. Includes safety eval and red-teaming. Deployed via Triton + vLLM as an OpenAI-compatible API.
- RAG pipeline: LLM-driven system using Azure OpenAI for natural language → structured output generation. Custom retrieval layer with output validation for structured task domains.
- Hierarchical taxonomy classifier: Hybrid LSTM/CNN predicting across 500K+ categories, deployed via TensorFlow Serving in high-throughput production traffic.
Technical Skills
LLMs & GenAI: Fine-tuning, PEFT/LoRA, quantization (GPTQ/AWQ/GGUF), RAG systems, semantic search, embedding models, vLLM, Triton Inference Server, vector databases, Transformers
ML Systems: PyTorch, TensorFlow, DeepSpeed, distributed training, model serving at scale, A/B testing, pipeline automation
Infrastructure: Python, SQL, Docker, AWS, Azure, REST/gRPC, Elasticsearch, Redis, Git, CI/CD