Job Description
Key Responsibilities:
- Fine-tune and deploy LLMs (e.g., GPT, LLaMA, Mistral) using frameworks like Hugging Face Transformers and LangChain.
- Build and optimize RAG pipelines with vector databases (e.g., Pinecone, FAISS, Weaviate).
- Engineer prompts for structured and reliable outputs across diverse use cases such as chatbots, summarization tools, and coding assistants.
- Implement scalable inference pipelines; optimize for latency, throughput, and cost using quantization, distillation, and other model optimization techniques.
- Collaborate with product, design, and engineering teams to integrate generative AI capabilities into user-facing features.
- Monitor and improve model performance, accuracy, safety, and compliance in production.
- Ensure responsible AI practices through content filtering, output sanitization, and ethical deployment.
Required Skills:
- Proficiency in Python and familiarity with modern machine learning tools and libraries.
- Hands-on experience with LLM development using Hugging Face Transformers, LangChain, or LlamaIndex.
- Experience building and deploying RAG pipelines, including managing embeddings and vector search.
- Strong understanding of transformer architectures, tokenization, and prompt engineering techniques.
- Experience working with LLM APIs (e.g., OpenAI, Anthropic, Cohere) and serving models with FastAPI, Flask, or similar frameworks.
- Familiarity with deploying ML systems using Docker, Kubernetes, and cloud services (AWS, GCP, Azure).
- Experience with model evaluation, logging, and inference pipeline troubleshooting.
Nice to Have:
- Exposure to multimodal models (e.g., text-to-image, video generation, TTS).
- Experience with reinforcement learning from human feedback (RLHF) or alignment techniques.
- Familiarity with open-source LLMs (e.g., Mistral, Mixtral, LLaMA, Falcon) and with fine-tuning and optimization techniques (e.g., LoRA and other PEFT methods, quantization).
- Knowledge of LangChain agents, tool integration, and memory management.
- Contributions to open-source GenAI projects, public demos, or blogs in the generative AI space.
- Basic proficiency in frontend development (e.g., React, Next.js) for rapid prototyping.