Senior LLM Engineer
Tasks:
• As a Senior Engineer, you will design and ship agentic AI systems that plan, call tools, and execute reliably inside production workflows. You’ll own the end-to-end delivery of GenAI capabilities—from model adaptation and retrieval to orchestration, evaluation, and operational excellence.
• Build agentic systems: design supervisor/planner/executor patterns, routing, memory/context strategies, tool/function calling, and robust failure handling.
• LLM adaptation & deployment: fine-tune or parameter-efficiently adapt open-source LLMs; optimize inference (latency/cost) and ship safely to production.
• Retrieval-augmented generation (RAG): implement embedding, retrieval, re-ranking, and grounding patterns; optimize for quality, speed, and cost.
• Structured and reliable generation: enforce schemas/structured outputs, guardrails, and post-processing; reduce hallucinations and brittleness.
• Evaluation & quality: build automated evaluation harnesses for agents/LLMs (offline benchmarks + online monitoring), regression tests, and prompt/model versioning.
• Production engineering: ship containerized services and APIs; implement CI/CD, observability, and reliability practices (SLOs, alerting, incident readiness).
• Cross-functional delivery: collaborate with product, platform, and data teams to integrate GenAI features into user-facing and internal workflows; mentor others.
Requirements:
• 5+ years building production ML/AI systems; 2+ years at senior/lead level.
• Strong Python engineering (testing, packaging, code quality, performance profiling).
• Hands-on experience with LLMs and agentic AI in real systems (tool calling, orchestration, workflow integration).
• Experience adapting LLMs (LoRA/QLoRA/PEFT or equivalent) and evaluating quality/safety.
• Experience implementing RAG and operating retrieval components in production.
• Strong MLOps fundamentals: containers, CI/CD, model/service versioning, monitoring.
• API/service development: REST/gRPC, auth, rate limits, error handling, resilience patterns.
• Comfortable operating in cloud environments (AWS/GCP/Azure) with production constraints.
Nice to have:
• Inference optimization: quantization, batching/caching, GPU serving (e.g., vLLM/TGI or similar).
• Agent safety engineering: prompt injection defenses, tool security, sandboxing, red teaming.
• Advanced evaluation: LLM-as-judge, preference testing, rubric-based grading, A/B testing.
• Vector database operations/tuning and retrieval performance engineering.
• Event-driven or workflow orchestration experience (e.g., Temporal, Airflow, n8n, or equivalents).
• Multilingual GenAI experience and robust internationalization practices.