Golang Developer with DevOps/LLM Experience - Remote / Telecommute
Job Description:
Required Skills:
• Proficiency in Golang for building scalable and performant backend services.
• Deep experience building services on distributed systems in modern cloud environments, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting.
• Experience working with Large Language Models (LLMs), particularly hosting them to run inference.
• Strong verbal and written communication skills.
• The candidate's job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation.
• Experience building or using benchmarking tools to evaluate LLM inference across various model, engine, and GPU combinations.
• Familiarity with LLM performance metrics such as prefill throughput, decode throughput, time per output token (TPOT), and time to first token (TTFT).
• Experience with one or more inference engines, e.g., vLLM, SGLang, or Modular MAX.
• Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, or Ray Serve.
• Experience with client and NVIDIA GPUs, using software like CUDA, ROCm, AITER, NCCL, Client, etc.
• Knowledge of distributed inference optimization techniques such as tensor/data parallelism, KV cache optimizations, and smart routing.
Responsibilities:
• Develop and maintain an inference platform for serving large language models optimized for the various GPU platforms they will be run on.
• Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC): ideation, product definition, experimentation, prototyping, development, testing, release, and operations.
• Build tooling and observability to monitor system health, and build auto-tuning capabilities.
• Build benchmarking frameworks that measure model serving performance and guide system and infrastructure tuning efforts.
• Build native cross-platform inference support across NVIDIA and client GPUs for a variety of model architectures.
• Contribute to open-source inference engines to make them perform better on the DigitalOcean cloud.