[Remote] Software Engineer in ML Systems Graduate (AML - Machine Learning Systems) - 2026 Start (BS/MS)
Note: This is a remote position open to candidates in the USA. ByteDance is a company dedicated to pioneering advanced AI foundation models. The Software Engineer in ML Systems role focuses on researching and developing machine learning systems, managing cross-layer optimization, and improving efficiency for large-scale distributed training jobs.
Responsibilities
- Research and develop our machine learning systems, including heterogeneous computing architecture, management, and monitoring
- Deploy machine learning systems, distributed task scheduling, and machine learning training
- Manage cross-layer optimization across systems, AI algorithms, and machine learning hardware (GPU, ASIC)
- Implement both general-purpose training framework features and model-specific optimizations (e.g. LLMs, diffusion models)
- Improve the efficiency and stability of extremely large-scale distributed training jobs
Skills
- Mastery of distributed and parallel computing principles; knowledge of recent advances in computing, storage, networking, and hardware technologies
- Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX
- Have a basic understanding of how GPUs and/or ASICs work
- Expert in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python
- GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs)
- Distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, GSPMD
- AI compiler stacks such as torch.fx, XLA and MLIR
- Large-scale data processing and parallel computing
- Experience in designing and operating large-scale systems in cloud computing or machine learning
- Experience in in-depth CUDA programming and performance tuning (CUTLASS, Triton)
Benefits
- Medical, dental, and vision insurance
- 401(k) savings plan with company match
- Paid parental leave
- Short-term and long-term disability coverage
- Life insurance
- Wellbeing benefits
- 10 paid holidays per year
- 10 paid sick days per year
- 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure)