[Remote] Research Scientist / Engineer – Multimodal Capabilities
Note: This is a remote position open to candidates in the USA. Luma AI is dedicated to building multimodal AI to enhance human capabilities. The role involves conducting pioneering research to define the future capabilities of multimodal models, designing experiments, and collaborating with research teams to translate findings into product experiences.
Responsibilities
• Research and define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to solve them
• Design and execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language
• Develop and pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities
• Collaborate deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences
• Build and prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked
Skills
• You have a PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science
• You have strong programming skills in Python and deep, hands-on experience with PyTorch
• You have a proven track record of working with multimodal data pipelines and curating large-scale datasets for research
• You possess a deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing
• You thrive on tackling the most ambitious, open-ended research challenges in a fast-paced, collaborative environment
• Direct expertise working with complex, interleaved multimodal data (video, audio, text)
• Hands-on experience training or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models from scratch
• A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR)
• Experience leading ambitious, open-ended research projects from ideation to tangible results
Company Overview
• Luma AI develops tools that let users generate photorealistic images and videos from text, image, or video prompts. It was founded in 2021 and is headquartered in Palo Alto, California, USA, with a workforce of 11-50 employees. Its website is https://lumalabs.ai.
Company H1B Sponsorship
• Luma AI has a track record of offering H1B sponsorships, with 10 in 2025 and 3 in 2024. Please note that this does not guarantee sponsorship for this specific role.