AI Scientist, Vision AI
Job Description:
• Research, design, and implement cutting-edge computer vision models for tasks such as image classification, object detection, semantic/instance segmentation, and video understanding.
• Develop and optimize generative vision models, including text-to-image, text-to-video, and image-to-video approaches.
• Train, fine-tune, and evaluate large-scale vision foundation models, adapting them to healthcare-specific applications.
• Collaborate with AI scientists, engineers, and product teams to integrate vision AI capabilities into Artisight’s platform.
• Stay at the forefront of vision AI and multimodal learning research, bringing innovations from the research community into production applications.
• Document and share research outcomes through technical reports, internal presentations, and, where appropriate, external publications.
• Work at the intersection of research and application, designing novel vision models and deploying them in real-world healthcare environments.
Requirements:
• M.S. or Ph.D. in computer science, electrical engineering, applied AI, machine learning, or a related discipline.
• Demonstrated expertise in computer vision research, evidenced by open-source contributions or peer-reviewed publications (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR).
• Hands-on experience with one or more of: image classification and object detection; image segmentation (semantic, instance, or panoptic); video classification and temporal modeling; text-to-image or text-to-video generation; image-to-video generation or video synthesis.
• Strong knowledge of deep learning methods (transformers, diffusion models, CNNs, self-supervised learning, multimodal architectures).
• Proficiency in frameworks such as PyTorch or TensorFlow, with experience in large-scale vision model training.
• Familiarity with deployment tools such as ONNX, NVIDIA Triton, or similar inference platforms.
• Strong problem-solving skills and the ability to clearly communicate research insights across disciplines.
• Nice to have: experience with multimodal learning (vision + audio + text); familiarity with 3D vision, medical imaging, or spatiotemporal models; experience with real-time video analysis and low-latency deployment; contributions to open-source vision projects (e.g., Detectron2, MMDetection, Segment Anything, Stable Diffusion, OpenMMLab).