
Senior Data Engineer, Platform & Pipelines

Remote, USA · Full-time · Posted 2025-11-24
Job Description:

• Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster).
• Identify, catalog, and integrate internal and external data sources used across research efforts.
• Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS.
• Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying.
• Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases).
• Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions.
• Contribute to backend services and APIs that expose integrated data to internal tools and applications.
• Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data.
• Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability.
• Write clean, tested, and well-documented Python code that meets production software engineering standards.
• Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment.

Requirements:

• BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred.
• 4+ years of experience in production data engineering or software engineering.
• Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation.
• Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts.
• Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing.
• Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable).
• Experience building or contributing to backend services (e.g., FastAPI or similar frameworks).
• Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda).
• Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability.
• Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design.
• Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness.
• Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment.

Benefits:

• Comprehensive medical, dental, vision, life, and disability plans for eligible employees and their dependents.
• Free testing for Natera employees and their immediate families, in addition to fertility care benefits.
• Pregnancy and baby bonding leave.
• 401k benefits.
• Commuter benefits.
• Generous employee referral program!
