Senior Data Engineer, Platform & Pipelines
Job Description:
• Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster).
• Identify, catalog, and integrate internal and external data sources used across research efforts.
• Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS.
• Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying.
• Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases).
• Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions.
• Contribute to backend services and APIs that expose integrated data to internal tools and applications.
• Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data.
• Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability.
• Write clean, tested, and well-documented Python code that meets production software engineering standards.
• Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment.
Requirements:
• BS in Computer Science, Bioinformatics, Computational Biology, or a related field, MS preferred.
• 4+ years of experience in production data engineering or software engineering.
• Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation.
• Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts.
• Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing.
• Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable).
• Experience building or contributing to backend services (e.g., FastAPI or similar frameworks).
• Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda).
• Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability.
• Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design.
• Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness.
• Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment.
Benefits:
• Comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents.
• Free testing for Natera employees and their immediate families in addition to fertility care benefits.
• Pregnancy and baby bonding leave.
• 401k benefits.
• Commuter benefits.
• Generous employee referral program!
Apply to this job