Python Automation Engineer – Multi-Source Scraping & Data Pipeline Build
We are looking for a Python automation engineer to build a fully automated data pipeline that gathers AI company data from multiple sources (APIs and web scraping), deduplicates and merges overlapping records across those sources, and outputs clean, structured data to Airtable or Notion on a weekly schedule.
You must have proven experience building production-grade scrapers, not basic scripts.
Required:
Strong Python (Scrapy, BeautifulSoup, requests)
API integrations (REST, authenticated APIs)
Experience automating recurring pipelines (cron jobs, scheduled tasks, etc.)
Data cleaning, deduplication logic, CSV/JSON handling
Ability to write clean, well-structured code
Nice to have (not required):
Selenium or Playwright
Experience with Airtable/Notion API
Experience with LLMs for data enrichment
Deliverables:
Scrapers for multiple AI-related sources (APIs + websites)
Deduplication + merging logic across sources
Weekly automated update pipeline
Output to Airtable/Notion in structured columns
Clear documentation so we can maintain it long-term
This project should take 2–3 weeks to build, with optional monthly maintenance.
If you’ve built multi-source scrapers before, please apply with examples.