Data Scientist Required for IRS 990 Analysis and NLP in Public Policy Research
We are looking for a computationally strong data expert for a fixed-duration, 3-week research engagement on an empirical project in the nonprofit–government policy space.
The project involves large-scale administrative data, open government datasets, and advanced NLP, with the goal of producing a results-ready analysis suitable for submission to a top-tier journal in nonprofit studies or public administration.
Because the project contains a novel methodological contribution, a nondisclosure agreement (NDA) will be required before full project details, data architecture, and analytic framework are shared.
However, the broad technical skillset needed is listed below so that interested candidates may assess fit.
Required Expertise (General Outline)
Data Engineering & Administrative Data
Parsing and transforming large, semi-structured or unstructured public datasets (e.g., XML/JSON)
Building reproducible Python ETL pipelines
Managing data at scale (millions of text records)
Natural Language Processing
Experience with at least one of the following:
Topic modeling (LDA and/or embedding-based methods)
Clustering techniques such as UMAP + HDBSCAN
Text classification or thematic modeling workflows
Working with transformer-based embedding models
Statistical Modeling
Familiarity with:
Panel data regression models
Time-series alignment or co-movement analysis
Similarity metrics (e.g., cosine, correlation)
Robustness testing and model validation
Research Communication
Ability to document methods explicitly and clearly
Experience drafting or co-drafting Methods and Results sections for academic publication
Comfort preparing reproducible files (notebooks, GitHub structure, workflow notes)
Engagement Details
Duration: 3 weeks (full-time or near full-time)
Nature of work: Analytic + computational + methodological documentation
Output: Clean datasets, documented code, analytic results, and draft text suitable for journal submission
Confidentiality: NDA required before project description, data schema, or methodology is disclosed
Ideal Candidate
Quantitative / computational postdocs
Researchers in public policy, computational social science, economics, political science, sociology, or information science
Applied data scientists with experience in text analytics or public-sector data
Anyone excited to work on a well-scoped, well-funded research project with real publication potential