[Atlas] Papers enrichment from PubMed API
Task ID: e543e406-4cca-45d0-a062-2fabc24dc2d3
Priority: 83
Status: In Progress
Goal
Populate the papers table with full metadata for every PMID referenced in the hypotheses' evidence_for and evidence_against fields. Fetch the title, authors, journal, year, abstract, and DOI from the NCBI PubMed E-utilities API. Currently the papers table has 0 rows even though 118 hypotheses carry evidence citations.
Acceptance Criteria
☑ All PMIDs from hypotheses evidence fields are extracted
☑ Paper metadata is fetched from NCBI PubMed E-utilities API
☑ Papers table is populated with: pmid, title, authors, journal, year, abstract, doi, url
☑ Citation relationships tracked (which hypotheses cite each paper)
☑ Script handles rate limiting (NCBI: 3 requests/second without API key)
☑ Error handling for missing/invalid PMIDs
☑ Papers table has >0 rows after execution
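The fetch, rate-limiting, and missing-PMID criteria above can be sketched roughly as follows. This is a minimal illustration, not the code in literature_manager.py: the names RateLimiter, parse_pubmed_xml, unresolved, and fetch_batch are hypothetical, and the XML paths assume the standard PubMed efetch layout. Only the efetch endpoint and the 3 requests/second limit come from NCBI's published guidance.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


class RateLimiter:
    """Space calls at least 1/rate seconds apart (3/s without an API key)."""

    def __init__(self, rate=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def _text(elem):
    # itertext() keeps text inside inline markup such as <i> in titles.
    return "".join(elem.itertext()).strip() if elem is not None else None


def parse_pubmed_xml(xml_data):
    """Return one dict per <PubmedArticle>; an empty article set yields []."""
    root = ET.fromstring(xml_data)
    papers = []
    for art in root.iter("PubmedArticle"):
        pmid = art.findtext("MedlineCitation/PMID")
        authors = []
        for a in art.findall("MedlineCitation/Article/AuthorList/Author"):
            parts = (a.findtext("LastName"), a.findtext("Initials"))
            name = " ".join(p for p in parts if p) or a.findtext("CollectiveName")
            if name:
                authors.append(name)
        year = art.findtext("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
        sections = art.findall("MedlineCitation/Article/Abstract/AbstractText")
        papers.append({
            "pmid": pmid,
            "title": _text(art.find("MedlineCitation/Article/ArticleTitle")),
            "authors": authors,
            "journal": art.findtext("MedlineCitation/Article/Journal/Title"),
            "year": int(year) if year and year.isdigit() else None,
            "abstract": " ".join(_text(s) for s in sections) or None,
            "doi": art.findtext("PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
            "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        })
    return papers


def unresolved(pmids, papers):
    """PMIDs that were requested but did not come back from PubMed."""
    found = {p["pmid"] for p in papers}
    return [p for p in pmids if p not in found]


def fetch_batch(pmids, limiter):
    """Fetch one batch of PMIDs; returns (papers, unresolved_pmids)."""
    limiter.wait()
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}", timeout=30) as resp:
        papers = parse_pubmed_xml(resp.read())
    return papers, unresolved(pmids, papers)
```

Keeping the parser separate from the network call means it can be exercised against saved XML without touching NCBI, and PMIDs that PubMed does not return surface as an explicit list rather than as silent gaps.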
Approach
Fix literature_manager.py to match actual papers table schema
- Current schema has: id, pmid, title, authors, journal, year, abstract, doi, url, cited_by_analyses, created_at, citation_count, cited_in_analysis_ids, first_cited_at
- Script expects: cited_by_hypotheses, kg_edges_sourced, fetch_status, fetch_error, updated_at (mismatched)
Update extraction logic to use cited_by_analyses instead of cited_by_hypotheses
Remove references to non-existent columns (fetch_status, kg_edges_sourced, etc.)
Run the sync: python3 literature_manager.py sync
Verify papers table is populated
Test that papers are properly linked to hypotheses
Work Log
2026-04-01 — Slot 8
- Started task: Papers enrichment from PubMed API
- Read AGENTS.md and understood five-layer architecture
- Checked database: papers table exists but is empty (0 rows)
- Found existing literature_manager.py with PubMed integration
- Discovered schema mismatch: script expects different columns than actual schema
- Creating spec file and will update code to match actual schema
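A schema mismatch like the one found in this slot can be detected programmatically before any sync runs. The sketch below assumes the database is SQLite (suggested by the TEXT/JSON column type noted in this log, but not confirmed here); table_columns and missing_columns are hypothetical helper names, not functions from literature_manager.py.

```python
import sqlite3

# Columns the script expected, per the Approach section of this spec.
SCRIPT_EXPECTS = {
    "cited_by_hypotheses",
    "kg_edges_sourced",
    "fetch_status",
    "fetch_error",
    "updated_at",
}


def table_columns(conn, table):
    """Return the set of column names for `table`.

    PRAGMA table_info yields one row per column; index 1 is the name.
    The table name is interpolated, so pass only trusted constants.
    """
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def missing_columns(conn, table, expected):
    """Columns the caller expects that the table does not actually have."""
    return sorted(set(expected) - table_columns(conn, table))
```

Running missing_columns(conn, "papers", SCRIPT_EXPECTS) at startup and aborting on a non-empty result would turn this kind of drift into an immediate, readable failure.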
2026-04-25 — Slot 76
- Resumed task: Papers enrichment from PubMed API
- Key finding: papers table already has 24,908 rows (not 0 as task stated)
- 7,272 unique PMIDs in hypothesis evidence (6,946 numeric)
- 6,904 of 6,946 already in papers table — 42 missing
- 42 missing PMIDs return an empty <PubmedArticleSet> from PubMed (likely retracted or invalid)
- Created scripts/enrich_papers_from_hypothesis_pmids.py to populate missing papers
- Fetched 251 papers from PubMed, inserted with full metadata (title, abstract, journal, year, doi, pmc_id, authors)
- Updated cited_in_analysis_ids (TEXT/JSON) to track hypothesis → paper linkage
- Final papers count: 25,159 (was 24,908 before run)
- Committed and pushed: 6506a0988
- Note: 42 PMIDs remain unfilled — these return no results from PubMed E-utilities (likely retracted papers or entry errors)
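The gap analysis and linkage update recorded in this slot can be sketched as below. This is an illustration under stated assumptions, not the contents of scripts/enrich_papers_from_hypothesis_pmids.py: it assumes a SQLite database, that pmid is stored as text, and that cited_in_analysis_ids holds a JSON array of IDs. The function names are hypothetical.

```python
import json
import sqlite3


def numeric_pmids(raw_ids):
    """Keep only purely numeric identifiers, stripped of whitespace."""
    cleaned = ((s or "").strip() for s in raw_ids)
    return {s for s in cleaned if s.isascii() and s.isdigit()}


def missing_pmids(conn, wanted):
    """PMIDs in `wanted` that have no row in the papers table."""
    have = {str(row[0]) for row in conn.execute("SELECT pmid FROM papers")}
    return sorted(wanted - have, key=int)


def link_hypothesis(conn, pmid, hypothesis_id):
    """Append hypothesis_id to the paper's JSON list; False if no such paper."""
    row = conn.execute(
        "SELECT cited_in_analysis_ids FROM papers WHERE pmid = ?", (pmid,)
    ).fetchone()
    if row is None:
        return False
    ids = json.loads(row[0]) if row[0] else []
    if hypothesis_id not in ids:  # keep the update idempotent across reruns
        ids.append(hypothesis_id)
        conn.execute(
            "UPDATE papers SET cited_in_analysis_ids = ? WHERE pmid = ?",
            (json.dumps(ids), pmid),
        )
    return True
```

Making the link step idempotent means the script can be rerun after a partial failure without duplicating entries in the JSON list.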