Spec: [Forge] Build Automated PubMed Update Pipeline for Hypothesis Evidence

Task ID: c5bbaa6b-75a8-40a6-a9d3-3efa6ebb396c

Layer: Forge

Priority: P88

Problem

Hypotheses need fresh literature evidence. The existing pubmed_enrichment.py does one-shot backfills but doesn't track which papers have already been seen, can't find new papers since the last run, and doesn't append to existing evidence.

Solution

Build pubmed_update_pipeline.py — a recurring pipeline that:

Selects top N hypotheses by composite_score

For each, searches PubMed with a date filter (only papers since last check)

Deduplicates against existing PMIDs already in evidence_for/evidence_against

Appends new citations to evidence_for/evidence_against

Tracks last-checked timestamps in a pubmed_update_log table

Can be run via CLI or cron
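
The date-filter and deduplication steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in pubmed_update_pipeline.py: the function names `build_esearch_url` and `filter_new_citations` are hypothetical, and the shape of an evidence entry (a dict with a `pmid` key) is assumed. The query parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`, `retmode`, `retmax`) are those of the NCBI E-utilities ESearch endpoint.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, since: date, until: date, retmax: int = 20) -> str:
    """Build an ESearch URL limited to papers published between since and until."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": since.strftime("%Y/%m/%d"),
        "maxdate": until.strftime("%Y/%m/%d"),
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def filter_new_citations(evidence_for, evidence_against, candidates):
    """Return candidates whose PMID is not already in either evidence list.

    Assumes each entry is a dict with a "pmid" key (hypothetical shape).
    PMIDs are compared as strings so 123 and "123" count as the same paper.
    """
    seen = {str(entry["pmid"]) for entry in evidence_for + evidence_against}
    fresh = []
    for paper in candidates:
        pmid = str(paper["pmid"])
        if pmid not in seen:
            seen.add(pmid)  # also drops duplicates within the candidate batch
            fresh.append(paper)
    return fresh
```

Only the papers returned by `filter_new_citations` would then be appended to evidence_for or evidence_against; how a paper is assigned to one list or the other is outside this sketch.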

DB Migration

Add pubmed_update_log table:

hypothesis_id TEXT (FK)
last_checked_at TEXT
papers_found INTEGER
papers_added INTEGER
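
A minimal sketch of the table and the per-run log update, assuming one row per hypothesis that is overwritten on each check (consistent with the PRIMARY KEY on hypothesis_id recorded in the work log). To stay runnable without a server it uses Python's built-in sqlite3 with an in-memory database as a stand-in; the deployed pipeline targets PostgreSQL, where the placeholder syntax would differ. The helper name `record_check` is hypothetical, and the foreign key is left as a comment because the referenced table is not created here.

```python
import sqlite3
from datetime import datetime, timezone

DDL = """
CREATE TABLE IF NOT EXISTS pubmed_update_log (
    hypothesis_id   TEXT PRIMARY KEY,  -- FK to the hypotheses table in the real schema
    last_checked_at TEXT,
    papers_found    INTEGER,
    papers_added    INTEGER
)
"""

UPSERT = """
INSERT INTO pubmed_update_log
    (hypothesis_id, last_checked_at, papers_found, papers_added)
VALUES (?, ?, ?, ?)
ON CONFLICT (hypothesis_id) DO UPDATE SET
    last_checked_at = excluded.last_checked_at,
    papers_found    = excluded.papers_found,
    papers_added    = excluded.papers_added
"""


def record_check(conn, hypothesis_id, found, added, now=None):
    """Insert or overwrite the log row for one hypothesis; return the timestamp."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    conn.execute(UPSERT, (hypothesis_id, ts, found, added))
    conn.commit()
    return ts


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
record_check(conn, "h-1", found=5, added=2)
record_check(conn, "h-1", found=3, added=0)  # second run overwrites the first
rows = conn.execute(
    "SELECT hypothesis_id, papers_found, papers_added FROM pubmed_update_log"
).fetchall()
```

The stored last_checked_at value is what a later run would read back to set the lower bound of its PubMed date filter.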

Testing

Run with --dry-run to verify search queries without writing
Verify against the PostgreSQL database with: SELECT id, title, json_array_length(evidence_for) FROM hypotheses ORDER BY composite_score DESC LIMIT 5;
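
The --dry-run behavior can be sketched with argparse as below. Only the --dry-run flag comes from this spec; the --limit flag, its default, and the `run` and `write` names are hypothetical placeholders for illustration, and the real CLI may differ.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Incremental PubMed evidence update")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the planned searches without writing to the database")
    # Hypothetical flag: how many top-scoring hypotheses to process.
    parser.add_argument("--limit", type=int, default=10)
    return parser.parse_args(argv)


def run(args, queries, write):
    """Print each planned search; call write() only when not in dry-run mode.

    queries is a list of (hypothesis_id, search_term) pairs, assumed to be
    already ordered by composite_score.
    """
    planned = []
    mode = "dry-run" if args.dry_run else "live"
    for hypothesis_id, term in queries[: args.limit]:
        print(f"[{mode}] {hypothesis_id}: {term}")
        planned.append((hypothesis_id, term))
        if not args.dry_run:
            write(hypothesis_id, term)
    return planned
```

Keeping the write step behind a single flag check means a dry run exercises the same selection and query-building path as a live run, so the printed queries are the ones a live run would issue.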

Work Log

2026-04-02: Started implementation; created pubmed_update_pipeline.py (SQLite)
2026-04-25: Verified already resolved on main. scidex/agora/pubmed_update_pipeline.py (651 lines, PostgreSQL via get_db() shim) exists on origin/main. Backward-compat shim at pubmed_update_pipeline.py present. pubmed_update_log table confirmed in PostgreSQL DB (PRIMARY KEY on hypothesis_id). Ran live test: 3 hypotheses processed, 2 new papers added to TREM2 hypothesis evidence_for. Pipeline handles date filtering, deduplication, and incremental updates as designed.

Already Resolved — 2026-04-25 23:52:00Z

Evidence: scidex/agora/pubmed_update_pipeline.py present on origin/main with 651 lines implementing the full PostgreSQL-backed incremental PubMed pipeline. pubmed_update_log table exists with PK on hypothesis_id. Live run verified: 3 hypotheses processed, 2 papers added to TREM2 evidence.

Commit that landed the fix: no single commit. Multiple squash-merges landed this work, and the files are present at origin/main HEAD.

Summary: Automated PubMed update pipeline is implemented, tested, and operational.

File: c5bbaa6b_pubmed_update_pipeline_spec.md

Modified: 2026-04-25 23:53

Size: 2.3 KB