Task ID: c5bbaa6b-75a8-40a6-a9d3-3efa6ebb396c
Layer: Forge
Priority: P88
pubmed_enrichment.py does one-shot backfills but doesn't track which papers have already been seen, can't find new papers since the last run, and doesn't append to existing evidence.

pubmed_update_pipeline.py is a recurring pipeline that:
pubmed_update_log table:
sqlite3 PostgreSQL "SELECT id, title, json_array_length(evidence_for) FROM hypotheses ORDER BY composite_score DESC LIMIT 5"

scidex/agora/pubmed_update_pipeline.py (651 lines, PostgreSQL via get_db() shim) exists on origin/main. A backward-compat shim at pubmed_update_pipeline.py is present. The pubmed_update_log table is confirmed in the PostgreSQL DB (PRIMARY KEY on hypothesis_id). Ran a live test: 3 hypotheses processed, 2 new papers added to the TREM2 hypothesis evidence_for. The pipeline handles date filtering, deduplication, and incremental updates as designed.

Evidence: scidex/agora/pubmed_update_pipeline.py is present on origin/main with 651 lines implementing the full PostgreSQL-backed incremental PubMed pipeline. The pubmed_update_log table exists with a PK on hypothesis_id. Live run verified: 3 hypotheses processed, 2 papers added to TREM2 evidence.
Commit that landed the fix: Multiple squash-merges landed this; files are present on origin/main HEAD.
Summary: Automated PubMed update pipeline is implemented, tested, and operational.
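For readers unfamiliar with the incremental pattern, the date filtering and deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the code in scidex/agora/pubmed_update_pipeline.py: the function name merge_new_evidence, the dict keys "pmid" and "pub_date", and the sample PMIDs are all hypothetical, and the real pipeline reads and writes through PostgreSQL rather than in-memory lists.

```python
from datetime import date


def merge_new_evidence(existing, fetched, last_run):
    """Append papers that are new since last_run and not already recorded.

    existing: list of paper dicts already stored as evidence (hypothetical shape)
    fetched:  list of paper dicts returned by a fresh PubMed search
    last_run: date of the previous successful update for this hypothesis
    Returns (updated_evidence, added_papers).
    """
    seen = {paper["pmid"] for paper in existing}
    added = []
    for paper in fetched:
        # Date filter: skip anything published before the last run.
        if paper["pub_date"] < last_run:
            continue
        # Deduplication: skip PMIDs already stored or already added in this batch.
        if paper["pmid"] in seen:
            continue
        seen.add(paper["pmid"])
        added.append(paper)
    # Incremental update: existing evidence is kept, new papers are appended.
    return existing + added, added


# Placeholder data; the PMIDs below are made up for illustration.
existing = [{"pmid": "100", "pub_date": date(2024, 1, 5)}]
fetched = [
    {"pmid": "100", "pub_date": date(2024, 3, 1)},   # already stored
    {"pmid": "200", "pub_date": date(2024, 2, 10)},  # new since last run
    {"pmid": "300", "pub_date": date(2023, 12, 1)},  # older than last run
    {"pmid": "200", "pub_date": date(2024, 2, 10)},  # duplicate within batch
]
updated, added = merge_new_evidence(existing, fetched, date(2024, 1, 31))
print([p["pmid"] for p in added])
```

Keeping the date comparison non-strict on the boundary day and relying on the PMID set to drop repeats means a paper published on the day of the previous run is not silently lost, at the cost of re-checking a few already-seen records.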