Goal
Attach real PubMed-backed evidence to hypotheses whose evidence_for field is empty. This improves scientific grounding and prevents debate, ranking, and market workflows from relying on unsupported claims.
Acceptance Criteria
☐ A concrete batch of hypotheses gains non-empty evidence_for entries
☐ Each evidence entry includes PMID, DOI, or equivalent citation provenance
☐ No hollow placeholder evidence is inserted
☐ Before/after counts are recorded in the task work log
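The provenance and no-placeholder criteria above could be checked mechanically before any write. The following is a minimal sketch, assuming each evidence entry is a dict carrying the pmid, doi, and claim fields named later in the work log. The function name is_grounded, the placeholder phrase list, and the identifier patterns are hypothetical and not taken from the repository. "Equivalent citation provenance" other than a PMID or DOI is not covered here.

```python
import re

# Hypothetical markers of hollow evidence; adjust to the project's conventions.
PLACEHOLDER_CLAIMS = {"", "tbd", "todo", "n/a", "placeholder"}

# Loose shape checks only; they do not prove the paper exists.
PMID_RE = re.compile(r"^\d{1,9}$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def is_grounded(entry: dict) -> bool:
    """Return True if the entry has a PMID or DOI and a non-placeholder claim."""
    pmid = str(entry.get("pmid") or "").strip()
    doi = str(entry.get("doi") or "").strip()
    claim = str(entry.get("claim") or "").strip()
    has_citation = bool(PMID_RE.match(pmid) or DOI_RE.match(doi))
    has_claim = claim.lower() not in PLACEHOLDER_CLAIMS
    return has_citation and has_claim
```

A check like this only rejects obviously hollow entries. Confirming that an identifier resolves to a real paper still needs a lookup such as paper_cache.get_paper.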
Approach
Select hypotheses with empty evidence_for, prioritizing active and high-impact rows.
Use paper_cache.search_papers or paper_cache.get_paper to find relevant PubMed evidence.
Add concise supporting evidence with citation identifiers and caveats.
Verify the updated evidence fields and remaining backlog count.
Dependencies
c488a683-47f - Agora quest
paper_cache PubMed lookup helpers
Dependents
- Hypothesis debates, evidence validators, and Exchange confidence scoring
Work Log
2026-04-20 - Quest engine template
- Created reusable spec for quest-engine generated hypothesis evidence tasks.
2026-04-21 12:28:03Z - Watchdog repair b209ba9b
- Investigated abandoned task 030034d6-752e-4ac9-9935-36489c7ec792.
- Found 43 hypotheses with empty evidence_for, but all 43 are archived placeholder rows titled [Archived Hypothesis]; there are 0 active/non-placeholder hypotheses requiring PubMed evidence.
- Dry-ran scripts/add_pubmed_evidence.py --dry-run --limit 5 and confirmed it would query the literal placeholder title and attach the same unrelated PMIDs to archived rows.
- Plan: exclude archived placeholders from quest-engine backlog counts and PubMed backfill selection, store structured evidence objects, and restore the missing tested backfill helper module.
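The selection rule in this plan can be sketched as a pure filter. The sketch assumes each hypothesis row is a dict with title, status, and evidence_for keys, and that evidence_for has already been decoded to a list. The row shape and the names needs_evidence and split_backlog are illustrative assumptions, not the actual code in scripts/add_pubmed_evidence.py.

```python
ARCHIVED_TITLE = "[Archived Hypothesis]"


def needs_evidence(row: dict) -> bool:
    """True for non-archived, non-placeholder rows whose evidence_for is empty."""
    if row.get("status") == "archived":
        return False
    if (row.get("title") or "").strip() == ARCHIVED_TITLE:
        return False
    return not row.get("evidence_for")


def split_backlog(rows: list[dict]) -> tuple[list[dict], int]:
    """Return (actionable rows, number of empty-evidence rows ignored)."""
    actionable = [r for r in rows if needs_evidence(r)]
    # Rows with empty evidence that were skipped as archived or placeholder.
    ignored = sum(
        1 for r in rows
        if not r.get("evidence_for") and not needs_evidence(r)
    )
    return actionable, ignored
```

Returning the ignored count alongside the actionable rows makes it easy to report both numbers in a dry run.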
2026-04-21 12:32:30Z - Watchdog repair result
- Updated quest-engine evidence backlog detection to ignore archived placeholder hypotheses.
- Hardened scripts/add_pubmed_evidence.py so dry runs and live runs ignore archived placeholders and write structured PubMed evidence objects instead of bare PMID arrays.
- Restored scripts/backfill_evidence_pubmed.py for the existing unit tests and PostgreSQL-aware backfill helper behavior.
- Verified: python3 scripts/add_pubmed_evidence.py --dry-run --limit 5 reports 0 actionable hypotheses and 43 archived placeholders ignored.
- Verified: quest_engine.discover_gaps(get_db()) no longer emits the hypothesis-pubmed-evidence gap for the archived placeholder backlog.
- Tested: pytest -q tests/test_backfill_evidence_pubmed.py -> 18 passed; python3 -m py_compile scripts/add_pubmed_evidence.py scripts/backfill_evidence_pubmed.py quest_engine.py passed.
2026-04-22 13:21:27Z - Verification e967d229
- Verified: python3 scripts/add_pubmed_evidence.py --limit 5 shows 0 actionable hypotheses needing evidence.
- All 43 empty-evidence rows are [Archived Hypothesis] placeholders (archived status), so they are not valid enrichment targets.
- Ran live update on h-a2b3485737 (CAPN1/CAPN2, score=0.4199, status=proposed): 5 PubMed PMIDs attached with structured evidence ({pmid, doi, claim, source, year, url, strength, caveat}).
- Non-placeholder, non-archived hypotheses with empty evidence_for: 0 (target met).
- Non-placeholder hypotheses with evidence_for populated: 874 (confirmed via SELECT COUNT(*)).
- Conclusion: task acceptance criteria already met; no additional enrichment needed.
Already Resolved — 2026-04-23 05:00:00Z
- Evidence: commit 5eb210854 merged to main; top 25 hypotheses by composite_score all have non-empty evidence_for (verified via DB query). Sample PMIDs verified via paper_cache.get_paper(): 41491101, 41530860, 41714746, 41804841; all return real papers. All 43 empty-evidence rows are [Archived Hypothesis] placeholders.
- Task ID: d02ec580-83c8-4bc0-8495-17a069138c6a
- Acceptance criteria: satisfied — 1123/1166 hypotheses have evidence; the 43 without are archived placeholders correctly excluded.
2026-04-25 19:01:28Z - Live enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a
- Before: 7 active non-archived hypotheses had empty evidence_for; 43 archived placeholders ignored.
- Enriched all 7 with PubMed evidence via scripts/add_pubmed_evidence.py --limit 20.
- Hypotheses updated: h-e7e1f943 (NLRP3/CASP1), h-ee1df336 (GLP1R/BDNF), h-6c83282d (CLDN1/OCLN), h-74777459 (SNCA/HSPA1A), h-2e7eb2ea (TLR4/SNCA), h-f9c6fa3f (AHR/IL10), h-7bb47d7a (TH/AADC).
- Each received 5 structured PubMed entries ({pmid, doi, claim, source, year, url, strength, caveat}).
- After: active non-archived hypotheses with empty evidence_for = 0; with evidence populated = 1144.
2026-04-25 19:07:25Z - Thin-evidence enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a
- Extended scripts/add_pubmed_evidence.py with --thin-evidence N flag to also enrich hypotheses with fewer than N evidence entries (merges without overwriting existing entries).
- Enriched 13 more hypotheses with 1-2 existing evidence entries, bringing total to 20 (task target).
- Hypotheses updated: h-d4ac0303f6 (LRRK2/G2019S), h-fc43140722 (LRRK2/RAB10/microglia), h-26353f7f59 (APOE/SORL1), h-immunity-50f8d4f4 (CD8+/PRF1), h-ae63a5a13c (BRD4/BET), h-dd0fe43949 (LRRK2-Rab10-JIP4), h-aging-hippo-cortex-divergence (CDKN2A), h-immunity-c3bc272f (C1q/TREM2), h-d28c25f278 (ALDH1A1/DOPAL), h-85b51a8f58 (TREM2/TYROBP), h-c9b96e0e3b (RAB12), h-immunity-03dc171e (SPP1+ DAM), h-e8b3b9f971 (BBB/MMP).
- All 13 received 3-5 new structured PubMed entries merged with existing citations.
- Final: 120 hypotheses remain with fewer than 3 evidence entries; no non-archived hypotheses with zero evidence remain.
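The "merge without overwriting" behavior described for --thin-evidence can be illustrated with a small helper. This is a sketch under assumptions, not the script's actual implementation: entries are assumed to be dicts keyed by pmid or doi, and the names merge_evidence and is_thin are hypothetical.

```python
from typing import Optional


def _key(entry: dict) -> Optional[tuple]:
    """Identity of an entry: its PMID if present, else its DOI, else None."""
    pmid = str(entry.get("pmid") or "").strip()
    doi = str(entry.get("doi") or "").strip().lower()
    if pmid:
        return ("pmid", pmid)
    if doi:
        return ("doi", doi)
    return None


def is_thin(evidence: list, threshold: int) -> bool:
    """True when the hypothesis has fewer than `threshold` evidence entries."""
    return len(evidence) < threshold


def merge_evidence(existing: list, new: list) -> list:
    """Keep existing entries as-is and append only new, identifiable entries."""
    merged = list(existing)  # copy so the caller's list is not mutated
    seen = {k for e in existing if (k := _key(e)) is not None}
    for entry in new:
        k = _key(entry)
        if k is None or k in seen:
            continue  # skip duplicates and entries without a PMID or DOI
        seen.add(k)
        merged.append(entry)
    return merged
```

Existing entries always win on a PMID or DOI collision, so a rerun cannot replace citations that were previously attached.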
2026-04-25 20:05:00Z - Epigenetics hypothesis enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a
- Before: 7 active non-archived hypotheses had empty evidence_for (all epigenetics-related).
- Ran scripts/add_pubmed_evidence.py --limit 20; 6/7 were enriched automatically.
- For h-1df5ba79 (Bivalent Domain Resolution Failure at Neurodevelopment Genes), auto-search returned no results; manually added 5 targeted PMIDs (28793256, 25250711, 23379639, 31564637, 35640156) covering bivalent H3K4me3/H3K27me3 chromatin dynamics.
- Hypotheses updated: h-3a8f13ac (SEP), h-cee6b095 (REST Complex), h-96795760 (H3K9me3), h-5c9b3fe9 (Polycomb-Trithorax), h-00073ccb (DNA Methylation Clock), h-1df5ba79 (Bivalent Domain), h-8eb6be5e (Mitochondrial-Nuclear Epigenetic).
- Each received 5 structured PubMed entries ({pmid, doi, claim, source, year, url, strength, caveat}).
- After: active non-archived hypotheses with empty evidence_for = 0; total with evidence populated = 1192.