[Agora] Add PubMed evidence to 20 hypotheses lacking citations done

Hypotheses with an empty evidence_for field have no grounding in published research, which weakens their credibility and the quality of debates that rely on them. For each of the 20 target hypotheses: (1) extract the target_gene and domain from the hypothesis; (2) search PubMed for relevant papers (use the pubmed_search tool or paper_cache); (3) select the 3–5 most relevant PMIDs with supporting evidence excerpts; (4) UPDATE hypotheses SET evidence_for= WHERE id=. Prioritize hypotheses with the highest composite_score. Verification: 20 hypotheses must have a non-empty evidence_for after the task completes.
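
The selection and write-back steps above can be sketched roughly as follows. This is a minimal illustration, not the project's script: it assumes evidence_for is stored as JSON text, it uses sqlite3 only to stay self-contained (the work log refers to PostgreSQL-aware helpers), and the function names select_targets and write_evidence are hypothetical.

```python
import json
import sqlite3


def select_targets(conn: sqlite3.Connection, limit: int = 20) -> list[str]:
    """Ids of hypotheses with empty evidence_for, highest composite_score first."""
    rows = conn.execute(
        """
        SELECT id FROM hypotheses
        WHERE evidence_for IS NULL OR TRIM(evidence_for) IN ('', '[]')
        ORDER BY composite_score DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def write_evidence(conn: sqlite3.Connection, hypothesis_id: str, entries: list[dict]) -> None:
    """Store evidence entries as JSON text (storage format is an assumption)."""
    # Parameter binding avoids building the UPDATE statement by string formatting.
    conn.execute(
        "UPDATE hypotheses SET evidence_for = ? WHERE id = ?",
        (json.dumps(entries), hypothesis_id),
    )
    conn.commit()
```

Treating NULL, an empty string, and an empty JSON array all as "empty" is a guess at how the column might look in practice; the real definition may differ.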

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (3)

Squash merge: orchestra/task/e967d229-add-pubmed-evidence-to-20-hypotheses-lac (1 commits) 2026-04-22
Squash merge: orchestra/task/e967d229-add-pubmed-evidence-to-20-hypotheses-lac (1 commits) 2026-04-22
[Verify] [Agora] Add PubMed evidence to 20 hypotheses — already resolved [task:e967d229-491b-4afb-bb80-e35e65d57812] 2026-04-22

Spec File

Goal

Attach real PubMed-backed evidence to hypotheses whose evidence_for field is empty. This improves scientific grounding and prevents debate, ranking, and market workflows from relying on unsupported claims.

Acceptance Criteria

☐ A concrete batch of hypotheses gains non-empty evidence_for entries
☐ Each evidence entry includes PMID, DOI, or equivalent citation provenance
☐ No hollow placeholder evidence is inserted
☐ Before/after counts are recorded in the task work log
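
The provenance and no-placeholder criteria above could be checked mechanically along these lines. This is a sketch under stated assumptions: entries are taken to be dicts with pmid, doi, url, and claim keys, a URL is assumed to count as "equivalent citation provenance", and the list of hollow strings is purely illustrative. The function names are hypothetical.

```python
# Strings treated as hollow placeholders; this set is an illustrative assumption.
HOLLOW_CLAIMS = {"", "n/a", "na", "none", "tbd", "todo", "placeholder"}


def has_provenance(entry: dict) -> bool:
    """True if the entry carries a non-blank PMID, DOI, or URL."""
    return any(str(entry.get(key) or "").strip() for key in ("pmid", "doi", "url"))


def is_hollow(entry: dict) -> bool:
    """True if the claim text is missing or is a known placeholder string."""
    claim = str(entry.get("claim") or "").strip().lower()
    return claim in HOLLOW_CLAIMS


def evidence_problems(entries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    if not entries:
        return ["evidence_for is empty"]
    problems = []
    for index, entry in enumerate(entries):
        if not has_provenance(entry):
            problems.append(f"entry {index}: no PMID, DOI, or URL")
        if is_hollow(entry):
            problems.append(f"entry {index}: hollow or missing claim")
    return problems
```

A check like this only catches structural hollowness; whether a cited paper actually supports the claim still needs a lookup and human or model review.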

Approach

  • Select hypotheses with empty evidence_for, prioritizing active and high-impact rows.
  • Use paper_cache.search_papers or paper_cache.get_paper to find relevant PubMed evidence.
  • Add concise supporting evidence with citation identifiers and caveats.
  • Verify the updated evidence fields and remaining backlog count.

Dependencies

    • c488a683-47f - Agora quest
    • paper_cache PubMed lookup helpers

    Dependents

    • Hypothesis debates, evidence validators, and Exchange confidence scoring

    Work Log

    2026-04-20 - Quest engine template

    • Created reusable spec for quest-engine generated hypothesis evidence tasks.

    2026-04-21 12:28:03Z - Watchdog repair b209ba9b

    • Investigated abandoned task 030034d6-752e-4ac9-9935-36489c7ec792.
    • Found 43 hypotheses with empty evidence_for, but all 43 are archived placeholder rows titled [Archived Hypothesis]; there are 0 active/non-placeholder hypotheses requiring PubMed evidence.
    • Dry-ran scripts/add_pubmed_evidence.py --dry-run --limit 5 and confirmed it would query the literal placeholder title and attach the same unrelated PMIDs to archived rows.
    • Plan: exclude archived placeholders from quest-engine backlog counts and PubMed backfill selection, store structured evidence objects, and restore the missing tested backfill helper module.

    2026-04-21 12:32:30Z - Watchdog repair result

    • Updated quest-engine evidence backlog detection to ignore archived placeholder hypotheses.
    • Hardened scripts/add_pubmed_evidence.py so dry runs and live runs ignore archived placeholders and write structured PubMed evidence objects instead of bare PMID arrays.
    • Restored scripts/backfill_evidence_pubmed.py for the existing unit tests and PostgreSQL-aware backfill helper behavior.
    • Verified: python3 scripts/add_pubmed_evidence.py --dry-run --limit 5 reports 0 actionable hypotheses and 43 archived placeholders ignored.
    • Verified: quest_engine.discover_gaps(get_db()) no longer emits the hypothesis-pubmed-evidence gap for the archived placeholder backlog.
    • Tested: pytest -q tests/test_backfill_evidence_pubmed.py -> 18 passed; python3 -m py_compile scripts/add_pubmed_evidence.py scripts/backfill_evidence_pubmed.py quest_engine.py passed.
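
The placeholder exclusion described in this entry might look something like the sketch below. It is not the code in scripts/add_pubmed_evidence.py: the row shape (dicts with title and status keys), the decision to treat either signal as sufficient, and the function names are all assumptions made for illustration.

```python
PLACEHOLDER_TITLE = "[Archived Hypothesis]"


def is_archived_placeholder(row: dict) -> bool:
    """True for rows that should be skipped by the evidence backfill.

    Assumption: a row is skipped if it has the placeholder title OR an
    archived status; the real script may require both.
    """
    title = (row.get("title") or "").strip()
    return title == PLACEHOLDER_TITLE or row.get("status") == "archived"


def partition_backlog(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split empty-evidence rows into (actionable, ignored placeholders)."""
    actionable, ignored = [], []
    for row in rows:
        (ignored if is_archived_placeholder(row) else actionable).append(row)
    return actionable, ignored
```

Reporting both counts, as the dry run does, makes it clear that a backlog of zero actionable rows is the result of deliberate filtering and not of a failed query.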

    2026-04-22 13:21:27Z - Verification e967d229

    • Verified: python3 scripts/add_pubmed_evidence.py --limit 5 shows 0 actionable hypotheses needing evidence.
    • All 43 empty-evidence rows are [Archived Hypothesis] placeholders (archived status) — not valid enrichment targets.
    • Ran live update on h-a2b3485737 (CAPN1/CAPN2, score=0.4199, status=proposed): 5 PubMed PMIDs attached with structured evidence ({pmid, doi, claim, source, year, url, strength, caveat}).
    • Non-placeholder, non-archived hypotheses with empty evidence_for: 0 (target met).
    • Non-placeholder hypotheses with evidence_for populated: 874 (confirmed via SELECT COUNT(*)).
    • Conclusion: task acceptance criteria already met; no additional enrichment needed.
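
For reference, the structured evidence object named in this entry ({pmid, doi, claim, source, year, url, strength, caveat}) could be modeled as below. Only the field names come from the log; the types, the defaults, the PMID validation, and the class name EvidenceEntry are assumptions. The default URL follows the public PubMed address pattern.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvidenceEntry:
    # Field names and order follow the work log; types are assumed.
    pmid: str
    doi: Optional[str]
    claim: str
    source: Optional[str] = None
    year: Optional[int] = None
    url: Optional[str] = None
    strength: Optional[str] = None
    caveat: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalize the PMID and reject anything that is not purely numeric.
        self.pmid = str(self.pmid).strip()
        if not self.pmid.isdigit():
            raise ValueError(f"PMID must be numeric, got {self.pmid!r}")
        # Derive the PubMed URL when the caller did not supply one.
        if self.url is None:
            self.url = f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/"

    def to_dict(self) -> dict:
        """Plain dict suitable for JSON serialization."""
        return asdict(self)
```

Compared with a bare PMID array, an object like this keeps the claim and its caveat next to the citation, so a reviewer can judge relevance without re-running the search.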

    Already Resolved — 2026-04-23 05:00:00Z

    • Evidence: commit 5eb210854 merged to main; top 25 hypotheses by composite_score all have non-empty evidence_for (verified via DB query). Sample PMIDs verified via paper_cache.get_paper(): 41491101, 41530860, 41714746, 41804841 — all return real papers. All 43 empty-evidence rows are [Archived Hypothesis] placeholders.
    • Task ID: d02ec580-83c8-4bc0-8495-17a069138c6a
    • Acceptance criteria: satisfied — 1123/1166 hypotheses have evidence; the 43 without are archived placeholders correctly excluded.

    2026-04-25 19:01:28Z - Live enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a

    • Before: 7 active non-archived hypotheses had empty evidence_for; 43 archived placeholders ignored.
    • Enriched all 7 with PubMed evidence via scripts/add_pubmed_evidence.py --limit 20.
    • Hypotheses updated: h-e7e1f943 (NLRP3/CASP1), h-ee1df336 (GLP1R/BDNF), h-6c83282d (CLDN1/OCLN), h-74777459 (SNCA/HSPA1A), h-2e7eb2ea (TLR4/SNCA), h-f9c6fa3f (AHR/IL10), h-7bb47d7a (TH/AADC).
    • Each received 5 structured PubMed entries ({pmid, doi, claim, source, year, url, strength, caveat}).
    • After: active non-archived hypotheses with empty evidence_for = 0; with evidence populated = 1144.

    2026-04-25 19:07:25Z - Thin-evidence enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a

    • Extended scripts/add_pubmed_evidence.py with --thin-evidence N flag to also enrich hypotheses with fewer than N evidence entries (merges without overwriting existing entries).
    • Enriched 13 more hypotheses that each had 1-2 existing evidence entries, bringing the total to 20 (the task target).
    • Hypotheses updated: h-d4ac0303f6 (LRRK2/G2019S), h-fc43140722 (LRRK2/RAB10/microglia), h-26353f7f59 (APOE/SORL1), h-immunity-50f8d4f4 (CD8+/PRF1), h-ae63a5a13c (BRD4/BET), h-dd0fe43949 (LRRK2-Rab10-JIP4), h-aging-hippo-cortex-divergence (CDKN2A), h-immunity-c3bc272f (C1q/TREM2), h-d28c25f278 (ALDH1A1/DOPAL), h-85b51a8f58 (TREM2/TYROBP), h-c9b96e0e3b (RAB12), h-immunity-03dc171e (SPP1+ DAM), h-e8b3b9f971 (BBB/MMP).
    • All 13 received 3-5 new structured PubMed entries merged with existing citations.
    • Final: 120 hypotheses remain with fewer than 3 evidence entries; no non-archived hypotheses with zero evidence remain.
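
The "merge without overwriting" behavior of the --thin-evidence flag can be illustrated with a small sketch. The actual implementation is not shown in this log, so the choice to de-duplicate by PMID, the decision to skip new entries that lack a PMID, and the function names are assumptions.

```python
def needs_enrichment(entries: list[dict], minimum: int) -> bool:
    """True if a hypothesis has fewer than `minimum` evidence entries."""
    return len(entries) < minimum


def merge_evidence(existing: list[dict], new: list[dict]) -> list[dict]:
    """Append new entries after existing ones, never replacing an existing entry.

    Assumption: entries are identified by PMID; new entries without a PMID
    are skipped because they cannot be de-duplicated.
    """
    merged = list(existing)  # copy so the caller's list is left untouched
    seen = {str(entry["pmid"]) for entry in existing if entry.get("pmid")}
    for entry in new:
        pmid = str(entry.get("pmid") or "")
        if not pmid or pmid in seen:
            continue
        seen.add(pmid)
        merged.append(entry)
    return merged
```

Keeping existing entries first and untouched means manually curated citations survive a re-run of the script.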

    2026-04-25 20:05:00Z - Epigenetics hypothesis enrichment task:316fc918-80c1-4cff-a85f-72b226d0d18a

    • Before: 7 active non-archived hypotheses had empty evidence_for (all epigenetics-related).
    • Ran scripts/add_pubmed_evidence.py --limit 20 — 6/7 enriched automatically.
    • For h-1df5ba79 (Bivalent Domain Resolution Failure at Neurodevelopment Genes), auto-search returned no results; manually added 5 targeted PMIDs (28793256, 25250711, 23379639, 31564637, 35640156) covering bivalent H3K4me3/H3K27me3 chromatin dynamics.
    • Hypotheses updated: h-3a8f13ac (SEP), h-cee6b095 (REST Complex), h-96795760 (H3K9me3), h-5c9b3fe9 (Polycomb-Trithorax), h-00073ccb (DNA Methylation Clock), h-1df5ba79 (Bivalent Domain), h-8eb6be5e (Mitochondrial-Nuclear Epigenetic).
    • Each received 5 structured PubMed entries ({pmid, doi, claim, source, year, url, strength, caveat}).
    • After: active non-archived hypotheses with empty evidence_for = 0; total with evidence populated = 1192.
