Build a pipeline that extracts 3-5 specific mechanistic claims from each of the top 50 hypotheses (ranked by composite score), searches PubMed for supporting and contradicting evidence, and updates evidence_validation_score plus the evidence_for/evidence_against JSON arrays with verified citations.
WHY: Hypothesis scores are currently based on debate outcomes, not systematic PubMed grounding. Claims like "TDP-43 nuclear depletion leads to STMN2 mis-splicing" may or may not have direct PubMed support. A claim verifier adds the quality layer that makes the scores meaningful, and it flags hypotheses that need experimental verification.
WHAT TO DO:
1. Query top 50 hypotheses ordered by composite_score DESC.
2. For each hypothesis, extract 3-5 specific mechanistic claims using llm.py.
3. For each claim: search PubMed via paper_cache.search_papers() (last 10 years, up to 5 papers).
4. Classify each paper as: 'supports', 'contradicts', 'partial', or 'unrelated' using LLM.
5. Update hypothesis: append verified citations to evidence_for (supports) and evidence_against (contradicts). Update evidence_validation_score = (claims with ≥1 support) / total claims. Update citations_count.
6. Skip hypotheses whose evidence_validation_score is no longer the default of 0.5 (i.e., already processed).
7. Log summary: hypothesis_id, claims checked, support rate.
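The steps above can be sketched as follows. `extract_claims`, `search_papers`, and `classify_paper` are hypothetical stand-ins for the project's llm.py and paper_cache.search_papers() calls (their real signatures should be confirmed against the spec); the scoring and append-only update logic is what steps 3-5 specify.

```python
import json

def extract_claims(hypothesis_text):
    # Hypothetical stand-in for the llm.py call that pulls 3-5
    # mechanistic claims; sentence splitting is only a placeholder.
    return [s.strip() for s in hypothesis_text.split(".") if s.strip()][:5]

def verify_hypothesis(hypothesis, search_papers, classify_paper):
    """Evaluate one hypothesis; returns append-only updates to its evidence fields."""
    claims = extract_claims(hypothesis["text"])
    evidence_for = json.loads(hypothesis.get("evidence_for") or "[]")
    evidence_against = json.loads(hypothesis.get("evidence_against") or "[]")
    supported_claims = 0
    for claim in claims:
        # Step 3: last 10 years, up to 5 papers per claim (assumed kwargs).
        papers = search_papers(claim, years=10, limit=5)
        # Step 4: one label per paper: supports/contradicts/partial/unrelated.
        labels = {p["pmid"]: classify_paper(claim, p) for p in papers}
        if any(label == "supports" for label in labels.values()):
            supported_claims += 1
        # Step 5: append only; never drop or overwrite existing citations.
        evidence_for += [pmid for pmid, lab in labels.items()
                         if lab == "supports" and pmid not in evidence_for]
        evidence_against += [pmid for pmid, lab in labels.items()
                             if lab == "contradicts" and pmid not in evidence_against]
    return {
        "evidence_for": json.dumps(evidence_for),
        "evidence_against": json.dumps(evidence_against),
        "evidence_validation_score": supported_claims / len(claims) if claims else 0.0,
        "citations_count": len(evidence_for) + len(evidence_against),
    }
```

The returned dict maps directly onto the hypothesis columns updated in step 5; 'partial' and 'unrelated' papers are deliberately discarded rather than stored.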
READ FIRST: docs/planning/specs/agora_mechanistic_claim_verifier_spec.md (merged after 2026-04-28)
DO NOT: overwrite existing evidence (append only); add unrelated papers to the evidence arrays; modify composite_score, novelty_score, or any score other than evidence_validation_score; write to scidex.db.
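The append-only constraint can be isolated in one helper, assuming evidence arrays are stored as JSON text (`append_pmids` is a hypothetical name, not an existing project function):

```python
import json

def append_pmids(existing_json, new_pmids):
    """Append-only merge for a JSON-encoded evidence array: every
    existing citation is kept, and only PMIDs not already present
    are added. Nothing is ever removed or overwritten."""
    existing = json.loads(existing_json or "[]")
    merged = existing + [p for p in new_pmids if p not in existing]
    return json.dumps(merged)
```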
SUCCESS: 50 hypotheses processed (or the iteration limit reached), each with ≥3 claims evaluated, evidence arrays updated with verified PMIDs, and evidence_validation_score updated.