[Senate] Auto fact-check pipeline - cross-verify every claim across 3 sources (done)

Triangulated claim verification: fan out every atomic claim to PubMed+S2+OpenAlex, LLM judge per source, mode-vote verdict + confidence.

Completion Notes

Auto fact-check pipeline complete. Fans out claims to PubMed/Semantic Scholar/OpenAlex, LLM judges each source, triangulates via mode vote. API routes added (`GET /api/senate/fact_check/{claim_id}`, `POST /api/senate/fact_check/run`). HTML badge on hypothesis pages. 15 tests passing. SQL migration applied.
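
For reference, a minimal client sketch against the new routes (only the paths come from this task; the base URL, claim id, and response shape are assumptions):

```python
import requests

BASE = "http://localhost:8000"  # assumed local deployment

# Triangulated verdict plus the 3 per-source breakdowns for one claim.
resp = requests.get(f"{BASE}/api/senate/fact_check/42")
print(resp.json())  # e.g. {"verdict": "supported", "source_verdicts": {...}, "confidence": 1.0, ...}

# Admin-only: kick off a sweep over unchecked or stale claims (auth omitted here).
requests.post(f"{BASE}/api/senate/fact_check/run")
```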
Spec File

Goal

Today scidex/atlas/citation_validity.py checks whether the cited PMID
supports the claim. It does NOT check whether the claim is true,
i.e. whether independent retrieval (PubMed, Semantic Scholar, OpenAlex)
returns supporting evidence beyond what the agent originally cited.
Build a pipeline that, for every atomic claim extracted by
scidex/atlas/wiki_claim_extractor.py (and its equivalents on
hypotheses, debate transcripts, and analyses), fans out to **3
independent retrieval sources** (pubmed_search, semantic_scholar_search,
openalex_works in tools.py), asks an LLM per source "do these results
support, contradict, or fail to address the claim?", and records a
triangulated verdict. Claims with ≥2 contradicting sources or ≥2
off-topic sources are flagged for resolution.
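
As a concrete reading of that last sentence, the flag is a count over per-source verdicts (a sketch; needs_resolution is a hypothetical name, and mapping "off-topic" onto an unsupported per-source verdict is an assumption):

```python
def needs_resolution(source_verdicts: dict[str, str]) -> bool:
    # Flag when at least 2 of the 3 sources contradict the claim, or at
    # least 2 fail to address it (off-topic is assumed to surface as an
    # "unsupported" per-source verdict).
    votes = list(source_verdicts.values())
    return votes.count("contradicted") >= 2 or votes.count("unsupported") >= 2
```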

Effort: thorough

Acceptance Criteria

☐ scidex/senate/auto_fact_check.py::fact_check(claim_text, claim_id, claim_source) -> FactCheckResult with fields {verdict, source_verdicts: dict[source, verdict], confidence, evidence_quotes: list[dict], checked_at}. Verdict ∈ {supported, weakly_supported, mixed, contradicted, unsupported}.
☐ Source fan-out reuses the 3 existing tool wrappers — no new API integrations: tools.pubmed_search, tools.semantic_scholar_search, tools.openalex_works.
☐ Migration migrations/20260428_claim_fact_checks.sql: claim_fact_checks(id, claim_id, claim_source ENUM('hypothesis','wiki','debate','analysis'), verdict, source_verdicts JSONB, confidence REAL, evidence_quotes JSONB, checked_at TIMESTAMPTZ); UNIQUE(claim_id, claim_source); audit-trigger mirror table claim_fact_checks_history.
☐ run_sweep(batch_size=50, claim_source=None) -> dict driver; idempotent: re-runs only re-check claims older than 90 days or whose prompt-version hash has changed.
☐ Triangulation rule: verdict = mode(source_verdicts.values()); if the mode count is 1 (all 3 sources disagree), verdict='mixed'; confidence = mode_count / 3. See the sketch after this list.
☐ Cost cap: skip claims whose claim_text is < 30 chars; skip definitional claims (heuristic match against templates from wiki_claim_extractor's exclude list); rate-limited via q-ri-quota-aware-throttle (defer if not landed).
☐ API: GET /api/senate/fact_check/{claim_id} returns the triangulated verdict + 3 source breakdowns; POST /api/senate/fact_check/run (admin) kicks off a sweep.
☐ HTML badge: render the triangulated verdict on /hypothesis/{id}, /wiki/{slug}, and analysis pages with a 3-segment dot (one per source) — green/yellow/red per source — so a user can glance and see "PubMed: green, S2: yellow, OpenAlex: red".
☐ Tests tests/test_auto_fact_check.py: synthetic claim with 3 supporting sources → supported; one source contradicts, two support → weakly_supported; two contradict → contradicted; offline sources tolerated (verdict still computed from remaining sources, confidence dropped).
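
A minimal sketch of the result shape and the triangulation rule, assuming per-source verdicts in {supported, contradicted, unsupported} and offline sources recorded as None. Note one refinement the tests imply: pure mode voting would return supported for a 2-support/1-contradict split, so this sketch assumes a downgrade to weakly_supported whenever the winning verdict is supported but some source contradicts:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FactCheckResult:
    verdict: str           # supported | weakly_supported | mixed | contradicted | unsupported
    source_verdicts: dict  # e.g. {"pubmed": "supported", "semantic_scholar": None, ...}
    confidence: float      # mode_count / 3
    evidence_quotes: list = field(default_factory=list)
    checked_at: str = ""   # ISO-8601 timestamp

def triangulate(source_verdicts: dict) -> tuple[str, float]:
    # Offline sources (None) are tolerated: they drop out of the vote but
    # the denominator stays 3, so confidence falls as the spec requires.
    votes = [v for v in source_verdicts.values() if v is not None]
    if not votes:
        return "mixed", 0.0
    verdict, mode_count = Counter(votes).most_common(1)[0]
    if mode_count == 1:    # every remaining source disagrees
        return "mixed", 1 / 3
    if verdict == "supported" and "contradicted" in votes:
        verdict = "weakly_supported"  # assumed downgrade, per the test expectations
    return verdict, mode_count / 3
```

Mapped onto the test scenarios: three supporting sources give ("supported", 1.0); two support plus one contradict give ("weakly_supported", 2/3); two contradictions give ("contradicted", 2/3); and with one source offline the verdict still comes from the remaining two while confidence tops out at 2/3.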

Approach

  • Build the 3 retrieval workers as small async tasks, each returning the top-5 papers (title + abstract) for the claim.
  • Per source, call llm.complete() with a verdict prompt that returns JSON {verdict, evidence_quote, contradiction_quote}; reuse the prompt-version hash strategy from comment_classifier.py. A fan-out sketch follows this list.
  • Persist via db_writes.insert_claim_fact_check(); emit a claim_fact_check_completed event so downstream consumers (epistemic_health, hypothesis composite scoring) can react.
  • Surface aggregate % claims fact-checked and % supported on /senate/quality-dashboard.
  • When a claim flips from supported → contradicted, fire a Senate claim_contradiction_alert proposal so a human/agent can adjudicate.
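
A minimal sketch of the fan-out, assuming the tool wrappers and llm.complete() are awaitable (wrap sync wrappers in asyncio.to_thread otherwise); the import path, VERDICT_PROMPT, and render() are illustrative, not existing code:

```python
import asyncio
import json

from scidex import llm, tools  # assumed import path for the existing wrappers

# Illustrative prompt; the real one should carry a version hash, mirroring
# comment_classifier.py, so run_sweep can detect stale verdicts.
VERDICT_PROMPT = (
    "Claim: {claim}\n\nSearch results:\n{papers}\n\n"
    "Do these results support, contradict, or fail to address the claim? "
    'Reply as JSON: {{"verdict": ..., "evidence_quote": ..., "contradiction_quote": ...}}'
)

def render(papers: list[dict]) -> str:
    return "\n".join(f"- {p['title']}: {p['abstract']}" for p in papers)

async def judge_source(name: str, search, claim_text: str):
    try:
        papers = (await search(claim_text))[:5]  # top-5 title + abstract
    except Exception:
        return name, None                        # offline source: tolerated
    raw = await llm.complete(
        VERDICT_PROMPT.format(claim=claim_text, papers=render(papers))
    )
    return name, json.loads(raw)  # {verdict, evidence_quote, contradiction_quote}

async def fan_out(claim_text: str) -> dict:
    pairs = await asyncio.gather(
        judge_source("pubmed", tools.pubmed_search, claim_text),
        judge_source("semantic_scholar", tools.semantic_scholar_search, claim_text),
        judge_source("openalex", tools.openalex_works, claim_text),
    )
    return dict(pairs)
```

From there, triangulate() (sketched under Acceptance Criteria) turns the per-source JSON verdicts into the stored FactCheckResult.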

Dependencies

  • tools.py — PubMed, Semantic Scholar, OpenAlex wrappers.
  • scidex/atlas/wiki_claim_extractor.py — supplies atomic claims.
  • scidex/atlas/citation_validity.py — complementary (citation→claim alignment); this task is claim→world alignment.

Dependents

  • q-qual-claim-consistency-engine — needs verdicts to flag contradictions.
  • q-qual-hallucination-detector — uses fact-check results as ground truth.

Work Log

Payload JSON

{
  "completion_shas": [
    "74668743c"
  ],
  "completion_shas_checked_at": "2026-04-27T15:35:07.336831+00:00"
}
