[Senate] Auto fact-check pipeline - cross-verify every claim across 3 sources (done)

Triangulated claim verification: fan out every atomic claim to PubMed+S2+OpenAlex, LLM judge per source, mode-vote verdict + confidence.

Completion Notes

Auto fact-check pipeline complete. Fans out claims to PubMed/Semantic Scholar/OpenAlex, LLM judges each source, triangulates via mode vote. API routes added (`GET /api/senate/fact_check/{claim_id}`, `POST /api/senate/fact_check/run`). HTML badge on hypothesis pages. 15 tests passing. SQL migration applied.
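
For reference, a minimal client sketch against the new routes (only the paths come from this task; the base URL, claim id, and response shape are assumptions):

```python
import requests

BASE = "http://localhost:8000"  # assumed local deployment

# Triangulated verdict plus the 3 per-source breakdowns for one claim.
resp = requests.get(f"{BASE}/api/senate/fact_check/42")
print(resp.json())  # e.g. {"verdict": "supported", "source_verdicts": {...}, "confidence": 1.0, ...}

# Admin-only: kick off a sweep over unchecked or stale claims (auth omitted here).
requests.post(f"{BASE}/api/senate/fact_check/run")
```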
Spec File

Goal

Today scidex/atlas/citation_validity.py checks whether the cited PMID
supports the claim. It does NOT check whether the claim is true,
i.e. whether independent retrieval (PubMed, Semantic Scholar, OpenAlex)
returns supporting evidence beyond what the agent originally cited.
Build a pipeline that, for every atomic claim extracted by
scidex/atlas/wiki_claim_extractor.py (and its equivalents on
hypotheses, debate transcripts, and analyses), fans out to **3
independent retrieval sources** (pubmed_search, semantic_scholar_search,
openalex_works in tools.py), asks an LLM per source "do these results
support, contradict, or fail to address the claim?", and records a
triangulated verdict. Claims with ≥2 contradicting sources or ≥2
off-topic sources are flagged for resolution.
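
As a concrete reading of that last sentence, the flag is a count over per-source verdicts (a sketch; needs_resolution is a hypothetical name, and mapping "off-topic" onto an unsupported per-source verdict is an assumption):

```python
def needs_resolution(source_verdicts: dict[str, str]) -> bool:
    # Flag when at least 2 of the 3 sources contradict the claim, or at
    # least 2 fail to address it (off-topic is assumed to surface as an
    # "unsupported" per-source verdict).
    votes = list(source_verdicts.values())
    return votes.count("contradicted") >= 2 or votes.count("unsupported") >= 2
```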

Effort: thorough

Acceptance Criteria

☐ scidex/senate/auto_fact_check.py::fact_check(claim_text, claim_id, claim_source) -> FactCheckResult with fields {verdict, source_verdicts: dict[source, verdict], confidence, evidence_quotes: list[dict], checked_at}. Verdict ∈ {supported, weakly_supported, mixed, contradicted, unsupported}.
☐ Source fan-out reuses the 3 existing tool wrappers — no new API integrations: tools.pubmed_search, tools.semantic_scholar_search, tools.openalex_works.
☐ Migration migrations/20260428_claim_fact_checks.sql: claim_fact_checks(id, claim_id, claim_source ENUM('hypothesis','wiki','debate','analysis'), verdict, source_verdicts JSONB, confidence REAL, evidence_quotes JSONB, checked_at TIMESTAMPTZ); UNIQUE(claim_id, claim_source); audit-trigger mirror table claim_fact_checks_history.
☐ run_sweep(batch_size=50, claim_source=None) -> dict driver; idempotent: re-runs only re-check claims older than 90 days or whose prompt-version hash has changed.
☐ Triangulation rule: verdict = mode(source_verdicts.values()); if the mode count is 1 (all 3 sources disagree), verdict='mixed'; confidence = mode_count / 3. See the sketch after this list.
☐ Cost cap: skip claims whose claim_text is < 30 chars; skip definitional claims (heuristic match against templates from wiki_claim_extractor's exclude list); rate-limited via q-ri-quota-aware-throttle (defer if not landed).
☐ API: GET /api/senate/fact_check/{claim_id} returns the triangulated verdict + 3 source breakdowns; POST /api/senate/fact_check/run (admin) kicks off a sweep.
☐ HTML badge: render the triangulated verdict on /hypothesis/{id}, /wiki/{slug}, and analysis pages with a 3-segment dot (one per source) — green/yellow/red per source — so a user can glance and see "PubMed: green, S2: yellow, OpenAlex: red".
☐ Tests tests/test_auto_fact_check.py: synthetic claim with 3 supporting sources → supported; one source contradicts, two support → weakly_supported; two contradict → contradicted; offline sources tolerated (verdict still computed from remaining sources, confidence dropped).
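
A minimal sketch of the result shape and the triangulation rule, assuming per-source verdicts in {supported, contradicted, unsupported} and offline sources recorded as None. Note one refinement the tests imply: pure mode voting would return supported for a 2-support/1-contradict split, so this sketch assumes a downgrade to weakly_supported whenever the winning verdict is supported but some source contradicts:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FactCheckResult:
    verdict: str           # supported | weakly_supported | mixed | contradicted | unsupported
    source_verdicts: dict  # e.g. {"pubmed": "supported", "semantic_scholar": None, ...}
    confidence: float      # mode_count / 3
    evidence_quotes: list = field(default_factory=list)
    checked_at: str = ""   # ISO-8601 timestamp

def triangulate(source_verdicts: dict) -> tuple[str, float]:
    # Offline sources (None) are tolerated: they drop out of the vote but
    # the denominator stays 3, so confidence falls as the spec requires.
    votes = [v for v in source_verdicts.values() if v is not None]
    if not votes:
        return "mixed", 0.0
    verdict, mode_count = Counter(votes).most_common(1)[0]
    if mode_count == 1:    # every remaining source disagrees
        return "mixed", 1 / 3
    if verdict == "supported" and "contradicted" in votes:
        verdict = "weakly_supported"  # assumed downgrade, per the test expectations
    return verdict, mode_count / 3
```

Mapped onto the test scenarios: three supporting sources give ("supported", 1.0); two support plus one contradict give ("weakly_supported", 2/3); two contradictions give ("contradicted", 2/3); and with one source offline the verdict still comes from the remaining two while confidence tops out at 2/3.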

Approach

  • Build the 3 retrieval workers as small async tasks, each returning the top-5 papers (title + abstract) for the claim.
  • Per source, call llm.complete() with a verdict prompt that returns JSON {verdict, evidence_quote, contradiction_quote}; reuse the prompt-version hash strategy from comment_classifier.py. A fan-out sketch follows this list.
  • Persist via db_writes.insert_claim_fact_check(); emit a claim_fact_check_completed event so downstream consumers (epistemic_health, hypothesis composite scoring) can react.
  • Surface aggregate % claims fact-checked and % supported on /senate/quality-dashboard.
  • When a claim flips from supported → contradicted, fire a Senate claim_contradiction_alert proposal so a human/agent can adjudicate.
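
A minimal sketch of the fan-out, assuming the tool wrappers and llm.complete() are awaitable (wrap sync wrappers in asyncio.to_thread otherwise); the import path, VERDICT_PROMPT, and render() are illustrative, not existing code:

```python
import asyncio
import json

from scidex import llm, tools  # assumed import path for the existing wrappers

# Illustrative prompt; the real one should carry a version hash, mirroring
# comment_classifier.py, so run_sweep can detect stale verdicts.
VERDICT_PROMPT = (
    "Claim: {claim}\n\nSearch results:\n{papers}\n\n"
    "Do these results support, contradict, or fail to address the claim? "
    'Reply as JSON: {{"verdict": ..., "evidence_quote": ..., "contradiction_quote": ...}}'
)

def render(papers: list[dict]) -> str:
    return "\n".join(f"- {p['title']}: {p['abstract']}" for p in papers)

async def judge_source(name: str, search, claim_text: str):
    try:
        papers = (await search(claim_text))[:5]  # top-5 title + abstract
    except Exception:
        return name, None                        # offline source: tolerated
    raw = await llm.complete(
        VERDICT_PROMPT.format(claim=claim_text, papers=render(papers))
    )
    return name, json.loads(raw)  # {verdict, evidence_quote, contradiction_quote}

async def fan_out(claim_text: str) -> dict:
    pairs = await asyncio.gather(
        judge_source("pubmed", tools.pubmed_search, claim_text),
        judge_source("semantic_scholar", tools.semantic_scholar_search, claim_text),
        judge_source("openalex", tools.openalex_works, claim_text),
    )
    return dict(pairs)
```

From there, triangulate() (sketched under Acceptance Criteria) turns the per-source JSON verdicts into the stored FactCheckResult.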

Dependencies

  • tools.py — PubMed, Semantic Scholar, OpenAlex wrappers.
  • scidex/atlas/wiki_claim_extractor.py — supplies atomic claims.
  • scidex/atlas/citation_validity.py — complementary (citation→claim alignment); this task is claim→world alignment.

Dependents

  • q-qual-claim-consistency-engine — needs verdicts to flag contradictions.
  • q-qual-hallucination-detector — uses fact-check results as ground truth.

Work Log

Payload JSON

{
  "completion_shas": [
    "74668743c"
  ],
  "completion_shas_checked_at": "2026-04-27T15:35:07.336831+00:00"
}
