Today scidex/atlas/citation_validity.py checks whether the cited PMID
supports the claim. It does NOT check whether the claim itself is true,
i.e. whether independent retrieval (PubMed, Semantic Scholar, OpenAlex)
returns supporting evidence beyond what the agent originally cited.
Build a pipeline that, for every atomic claim extracted by
scidex/atlas/wiki_claim_extractor.py (and equivalents on hypotheses,
debate transcripts, analyses), fans out to **3 independent retrieval
sources** (pubmed_search, semantic_scholar_search, openalex_works
in tools.py), asks an LLM "do these results support, contradict, or
fail to address the claim?" per source, and records a triangulated
verdict. Claims with ≥2 contradicting sources or ≥2 off-topic sources
are flagged for resolution.
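
A minimal sketch of that fan-out, assuming the tools.py wrappers accept a free-text query and return lists of result dicts (their real signatures may differ), and taking the per-source LLM judge as a callable (e.g. the hypothetical judge_with_llm() sketched under the LLM section below):

```python
# Sketch only: the import path for tools.py and the wrapper signatures are
# assumptions; `judge` is any callable returning the per-source JSON verdict.
from typing import Callable

from scidex import tools

RETRIEVERS = {
    "pubmed": tools.pubmed_search,
    "semantic_scholar": tools.semantic_scholar_search,
    "openalex": tools.openalex_works,
}

def collect_source_verdicts(
    claim_text: str,
    judge: Callable[[str, list[dict]], dict],
) -> dict[str, str]:
    """Fan a claim out to the three retrieval sources and collect one
    per-source verdict ("supports" / "contradicts" / "not_addressed")."""
    verdicts: dict[str, str] = {}
    for name, search in RETRIEVERS.items():
        try:
            results = search(claim_text)  # signature assumed
        except Exception:
            continue  # offline source tolerated; triangulation proceeds without it
        verdicts[name] = judge(claim_text, results)["verdict"]
    return verdicts
```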
Effort: thorough
**Deliverable:** `scidex/senate/auto_fact_check.py::fact_check(claim_text, claim_id, claim_source) -> FactCheckResult` with fields `{verdict, source_verdicts: dict[source, verdict], confidence, evidence_quotes: list[dict], checked_at}`. Verdict ∈ {supported, weakly_supported, mixed, contradicted, unsupported}. A dataclass sketch of the result shape follows this spec.

**Retrieval sources:** `tools.pubmed_search`, `tools.semantic_scholar_search`, `tools.openalex_works`.

**Migration:** `migrations/20260428_claim_fact_checks.sql`: `claim_fact_checks(id, claim_id, claim_source ENUM('hypothesis','wiki','debate','analysis'), verdict, source_verdicts JSONB, confidence REAL, evidence_quotes JSONB, checked_at TIMESTAMPTZ)`; `UNIQUE(claim_id, claim_source)`; audit-trigger mirror table `claim_fact_checks_history`.

**Sweep driver:** `run_sweep(batch_size=50, claim_source=None) -> dict`; idempotent: re-runs only re-check claims older than 90 days or whose prompt-version hash has changed (idempotency sketch below).

**Triangulation:** `verdict = mode(source_verdicts.values())`; if the mode count is 1 (all three sources disagree), `verdict = 'mixed'`; `confidence = mode_count / 3` (sketch below).

**Skip conditions:** skip when `claim_text` is under 30 characters; skip definitional claims (heuristic match against templates from `wiki_claim_extractor`'s exclude list); rate-limit via q-ri-quota-aware-throttle (defer if that task has not landed).

**API:** `GET /api/senate/fact_check/{claim_id}` returns the triangulated verdict plus the three per-source breakdowns; `POST /api/senate/fact_check/run` (admin) kicks off a sweep.

**UI:** annotate `/hypothesis/{id}`, `/wiki/{slug}`, and analysis pages with a 3-segment dot (one segment per source, green/yellow/red) so a user can see at a glance "PubMed: green, S2: yellow, OpenAlex: contradicts".

**Tests:** `tests/test_auto_fact_check.py`: synthetic claim with 3 supporting sources → supported; one source contradicts, two support → weakly_supported; two contradict → contradicted; offline sources tolerated (verdict still computed from the remaining sources, confidence dropped). A pytest sketch follows this spec.

**LLM:** `llm.complete()` with a verdict prompt that returns JSON `{verdict, evidence_quote, contradiction_quote}`; reuse the prompt-version hash strategy from `comment_classifier.py` (sketch below).

**Persistence & events:** `db_writes.insert_claim_fact_check()`; emit a `claim_fact_check_completed` event so downstream consumers (epistemic_health, hypothesis composite scoring) can react.

**Dashboard:** surface % of claims fact-checked and % supported on `/senate/quality-dashboard`.

**Alerts:** when a claim flips supported → contradicted, fire a Senate `claim_contradiction_alert` proposal so a human/agent can adjudicate.

**Related:**
- `tools.py`: PubMed, Semantic Scholar, OpenAlex wrappers.
- `scidex/atlas/wiki_claim_extractor.py`: supplies atomic claims.
- `scidex/atlas/citation_validity.py`: complementary (citation→claim alignment); this task is claim→world alignment.
- q-qual-claim-consistency-engine: needs verdicts to flag contradictions.
- q-qual-hallucination-detector: uses fact-check results as ground truth.
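
The result shape from the deliverable, transcribed as a dataclass; any typing detail not stated in the spec is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FactCheckResult:
    verdict: str                     # supported | weakly_supported | mixed | contradicted | unsupported
    source_verdicts: dict[str, str]  # per source: supports | contradicts | not_addressed (labels assumed)
    confidence: float                # mode_count / 3 per the triangulation rule
    evidence_quotes: list[dict]      # quotes returned by the per-source LLM judgments
    checked_at: datetime
```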
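A sketch of the triangulation rule. One reconciliation is needed: the bare mode rule would call two-supports-one-contradicts "supported", while the test matrix expects weakly_supported, so this sketch downgrades a non-unanimous supporting majority. Treat that downgrade, and the per-source verdict labels, as assumptions:

```python
from collections import Counter

TOTAL_SOURCES = 3  # always divide by 3 so an offline source lowers confidence

FINAL_VERDICT = {
    "supports": "supported",
    "contradicts": "contradicted",
    "not_addressed": "unsupported",
}

def triangulate(source_verdicts: dict[str, str]) -> tuple[str, float]:
    """verdict = mode of the per-source verdicts; all-distinct -> mixed;
    confidence = mode_count / 3."""
    if not source_verdicts:
        return "unsupported", 0.0
    counts = Counter(source_verdicts.values())
    top, n = counts.most_common(1)[0]
    if len(counts) == len(source_verdicts) and len(counts) > 1:
        return "mixed", 1 / TOTAL_SOURCES  # mode count == 1: every source disagrees
    verdict = FINAL_VERDICT[top]
    if verdict == "supported" and n < len(source_verdicts):
        verdict = "weakly_supported"  # majority supports, but one source dissents
    return verdict, n / TOTAL_SOURCES
```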
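A sketch of the idempotency filter inside run_sweep(). The migration's column list does not name a prompt-hash column, so persisting one (prompt_hash below) alongside each row is an assumption, mirroring the prompt-version hash strategy in comment_classifier.py:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RECHECK_AFTER = timedelta(days=90)

def needs_recheck(last_check: Optional[dict], current_prompt_hash: str) -> bool:
    """True when a claim should be re-checked: never checked before,
    checked under an older prompt version, or last checked > 90 days ago."""
    if last_check is None:
        return True
    if last_check.get("prompt_hash") != current_prompt_hash:  # column assumed
        return True
    return datetime.now(timezone.utc) - last_check["checked_at"] > RECHECK_AFTER
```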
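A sketch of the per-source judgment call. llm.complete() comes from the spec, but its import path and exact signature, the prompt wording, and the result-snippet formatting are assumptions:

```python
import json

from scidex import llm  # import path assumed

VERDICT_PROMPT = """\
You are fact-checking a scientific claim against search results.

Claim: {claim}

Results:
{results}

Reply with JSON only:
{{"verdict": "supports" | "contradicts" | "not_addressed",
  "evidence_quote": "...", "contradiction_quote": "..."}}"""

def judge_with_llm(claim_text: str, results: list[dict]) -> dict:
    """Ask the LLM whether one source's results support, contradict, or
    fail to address the claim; returns the parsed JSON verdict."""
    snippets = "\n".join(
        f"- {r.get('title', '')}: {str(r.get('abstract', ''))[:500]}"
        for r in results[:5]  # cap prompt size; result dict keys assumed
    )
    raw = llm.complete(VERDICT_PROMPT.format(claim=claim_text, results=snippets))
    return json.loads(raw)
```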
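The test matrix from the spec, sketched as pytest cases against the pure triangulation helper (assuming it lands in scidex/senate/auto_fact_check.py under the name used in the sketch above):

```python
import pytest

from scidex.senate.auto_fact_check import triangulate  # name assumed

def test_three_supporting_sources_yield_supported():
    verdict, conf = triangulate(
        {"pubmed": "supports", "semantic_scholar": "supports", "openalex": "supports"})
    assert verdict == "supported"
    assert conf == pytest.approx(1.0)

def test_one_contradiction_downgrades_to_weakly_supported():
    verdict, _ = triangulate(
        {"pubmed": "supports", "semantic_scholar": "supports", "openalex": "contradicts"})
    assert verdict == "weakly_supported"

def test_two_contradictions_yield_contradicted():
    verdict, _ = triangulate(
        {"pubmed": "contradicts", "semantic_scholar": "contradicts", "openalex": "supports"})
    assert verdict == "contradicted"

def test_offline_source_tolerated_with_lower_confidence():
    verdict, conf = triangulate({"pubmed": "supports", "openalex": "supports"})
    assert verdict == "supported"
    assert conf < 1.0
```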
"completion_shas": [
"74668743c"
],
"completion_shas_checked_at": "2026-04-27T15:35:07.336831+00:00"
}