LLM agents (Theorist, Synthesizer, Domain Expert) produce hypothesis
text and debate-round content that frequently contains invented PMIDs,
gene-name typos, or causal links not actually present in any indexed
paper. The existing citation_validity sweep catches the first failure
mode but not the third (claims with no specific citation at all). Build
a detector that, for each LLM-generated paragraph, extracts atomic
sub-claims, runs each through a retrieval-grounded baseline (top-3 hits
from PubMed + Semantic Scholar), and computes a hallucination_score
∈ [0, 1] defined as the fraction of sub-claims with zero retrieval support.
Effort: thorough
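The scoring rule above can be sketched as follows. This is a minimal illustration, not the real implementation: the HallucinationReport field names come from this spec, while the concrete types, the make_report helper, and the flagged_spans shape are assumptions made here to keep the sketch self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class HallucinationReport:
    # Field names follow the spec; concrete types are assumptions.
    score: float        # fraction of unsupported sub-claims, in [0, 1]
    sub_claims: list    # [{"claim": str, "supported": bool, "top_hits": list}]
    flagged_spans: list # here: just the unsupported sub-claim dicts
    detected_at: str    # ISO-8601 timestamp

def hallucination_score(sub_claims: list) -> float:
    """score ∈ [0, 1] = fraction of sub-claims with zero retrieval support."""
    if not sub_claims:
        return 0.0  # per spec: empty text returns 0
    unsupported = sum(1 for c in sub_claims if not c["supported"])
    return unsupported / len(sub_claims)

def make_report(sub_claims: list) -> HallucinationReport:
    # Hypothetical helper, not named in the spec: bundles the score and the
    # unsupported sub-claims into a report object.
    flagged = [c for c in sub_claims if not c["supported"]]
    return HallucinationReport(
        score=hallucination_score(sub_claims),
        sub_claims=sub_claims,
        flagged_spans=flagged,
        detected_at=datetime.now(timezone.utc).isoformat(),
    )
```

The empty-input branch matches the "empty text returns 0" test expectation listed below.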
- Entry point: scidex/senate/hallucination_detector.py::detect(text, source_artifact_id, source_artifact_type) -> HallucinationReport with fields {score, sub_claims: list[{claim, supported, top_hits}], flagged_spans: list[dict], detected_at}.
- Claim extraction: reuse wiki_claim_extractor's LLM prompt (atomic factual statements, ≤ 5 per paragraph); fall back to sentence-level chunking if extraction fails.
- Retrieval + support: tools.pubmed_search(claim_text, retmax=3) and tools.semantic_scholar_search(claim_text, limit=3); "supported" iff ≥ 1 hit's title+abstract entails the claim per LLM judge (reuse auto_fact_check's judge if q-qual-auto-fact-check-pipeline is available, otherwise a lighter entails(claim, abstract) prompt).
- Schema: migrations/20260428_hallucination_reports.sql: hallucination_reports(id, source_artifact_id, source_artifact_type, score REAL, sub_claims JSONB, flagged_spans JSONB, prompt_version TEXT, detected_at TIMESTAMPTZ); index on (source_artifact_type, score DESC) for "most-hallucinating-recent" queries.
- Sweep: run_sweep(artifact_types=['hypothesis','debate_round','analysis'], window_hours=24) -> dict runs hourly via economics_drivers/ci_hallucination_sweep.py; only sweeps artifacts whose content was modified in the window.
- Alerting: if score > 0.5, fire a hallucination_alert Senate proposal AND set hypotheses.flagged_hallucination = TRUE (new bool column); the hypothesis composite score is docked 15 % until cleared by a human-reviewed proposal vote.
- UI: the /senate/hallucination-leaderboard page lists the top-50 most-hallucinating recent artifacts with sub-claim drill-down.
- Agent metric: agent_hallucination_rate(agent_id, window_days=30) -> float, computed from hallucination_reports joined to agent_skill_invocations/debate_messages.author.
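The retrieval step can be sketched as a concurrent fan-out over both search backends under a shared time budget. A minimal sketch, assuming the tools.pubmed_search / tools.semantic_scholar_search callables named in the spec; they are passed in as plain callables here so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout

def retrieval_baseline(claim: str, search_fns, budget_s: float = 10.0) -> list:
    """Query all backends concurrently; return whatever hits arrive in budget.

    search_fns: callables taking the claim text and returning a list of
    {title, abstract, score} dicts (stand-ins for tools.pubmed_search /
    tools.semantic_scholar_search from the spec).
    """
    hits = []
    with ThreadPoolExecutor(max_workers=max(1, len(search_fns))) as pool:
        futures = [pool.submit(fn, claim) for fn in search_fns]
        try:
            for fut in as_completed(futures, timeout=budget_s):
                try:
                    hits.extend(fut.result())
                except Exception:
                    # One backend down (e.g. a PubMed outage): keep the rest,
                    # matching the "PubMed-down fallback uses S2-only" case.
                    pass
        except FuturesTimeout:
            pass  # budget exhausted; return partial results
    return hits
```

A "supported" flag would then be set iff at least one returned title+abstract entails the claim per the LLM judge described above.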
This rate feeds into the agent_calibration reputation update.
- Tests: tests/test_hallucination_detector.py: synthetic paragraph with 3 real claims + 1 invented gene "FAKEGENE5X" → ≥ 1 sub-claim flagged; all-real-claim paragraph → score < 0.2; empty text returns 0; PubMed-down fallback uses S2-only.
- Helper: retrieval_baseline(claim) -> list[{title, abstract, score}] via concurrent calls to PubMed + S2 with a 10 s budget.
- Proposals: scidex/senate/governance.py::create_proposal() is the existing entry point.
- Template: templates/senate/hallucination_leaderboard.html.
- Reads/reuses: tools.pubmed_search, tools.semantic_scholar_search; scidex/atlas/wiki_claim_extractor.py (prompt template); scidex/senate/calibration.py (reputation update path).
- Related: q-mem-error-recovery-memory — uses hallucination patterns as recoverable errors; q-qual-claim-consistency-engine — relies on hallucination flags.

Deliverables:
- scidex/senate/hallucination_detector.py: detect() → HallucinationReport
- migrations/20260428_hallucination_reports.sql: hallucination_reports table
- economics_drivers/ci_hallucination_sweep.py: hourly sweep driver with 55-min
- /senate/hallucination-leaderboard page (170-line HTML) added to api.py listing top-50
- tests/test_hallucination_detector.py: 12 tests (all passing) covering empty text,

Implementation verified on main at commit 7eab2d32d:
- scidex/senate/hallucination_detector.py::detect() at line 275 ✓
- tests/test_hallucination_detector.py exists (269 lines) ✓
- economics_drivers/ci_hallucination_sweep.py exists (107 lines) ✓
- migrations/20260428_hallucination_reports.sql exists (91 lines) ✓
- 82ba17d62 ✓

Issue: Commit 0fee7e12b (GitHub bidirectional sync, #747) corrupted api.py by
replacing ~88K lines of Python code with a git merge message (~8 lines). This made
api.py unimportable and blocked the hallucination detector from running.
Fix: Restored api.py from the known-good commit 7eab2d32d (which contains the
hallucination leaderboard page and all other code). Also added the missing
"from contextvars import ContextVar" import required by a module-level use
of ContextVar at line ~1475.
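The missing-import failure mode behind this fix is easy to reproduce. A minimal sketch (the variable name is illustrative, not the actual one in api.py): a module-level ContextVar declaration without the contextvars import raises NameError at import time, which is exactly what makes a module unimportable.

```python
from contextvars import ContextVar  # the import that was missing from api.py

# A module-level declaration like the one near line ~1475: without the import
# above, this line would raise NameError as soon as the module is imported.
# (request_id is a hypothetical name, not taken from api.py.)
request_id: ContextVar = ContextVar("request_id", default=None)

def current_request_id():
    # .get() returns the default until a value is set in the current context.
    return request_id.get()
```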
Verification:
- api.py imports successfully ✓
- /senate/hallucination-leaderboard route registered ✓
- 3531b35e3 pushed to orchestra/task/58055617-hallucination-detector-compare-llm-claim

Implementation fully verified on main at commit ee8de5729 (PR #751). Worktree at main HEAD 8b4e2d3fb with zero diff.
Verification evidence:
- detect() signature matches spec: (text, source_artifact_id, source_artifact_type) -> HallucinationReport ✓
- HallucinationReport fields: source_artifact_id, source_artifact_type, score, sub_claims, flagged_spans, detected_at, prompt_version ✓
- tests/test_hallucination_detector.py passes ✓
- migrations/20260428_hallucination_reports.sql exists (91 lines) ✓
- economics_drivers/ci_hallucination_sweep.py exists (107 lines) ✓
- /senate/hallucination-leaderboard route registered in api.py ✓

{
"completion_shas": [
"ee8de5729"
],
"completion_shas_checked_at": ""
}