[Senate] Hallucination detector - compare LLM claims to retrieval-grounded baseline (done)

Detect LLM-invented claims by extracting sub-claims and checking each against a retrieval-grounded baseline; score, flag, dock the composite, fire a Senate alert.

Completion Notes

Auto-release: work already on origin/main

Git Commits (4)

[Verify] Hallucination detector — already resolved on main [task:58055617-625a-4ebc-b560-52d434b5c3d7] (#753), 2026-04-27
Squash merge: orchestra/task/58055617-hallucination-detector-compare-llm-claim (2 commits) (#751), 2026-04-27
[Verify] Hallucination detector work log — implementation complete [task:58055617-625a-4ebc-b560-52d434b5c3d7] (#748), 2026-04-27
Squash merge: orchestra/task/58055617-hallucination-detector-compare-llm-claim (3 commits) (#745), 2026-04-27
Spec File

Goal

LLM agents (Theorist, Synthesizer, Domain Expert) produce hypothesis
text and debate-round content that frequently invents PMIDs, introduces
gene-name typos, or asserts causal links not actually present in any
indexed paper. The existing citation_validity sweep catches the first
failure mode but not the third (claims that carry no specific citation
at all). Build a detector that, for each LLM-generated paragraph,
extracts atomic sub-claims, runs each through a retrieval-grounded
baseline (top-3 hits from PubMed + Semantic Scholar), and computes a
hallucination_score ∈ [0, 1] equal to the fraction of sub-claims with
zero retrieval support.
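
The scoring rule above boils down to a small function. A minimal sketch, with the
extraction and retrieval-support checks passed in as callables; the parameter names
are illustrative, not the module's real helpers:

    from typing import Callable

    def hallucination_score(
        text: str,
        extract_sub_claims: Callable[[str], list[str]],
        has_retrieval_support: Callable[[str], bool],
    ) -> float:
        """Fraction of sub-claims with zero retrieval support; empty text scores 0."""
        sub_claims = extract_sub_claims(text)
        if not sub_claims:
            return 0.0                                   # nothing extracted, nothing to flag
        unsupported = sum(1 for c in sub_claims if not has_retrieval_support(c))
        return unsupported / len(sub_claims)             # in [0, 1]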

Effort: thorough

Acceptance Criteria

☐ scidex/senate/hallucination_detector.py::detect(text, source_artifact_id, source_artifact_type) -> HallucinationReport with fields {score, sub_claims: list[{claim, supported, top_hits}], flagged_spans: list[dict], detected_at} (a shape sketch follows this list).
☐ Sub-claim extraction reuses wiki_claim_extractor's LLM prompt (atomic factual statements, ≤ 5 per paragraph); fall back to sentence-level chunking if extraction fails.
☐ Retrieval grounding: per sub-claim, query tools.pubmed_search(claim_text, retmax=3) and tools.semantic_scholar_search(claim_text, limit=3); "supported" iff ≥ 1 hit's title+abstract entails the claim per LLM judge (reuse auto_fact_check's judge if q-qual-auto-fact-check-pipeline is available, otherwise a lighter entails(claim, abstract) prompt).
☐ Migration migrations/20260428_hallucination_reports.sql: hallucination_reports(id, source_artifact_id, source_artifact_type, score REAL, sub_claims JSONB, flagged_spans JSONB, prompt_version TEXT, detected_at TIMESTAMPTZ); index on (source_artifact_type, score DESC) for "most-hallucinating-recent".
☐ Driver run_sweep(artifact_types=['hypothesis','debate_round','analysis'], window_hours=24) -> dict runs hourly via economics_drivers/ci_hallucination_sweep.py; only sweeps artifacts with content modified in the window.
☐ If score > 0.5, fire hallucination_alert Senate proposal AND set hypotheses.flagged_hallucination = TRUE (new bool column); the hypothesis composite score docks 15 % until cleared by a human-reviewed proposal vote (the dock and the per-agent rollup below are sketched after this list).
☐ /senate/hallucination-leaderboard page lists top-50 most-hallucinating recent artifacts with sub-claim drill-down.
☐ Per-agent rollup: agent_hallucination_rate(agent_id, window_days=30) -> float computed from hallucination_reports joined to agent_skill_invocations/debate_messages.author. Feeds into agent_calibration reputation update.
☐ Tests tests/test_hallucination_detector.py: synthetic paragraph with 3 real claims + 1 invented gene "FAKEGENE5X" → ≥ 1 sub-claim flagged; all-real-claim paragraph → score < 0.2; empty text returns 0; PubMed-down fallback uses S2-only.
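
A rough shape for the report object named in the first criterion; the field names
come from that criterion and the verification notes further down, while the exact
types and the prompt_version default are assumptions:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class HallucinationReport:
        source_artifact_id: str
        source_artifact_type: str        # 'hypothesis' | 'debate_round' | 'analysis'
        score: float                     # fraction of unsupported sub-claims, in [0, 1]
        sub_claims: list[dict]           # [{"claim": ..., "supported": ..., "top_hits": [...]}]
        flagged_spans: list[dict]        # spans of the source text with no retrieval support
        prompt_version: str = "v1"       # assumed default; the column exists per the migration
        detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))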
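
And a minimal sketch of the score > 0.5 flag-and-dock rule together with the
per-agent rollup. The real rollup is a database join against
agent_skill_invocations / debate_messages.author; this in-memory version, using
the mean score, is only one plausible reading of the criterion:

    HALLUCINATION_THRESHOLD = 0.5
    DOCK_FACTOR = 0.85                   # composite score docked 15 % while flagged

    def is_flagged(report: HallucinationReport) -> bool:
        """True when the report should trigger the Senate alert and the hypothesis flag."""
        return report.score > HALLUCINATION_THRESHOLD

    def docked_composite(composite: float, flagged: bool) -> float:
        """Apply the 15 % dock until a human-reviewed proposal vote clears the flag."""
        return composite * DOCK_FACTOR if flagged else composite

    def agent_hallucination_rate(reports: list[HallucinationReport],
                                 authored_ids: set[str]) -> float:
        """Mean hallucination score over artifacts authored by one agent (0.0 if none)."""
        own = [r.score for r in reports if r.source_artifact_id in authored_ids]
        return sum(own) / len(own) if own else 0.0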

Approach

  • Implement sub-claim extraction module (≈ 80 LoC) reusing the existing prompt template; this step and the two that follow are sketched after this list.
  • Build retrieval_baseline(claim) -> list[{title, abstract, score}] via concurrent calls to PubMed + S2 with a 10 s budget.
  • Build the entailment judge prompt + cache (per (claim_hash, version) for 30 d).
  • Wire the driver + Senate proposal hook (scidex/senate/governance.py::create_proposal() is the existing entry point).
  • Build the leaderboard page as a Jinja template under templates/senate/hallucination_leaderboard.html.
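
A sketch of the extraction step with the sentence-level fallback from the
acceptance criteria; the LLM call is passed in as a callable rather than wired to
wiki_claim_extractor, and the 5-claim cap is applied after extraction:

    import re
    from typing import Callable

    MAX_CLAIMS_PER_PARAGRAPH = 5

    def extract_sub_claims(paragraph: str,
                           llm_extract: Callable[[str], list[str]]) -> list[str]:
        """Atomic factual statements from one paragraph, with a sentence-level fallback."""
        try:
            claims = llm_extract(paragraph)              # reuses the wiki_claim_extractor prompt
        except Exception:
            claims = []
        if not claims:
            # Fallback: naive sentence chunking when LLM extraction fails or returns nothing
            claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        return claims[:MAX_CLAIMS_PER_PARAGRAPH]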
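
The retrieval_baseline step can fan out to both sources concurrently under the
10 s budget. The search functions are parameters here because this sketch does not
reproduce the real tools.pubmed_search / tools.semantic_scholar_search signatures:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    from concurrent.futures import TimeoutError as FuturesTimeout
    from typing import Callable

    RETRIEVAL_BUDGET_S = 10.0

    def retrieval_baseline(claim: str,
                           pubmed_search: Callable[[str], list[dict]],
                           s2_search: Callable[[str], list[dict]]) -> list[dict]:
        """Best-effort {title, abstract, score} hits from both backends within the budget."""
        hits: list[dict] = []
        pool = ThreadPoolExecutor(max_workers=2)
        futures = [pool.submit(pubmed_search, claim), pool.submit(s2_search, claim)]
        try:
            for fut in as_completed(futures, timeout=RETRIEVAL_BUDGET_S):
                try:
                    hits.extend(fut.result())
                except Exception:
                    pass                 # one backend failing (e.g. PubMed down) leaves the other's hits
        except FuturesTimeout:
            pass                         # budget exhausted: return whatever arrived in time
        finally:
            pool.shutdown(wait=False, cancel_futures=True)
        return hits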
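
For the entailment judge, a cache keyed by (claim_hash, prompt version) with a
30-day TTL could look like the following; the judge is a callable and the
in-memory dict stands in for whatever store the real module uses:

    import hashlib
    import time
    from typing import Callable

    PROMPT_VERSION = "v1"                # bumping this invalidates cached verdicts
    CACHE_TTL_S = 30 * 24 * 3600         # 30 days

    _entail_cache: dict[tuple[str, str], tuple[bool, float]] = {}

    def cached_entails(claim: str, abstract: str,
                       judge: Callable[[str, str], bool]) -> bool:
        """Entailment verdict memoised per (claim_hash, prompt version), as in the plan."""
        key = (hashlib.sha256(claim.encode()).hexdigest(), PROMPT_VERSION)
        hit = _entail_cache.get(key)
        if hit is not None and time.time() - hit[1] < CACHE_TTL_S:
            return hit[0]
        verdict = judge(claim, abstract)  # e.g. a lighter entails(claim, abstract) prompt
        _entail_cache[key] = (verdict, time.time())
        return verdict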

Dependencies

  • tools.pubmed_search, tools.semantic_scholar_search.
  • scidex/atlas/wiki_claim_extractor.py (prompt template).
  • scidex/senate/calibration.py (reputation update path).

Dependents

  • q-mem-error-recovery-memory — uses hallucination patterns as recoverable errors.
  • q-qual-claim-consistency-engine — relies on hallucination flags.

Work Log

    2026-04-27 14:45 UTC — Slot minimax:75

    • Implemented scidex/senate/hallucination_detector.py: detect() → HallucinationReport with
    sub-claim extraction (LLM + sentence fallback), retrieval grounding (PubMed + S2),
    entailment judge, flagged_spans, score ∈ [0,1]. Also run_sweep(), agent_hallucination_rate(),
    _fire_hallucination_alert() inserting hallucination_alert senate_proposals directly.
    • Created migration migrations/20260428_hallucination_reports.sql: hallucination_reports table
    (with history mirror + audit trigger) + hypotheses.flagged_hallucination column.
    • Built economics_drivers/ci_hallucination_sweep.py: hourly sweep driver with 55-min
    last-execution guard to prevent duplicate runs (the guard idea is sketched after this list).
    • Added /senate/hallucination-leaderboard page (170-line HTML) to api.py listing top-50
    most-hallucinating recent artifacts with sub-claim drill-down.
    • Wrote tests/test_hallucination_detector.py: 12 tests (all passing) covering empty text,
    all-real → score < 0.2, FAKEGENE5X invented gene → ≥1 flagged, PubMed-down S2-only fallback,
    high-score dock, HallucinationReport field completeness.
    • Committed 5 files, pushed to orchestra/task/58055617-hallucination-detector-compare-llm-claim.
    • Note: api.py has a pre-existing Python 3.13 f-string parsing bug at line 35798
    (unrelated multi-line generator-expression f-string in exchange sampler page). Core modules
    compile and all 12 tests pass.
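
    The 55-minute last-execution guard mentioned above, illustrated with a throwaway
    file-based check; the real driver presumably persists its last-run timestamp
    somewhere more durable, so the path and state format here are hypothetical:

        import json
        import time
        from pathlib import Path

        STATE_FILE = Path("/tmp/ci_hallucination_sweep.last_run")   # hypothetical location
        MIN_INTERVAL_S = 55 * 60                                     # the 55-minute guard

        def should_run() -> bool:
            """Skip the sweep when the previous run was less than 55 minutes ago."""
            if STATE_FILE.exists():
                last = json.loads(STATE_FILE.read_text()).get("ts", 0.0)
                if time.time() - last < MIN_INTERVAL_S:
                    return False
            return True

        def mark_run() -> None:
            STATE_FILE.write_text(json.dumps({"ts": time.time()}))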

    Verification — 2026-04-27 15:10 UTC — Slot minimax:76

    Implementation verified on main at commit 7eab2d32d:

    • scidex/senate/hallucination_detector.py::detect() at line 275 ✓
    • tests/test_hallucination_detector.py exists (269 lines) ✓
    • economics_drivers/ci_hallucination_sweep.py exists (107 lines) ✓
    • migrations/20260428_hallucination_reports.sql exists (91 lines) ✓
    • All 6 files from diff stat present on disk ✓
    • Previous syntax error (unclosed paren in html.escape) fixed in commit 82ba17d62

    Corruption Fix — 2026-04-27 16:30 UTC — Slot minimax:75

    Issue: Commit 0fee7e12b (GitHub bidirectional sync, #747) corrupted api.py by
    replacing ~88K lines of Python code with a git merge message (~8 lines). This made api.py unimportable and blocked the hallucination detector from running.

    Fix: Restored api.py from the good commit 7eab2d32d (which contains the
    hallucination leaderboard page and all other code). Also added the missing
    import (from contextvars import ContextVar) needed around line 1475, where
    ContextVar was used at module level without being imported.

    Verification:

    • api.py imports successfully ✓
    • /senate/hallucination-leaderboard route registered ✓
    • All 12 hallucination detector tests pass ✓
    • Commit 3531b35e3 pushed to orchestra/task/58055617-hallucination-detector-compare-llm-claim

    Already Resolved — 2026-04-27 16:45 UTC

    Implementation fully verified on main at commit ee8de5729 (PR #751). Worktree at main HEAD 8b4e2d3fb with zero diff.

    Verification evidence:

    • detect() signature matches spec: (text, source_artifact_id, source_artifact_type) -> HallucinationReport
    • HallucinationReport fields: source_artifact_id, source_artifact_type, score, sub_claims, flagged_spans, detected_at, prompt_version
    • All 12 tests in tests/test_hallucination_detector.py pass ✓
    • migrations/20260428_hallucination_reports.sql exists (91 lines) ✓
    • economics_drivers/ci_hallucination_sweep.py exists (107 lines) ✓
    • /senate/hallucination-leaderboard route registered in api.py ✓
    • No diff between worktree and origin/main ✓

    Payload JSON
    {
      "completion_shas": [
        "ee8de5729"
      ],
      "completion_shas_checked_at": ""
    }
