[Senate] Rebuild theme S4: orphan / duplicate / broken-link sweeper as an LLM-judged continuous process

Rebuild spec — follow docs/planning/specs/rebuild_theme_template_spec.md first.

Theme anchor

Theme: S4 - Orphan / duplicate / broken-link detection + auto-repair

Layer: Senate
Full description: docs/design/retired_scripts_patterns.md → S4

Why this matters now

Three recurring-task implementations have been deleted (orphan_coverage_check.py, convergence_monitor.py, cleanup_stale_figure_artifacts.py), while two recurring tasks remain open: [Senate] Orphan coverage check (every-12h, e1cf8f9a-6a64-4c25-8264-f103e5eb62db) and [Senate] Convergence monitor (1c1ea5d4-d2c9-4def-82bb-87174192bc31). Both tasks currently
have no implementation to call.

Template fills

{{THEME_ID}} = S4
{{THEME_NAME}} = orphan / duplicate / broken-link sweeper
{{LAYER}} = Senate
{{LAYER_SLUG}} = senate
{{THEME_SLUG}} = integrity_sweeper
{{CADENCE}} = every 6h
{{CORE_JUDGMENT}} = "is this row broken (orphan / duplicate / dangling reference), and if yes, what is the correct repair?"

{{GAP_PREDICATE}} = output of a registry of integrity checks, each returning (entity_id, violation_type) rows.

Where LLMs are load-bearing

Most integrity work is rule-based (FK check, file-exists check), but duplicate-merging is fundamentally semantic. "Are these two
wiki pages for the same gene?" can't be answered by string equality —
"TREM2", "trem2", "Trem2 receptor", "TREM-2" are all the same
canonical entity.
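
To make the point concrete, the sketch below compares the four spellings above under raw string equality and under a simple normalization. The `normalize` helper is a hypothetical illustration, not part of any existing sweeper code; it only shows that cheap rules collapse case and punctuation variants while still missing "Trem2 receptor", which is the residue that needs a semantic judgment.

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


names = ["TREM2", "trem2", "Trem2 receptor", "TREM-2"]

# Raw string equality sees four different entities.
raw_distinct = set(names)

# Normalization merges the case/punctuation variants but still leaves
# "trem2receptor" as a separate key, so a rule alone cannot finish the job.
normalized_distinct = {normalize(n) for n in names}
```

A rule like this may still be useful as a cheap pre-filter that picks candidate pairs for the LLM rubric, but that is a design assumption rather than something this spec mandates.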

Rule for this theme: FK / file-exists / reference-resolution checks
are rules. Entity equivalence / merge proposals / repair-path
disambiguation are LLM rubrics.

High-confidence LLM verdicts (similarity ≥ 0.95) execute an
auto-merge. Lower-confidence verdicts create a review ticket (a new orchestra
task). The operator-gate threshold is set in theme_config.
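
A minimal sketch of this routing, assuming a verdict carries a boolean equivalence judgment plus a confidence score in [0, 1]. The `MergeVerdict` type, the `route_verdict` function, and the returned action names are all hypothetical; only the 0.95 default comes from the text above, and in practice the value would be read from theme_config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeVerdict:
    entity_a: str
    entity_b: str
    same_entity: bool   # the LLM's equivalence judgment
    confidence: float   # assumed to lie in [0.0, 1.0]


def route_verdict(verdict: MergeVerdict, auto_merge_threshold: float = 0.95) -> str:
    """Decide what the sweeper does with one LLM merge verdict."""
    if not verdict.same_entity:
        return "no_action"
    if verdict.confidence >= auto_merge_threshold:
        return "auto_merge"
    # Below the operator gate: hand off to a human via a review ticket.
    return "review_ticket"
```

Keeping the threshold as a parameter rather than a constant is what lets the outcome-feedback loop adjust it without a code change.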

Discover integrity checks at runtime

The registry of integrity checks is self-describing:

Each FK in pg_constraint where the referencing row can be missing → a rule.

Each path-typed column (name matches *_path, *_url, *_file) → a file-exists rule.

Additional registered checks in integrity_check_registry table (one row per check: query returning bad rows, repair function name).

No code change needed to add new FKs or path columns; the checker
finds them.
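
The three discovery sources could be combined roughly as follows. This is a sketch under stated assumptions: the `Check` type, `build_checks`, and the row shapes passed to it are hypothetical, and the database access itself is left out so that only the classification logic is shown. The SQL string uses the standard PostgreSQL catalog columns (`contype = 'f'` selects foreign-key constraints).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Column-name patterns treated as path-typed, taken from the list above.
PATH_PATTERNS = ("*_path", "*_url", "*_file")

# Catalog query listing every foreign key; running it is out of scope here.
FK_DISCOVERY_SQL = """
SELECT conname,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM pg_constraint
WHERE contype = 'f'
"""


@dataclass(frozen=True)
class Check:
    name: str
    kind: str  # "fk", "file_exists", or "registered"


def is_path_column(column_name: str) -> bool:
    """True when the column name matches one of the path-typed patterns."""
    lowered = column_name.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in PATH_PATTERNS)


def build_checks(fk_rows, columns, registry_rows):
    """Merge the three sources into one list of checks.

    fk_rows:       (conname, referencing_table, referenced_table) tuples
    columns:       (table, column) tuples for every column in scope
    registry_rows: (check_name, query, repair_function) tuples
    """
    checks = [Check(f"fk:{conname}", "fk") for conname, _src, _dst in fk_rows]
    checks += [
        Check(f"file:{table}.{column}", "file_exists")
        for table, column in columns
        if is_path_column(column)
    ]
    checks += [
        Check(f"registered:{name}", "registered")
        for name, _query, _repair in registry_rows
    ]
    return checks
```

Because the list is rebuilt from the catalog on every run, a newly added FK or `*_path` column shows up as a check on the next sweep.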

Outcome feedback

Per-check: how many violations found this run vs last N runs. If steadily increasing, the check's upstream is regressing → alert.

Auto-merge outcomes: did operators revert any auto-merges within 7d? If yes, the confidence threshold is too aggressive → auto-tune upward.

Repair failure rate: if an LLM-suggested repair fails, downgrade the rubric's confidence for that violation type.
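
The first two feedback signals can be sketched as small pure functions. The function names, the step size of 0.01, and the 0.99 ceiling are illustrative assumptions; the spec only says that any revert within the window should push the threshold upward, and it does not define "steadily increasing" more precisely than the strict-increase reading used here.

```python
def steadily_increasing(counts):
    """True when every run found strictly more violations than the one before.

    counts: violation counts for one check, oldest run first.
    """
    return len(counts) >= 2 and all(a < b for a, b in zip(counts, counts[1:]))


def tune_threshold(threshold, reverted_merges, step=0.01, ceiling=0.99):
    """Raise the auto-merge threshold if operators reverted any auto-merges.

    reverted_merges: number of auto-merges reverted within the review window.
    step, ceiling:   hypothetical tuning parameters, not fixed by the spec.
    """
    if reverted_merges > 0:
        return round(min(ceiling, threshold + step), 4)
    return threshold
```

For example, `tune_threshold(0.95, reverted_merges=2)` moves the gate to 0.96, while a window with no reverts leaves it unchanged.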

Acceptance

All template criteria, plus:

☐ Integrity-check registry is discoverable (pg_constraint + path columns + explicit registry), not a hardcoded list.

☐ Entity-merge confidence threshold in theme_config, adjustable.

☐ Two failing-verdict LLM calls max per entity; after that, skip and log (graceful degradation).

☐ Recurring tasks e1cf8f9a- and 1c1ea5d4- reassigned to this process.

☐ Auto-tune threshold from operator-revert rate demonstrated in testing.
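
The two-call cap in the criteria above could look like the sketch below. It assumes a "failing-verdict call" means a call that raises instead of returning a usable verdict; `judge_with_cap`, the `call_llm` callable, and the logger name are hypothetical.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("integrity_sweeper")

MAX_FAILED_CALLS = 2  # cap from the acceptance criteria


def judge_with_cap(entity_id: str, call_llm: Callable[[str], dict]) -> Optional[dict]:
    """Ask for a verdict, tolerating at most MAX_FAILED_CALLS failed calls.

    Returns the verdict dict on success, or None when the entity is skipped.
    """
    for attempt in range(1, MAX_FAILED_CALLS + 1):
        try:
            return call_llm(entity_id)
        except Exception as exc:  # any failure counts against the cap
            logger.warning("verdict call %d failed for %s: %s", attempt, entity_id, exc)
    # Graceful degradation: log the skip and let the sweep continue.
    logger.error("skipping %s after %d failed verdict calls", entity_id, MAX_FAILED_CALLS)
    return None
```

Returning `None` rather than raising keeps one bad entity from aborting the whole sweep; the caller just moves on to the next row.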

File: rebuild_theme_S4_orphan_duplicate_sweeper_spec.md

Modified: 2026-04-24 07:15

Size: 3.3 KB