[Senate] Orphan coverage check

Task

  • ID: e1cf8f9a-6a64-4c25-8264-f103e5eb62db
  • Type: recurring
  • Frequency: every-12h
  • Layer: Senate

Goal

Stop research output from getting marooned. Analyses with no HTML report,
hypotheses with no linked HTML page, and rows with missing report_url are
invisible to users and to downstream pipelines. This driver scans for those
orphans and either fixes them automatically (when the artifact clearly
exists on disk) or surfaces them as follow-up tasks.

What it does

  • Scans for three classes of orphan:
      1. analyses rows with empty report_url / html_path but a notebook or
         JSON result on disk.
      2. hypotheses rows whose HTML detail page fails to render (no cached
         content, no wiki mapping, no analysis linkage).
      3. Any row in core content tables where an expected *_url column is
         null but the file exists in the conventional location.
  • For each detected orphan:
      - Auto-fix when the mapping is unambiguous: compute the expected URL,
        verify the file is readable (HEAD 200), UPDATE the row, log the
        before/after.
      - Surface as a one-shot Senate task tagged orphan-coverage when the
        mapping is ambiguous (multiple candidate artifacts, missing source).
  • Emits agent_contributions (type=orphan_repair) per auto-fix.
  • Writes a summary JSON to logs/orphan-coverage-latest.json.
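
The auto-fix versus surface decision above can be sketched as follows. This is
a minimal illustration, not the driver's actual code: the RepairPlan type, the
plan_repair function, and the /reports/ URL prefix are hypothetical names, and
the check uses a filesystem stat rather than a HEAD request.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class RepairPlan:
    row_id: str
    action: str                 # "auto_fix" or "surface_task"
    report_url: Optional[str]   # set only for auto-fixes
    reason: str


def plan_repair(row_id: str, candidates: List[Path],
                url_prefix: str = "/reports/") -> RepairPlan:
    """Decide what to do with one orphan row (hypothetical helper)."""
    # Keep only candidates that exist as readable regular files.
    readable = [p for p in candidates if p.is_file() and os.access(p, os.R_OK)]
    if len(readable) == 1:
        # Unambiguous mapping: exactly one artifact backs this row.
        url = url_prefix + readable[0].name
        return RepairPlan(row_id, "auto_fix", url, "single readable artifact")
    # Ambiguous mapping: nothing on disk, or more than one candidate.
    reason = "missing source" if not readable else "multiple candidate artifacts"
    return RepairPlan(row_id, "surface_task", None, reason)
```

Only plans with action "auto_fix" would proceed to the UPDATE; everything else
becomes an orphan-coverage task carrying the reason string.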

Success criteria

  • Every 12h cycle: full scan completes and produces a summary artifact.
  • Orphan counts per class trend downward week-over-week.
  • Auto-fixes never point at missing files (verified via HEAD before UPDATE).
  • Run log: rows scanned per class, auto-fixes applied, tasks surfaced,
retries.
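
One way the per-cycle summary artifact could be produced is sketched below. The
field names and the write_summary function are assumptions for illustration;
the spec only fixes the output path and the quantities to record. The file is
written to a temporary name and then renamed, so a reader never sees a
half-written summary.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def write_summary(path: str, rows_scanned: Dict[str, int], auto_fixes: int,
                  tasks_surfaced: int, retries: int) -> dict:
    """Write the run summary as JSON (hypothetical field names)."""
    summary = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "rows_scanned": rows_scanned,        # per orphan class
        "auto_fixes_applied": auto_fixes,
        "tasks_surfaced": tasks_surfaced,
        "retries": retries,
    }
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write next to the target, then rename over it in one step.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2, sort_keys=True)
    os.replace(tmp_name, target)
    return summary
```

A call such as write_summary("logs/orphan-coverage-latest.json", {"analyses": 0,
"hypotheses": 0, "url_columns": 0}, 0, 0, 0) would overwrite the previous
cycle's file.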

Quality requirements

  • No stubs: each auto-fix must be verified by a HEAD request or filesystem
stat; never blindly UPDATE a URL column. See the meta-quest
quest_quality_standards_spec.md.
  • When operating on ≥10 orphans across the three classes, use 3–5 parallel
agents to distribute verification.
  • Log total items processed + retries so we can detect busywork: if the same
row is re-repaired cycle after cycle, the underlying template or upload
pipeline is broken, so surface a Senate task instead of repairing it again.
  • INFERRED: exact column names vary by table; driver reads the current
schema at runtime rather than hard-coding column lists.
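
Reading the schema at runtime, as the last requirement describes, might look
like the sketch below. It assumes a SQLite database, which this spec does not
state; url_columns and count_null_urls are hypothetical helper names. Column
names come from PRAGMA table_info instead of a hard-coded list.

```python
import sqlite3
from typing import Dict, List


def _quote(identifier: str) -> str:
    # Quote an SQL identifier, doubling any embedded double quotes.
    return '"' + identifier.replace('"', '""') + '"'


def url_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the table's columns whose names end in _url."""
    rows = conn.execute(f"PRAGMA table_info({_quote(table)})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows if row[1].endswith("_url")]


def count_null_urls(conn: sqlite3.Connection, table: str) -> Dict[str, int]:
    """Count rows with a NULL or empty value in each *_url column."""
    counts = {}
    for col in url_columns(conn, table):
        q = _quote(col)
        sql = (f"SELECT COUNT(*) FROM {_quote(table)} "
               f"WHERE {q} IS NULL OR {q} = ''")
        (counts[col],) = conn.execute(sql).fetchone()
    return counts
```

Columns such as html_path do not match the *_url pattern, so a real driver
would need an additional rule for them.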

Work Log

2026-04-21 23:55 PT — Slot 0

  • Found prior implementation attempt: commit 44fc65040 with S4 integrity_sweeper.py, senate.py route, and migration 108
  • Reviewer blocked push: commit touches api.py but message didn't mention it
  • Fixed: amended commit message to explicitly name api.py change
  • Ran dry-run test: sweeper finds dangling paths (alignment_reports.report_path, artifact_nominations.artifact_path) correctly
  • API status confirmed healthy (396 analyses, 984 hypotheses, 711K edges)
  • Force-pushed amended commit to origin
  • Result: committed and pushed — supervisor will verify merge

File: e1cf8f9a-6a64-4c25-8264-f103e5eb62db_orphan_coverage_check_spec.md
Modified: 2026-04-25 23:40
Size: 3.1 KB