[Senate] Rebuild theme S3: search-index coverage verification as a continuous process

Rebuild spec — follow docs/planning/specs/rebuild_theme_template_spec.md first.

Theme anchor

  • Theme: S3 — FTS / vector index coverage verification
  • Layer: Senate
  • Full description: docs/design/retired_scripts_patterns.md → S3

Why this matters now

The three scripts deleted under this theme (ci_verify_search_indexes.py,
migrate_fts5.py, rebuild_fts.py) were SQLite-FTS-specific, and SciDEX now
runs on PostgreSQL (PG). As a result, the recurring task
"[Search] CI: Verify search indexes are current"
(daily, 3a897f8a-0712-4701-a675-07f0670d8f87) currently has
no implementation to call.

This is a narrow, well-scoped theme — a good second-rebuild after AG1
to exercise the pattern on a non-LLM-heavy process. Most of the work
here is observability + self-healing rebuild, not semantic judgment.

Template fills

  • {{THEME_ID}} = S3
  • {{THEME_NAME}} = search-index coverage verification
  • {{LAYER}} = Senate
  • {{LAYER_SLUG}} = senate
  • {{THEME_SLUG}} = search_index_coverage
  • {{CADENCE}} = hourly
  • {{CORE_JUDGMENT}} = "is any search index materially stale or
    out-of-sync with its source table?"
  • {{GAP_PREDICATE}} =
    (source_row_count - indexed_row_count) / source_row_count > drift_threshold
    OR last_rebuilt_at < now() - stale_threshold
    (both thresholds come from theme_config).
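
The gap predicate above can be read as a small pure function. The following is a minimal sketch, assuming the row counts and the last rebuild timestamp have already been collected. `IndexStatus` and `has_gap` are hypothetical names, and treating an empty source table as zero drift is an assumption the spec does not address.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class IndexStatus:
    """Snapshot of one index (hypothetical shape, not from the spec)."""
    source_row_count: int
    indexed_row_count: int
    last_rebuilt_at: datetime


def has_gap(
    status: IndexStatus,
    drift_threshold: float,
    stale_threshold: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the index is drifted or stale per GAP_PREDICATE."""
    now = now or datetime.now(timezone.utc)
    if status.source_row_count == 0:
        # Assumption: an empty source table has nothing to index, so drift is 0.
        drift = 0.0
    else:
        missing = status.source_row_count - status.indexed_row_count
        drift = missing / status.source_row_count
    is_stale = status.last_rebuilt_at < now - stale_threshold
    return drift > drift_threshold or is_stale
```

Passing `now` explicitly keeps the check deterministic in tests; in the real process both thresholds would be read from theme_config rather than passed as literals.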

Where LLMs fit here (narrow)

Most of this theme is deterministic (count rows, compare, rebuild).
The LLM touch point: when a rebuild fails unexpectedly, an LLM
judges the failure message and proposes a remediation (missing
extension, permission error, tsvector config mismatch, etc.) before
escalating to a human. This keeps the process self-healing for the
common failure modes.
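
One way to keep this path bounded is to inject the LLM call as a plain callable and cap the retries. This is a sketch under stated assumptions: `rebuild`, `suggest_fix`, and `apply_fix` are hypothetical callables supplied by the caller (the spec names no API for them), and `max_retries=1` mirrors the single bounded retry in the acceptance criteria.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RebuildOutcome:
    ok: bool = False
    escalated: bool = False
    log: List[str] = field(default_factory=list)


def rebuild_with_remediation(
    rebuild: Callable[[], None],
    suggest_fix: Callable[[str], str],   # hypothetical LLM wrapper
    apply_fix: Callable[[str], None],    # hypothetical remediation step
    max_retries: int = 1,
) -> RebuildOutcome:
    """Try a rebuild; on failure, apply an LLM-suggested fix and retry."""
    outcome = RebuildOutcome()
    for attempt in range(max_retries + 1):
        try:
            rebuild()
            outcome.ok = True
            return outcome
        except Exception as exc:  # broad on purpose: any failure is judged
            outcome.log.append(f"attempt {attempt}: {exc}")
            if attempt == max_retries:
                break  # retry budget spent, fall through to escalation
            fix = suggest_fix(str(exc))
            outcome.log.append(f"suggested fix: {fix}")
            apply_fix(fix)
    outcome.escalated = True  # hand off to a human
    return outcome
```

The returned log carries both the failure messages and the suggested fix, so a human picking up an escalation sees what was already tried.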

Self-describing registry

No hardcoded list of (source_table, index_name) pairs in code.
Instead, introspect PG:

  • pg_indexes filtered to indexname LIKE '%_fts' OR indexname LIKE
'%_embed' OR indexname LIKE '%_tsvector'.
  • Pair each with its backing source table by naming convention
(discovered, not hardcoded).
  • Operators add new indexes by creating them in PG with the naming
convention; the coverage checker picks them up automatically.

This is principle #2 (discover schema) in pure form.
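
The discovery step could look roughly like the sketch below. The spec does not spell out the naming convention, so this assumes an index is named `<source_table><suffix>`; `DISCOVERY_SQL`, `source_table_for`, and `pair_indexes` are hypothetical names. Because `_` is a single-character wildcard in LIKE, the query escapes it so that only a literal underscore matches.

```python
from typing import Iterable, List, Optional, Tuple

# Suffixes taken from the spec's pg_indexes filter.
INDEX_SUFFIXES = ("_fts", "_embed", "_tsvector")

# pg_indexes exposes schemaname, tablename, indexname, tablespace, indexdef.
DISCOVERY_SQL = """
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE indexname LIKE '%!_fts' ESCAPE '!'
   OR indexname LIKE '%!_embed' ESCAPE '!'
   OR indexname LIKE '%!_tsvector' ESCAPE '!'
"""


def source_table_for(index_name: str) -> Optional[str]:
    """Derive the source table, assuming '<source_table><suffix>' naming."""
    for suffix in INDEX_SUFFIXES:
        if index_name.endswith(suffix) and len(index_name) > len(suffix):
            return index_name[: -len(suffix)]
    return None


def pair_indexes(index_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (source_table, index_name) pairs for names that match."""
    pairs = []
    for name in index_names:
        table = source_table_for(name)
        if table is not None:
            pairs.append((table, name))
    return pairs
```

The `tablename` column that pg_indexes already reports could serve as a cross-check on the name-derived pairing, if the index lives on the source table itself.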

Outcome feedback

  • Search-usage metrics: which queries return results after rebuild,
which still return empty. If a query-category consistently returns
empty even after rebuild, it's a ranking/embedding-model issue, not
a coverage issue — flag for a different theme.
  • Drift rate over time: if drift accumulates faster than hourly cycles
can absorb, the process should run more frequently (self-calibration).
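
The self-calibration idea could be sketched as a rule that shortens the interval when most recent cycles found drift above the threshold. The majority rule, the halving step, and the 5-minute floor are all assumptions chosen for illustration; `next_interval_minutes` is a hypothetical name.

```python
from typing import Sequence


def next_interval_minutes(
    current: int,
    recent_drifts: Sequence[float],
    drift_threshold: float,
    floor: int = 5,  # assumed lower bound, not from the spec
) -> int:
    """Halve the run interval when most recent cycles breached the threshold."""
    if not recent_drifts:
        return current  # no history yet, keep the configured cadence
    breaches = sum(1 for drift in recent_drifts if drift > drift_threshold)
    if breaches > len(recent_drifts) / 2:
        return max(floor, current // 2)
    return current
```

This sketch only tightens the cadence; whether and how the interval relaxes again once drift subsides is left open here, as the spec does not say.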

Acceptance

All template criteria, plus:

☐ No hardcoded list of indexes. Uses pg_indexes introspection.
☐ Drift threshold + stale threshold in theme_config.
☐ Recurring task 3a897f8a-0712-4701-a675-07f0670d8f87 reassigned
to this process.
☐ Self-healing: a rebuild that fails triggers the LLM-remediation
path (one bounded retry with LLM-suggested fix) before
escalating.

File: rebuild_theme_S3_search_index_coverage_spec.md
Modified: 2026-04-25 22:00
Size: 3.0 KB