[Senate] Rebuild theme S8: belief evolution & convergence metrics as a continuous process
Rebuild spec — follow docs/planning/specs/rebuild_theme_template_spec.md first.
Theme anchor
- Theme: S8 — Belief evolution, convergence metrics, replication tracking
- Full description: docs/design/retired_scripts_patterns.md → S8
Why this matters now
Deleted scripts under this theme include convergence_monitor.py,
convergence_metrics.py, belief_tracker.py, and
backfill_convergence_50.py. Open recurring tasks: [Senate] Knowledge
growth metrics snapshot (every-12h,
05b6876b-61a9-4a49-8881-17e8db81746c) and the hourly task
98c6c423-6d32-4090-ae73-6d6ddae3d155. Without this theme, the
site's longitudinal "are we getting closer to scientific truth?"
dashboard breaks.
Template fills
{{THEME_ID}} = S8
{{THEME_NAME}} = belief evolution + convergence metrics
{{LAYER}} = Senate
{{LAYER_SLUG}} = senate
{{THEME_SLUG}} = convergence
{{CADENCE}} = hourly for snapshots, daily for aggregates, weekly
for trend reports.
{{CORE_JUDGMENT}} = "given the time-series of hypothesis states
and evidence accumulations, is the system converging on specific
mechanisms, diverging into contradictions, or stalling? Which
disease-areas show the strongest convergence signal?"
{{GAP_PREDICATE}} = "has a snapshot been taken for time-bucket T
yet?" → simple if-not-exists insert.
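The gap predicate can be sketched as an idempotent insert. This is a minimal sketch, not the rebuild's actual schema: the table and column names (convergence_snapshot, kpi_id, time_bucket) are illustrative stand-ins, and SQLite substitutes for Postgres so the example is self-contained.

```python
import sqlite3

# Stand-in snapshot table; names are illustrative, not from the spec.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE convergence_snapshot (
        kpi_id      TEXT NOT NULL,
        time_bucket TEXT NOT NULL,   -- e.g. '2025-01-01T00' for an hourly bucket
        value       REAL,
        PRIMARY KEY (kpi_id, time_bucket)
    )
""")

def take_snapshot(kpi_id: str, time_bucket: str, value: float) -> bool:
    """Insert a snapshot iff none exists for (kpi_id, time_bucket).

    Returns True if a row was written, False if the bucket was already filled.
    """
    cur = conn.execute(
        "INSERT INTO convergence_snapshot (kpi_id, time_bucket, value) "
        "VALUES (?, ?, ?) ON CONFLICT (kpi_id, time_bucket) DO NOTHING",
        (kpi_id, time_bucket, value),
    )
    conn.commit()
    return cur.rowcount == 1

first = take_snapshot("hypotheses_active", "2025-01-01T00", 412.0)
second = take_snapshot("hypotheses_active", "2025-01-01T00", 999.0)  # same bucket: skipped
```

The composite primary key is what makes the "same time-bucket cannot produce two rows" acceptance criterion hold even under concurrent workers; the `ON CONFLICT DO NOTHING` form carries over to Postgres unchanged.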
Where LLMs are load-bearing
The snapshot-taking is deterministic SQL aggregation, NOT an LLM job.
LLMs come in for the analysis layer:
- Narrative generation: given snapshot deltas over the last week,
  an LLM writes "what changed this week and why?" for the dashboard.
- Convergence-signal detection: given a cluster of hypotheses about
  the same target, an LLM judges whether recent evidence has
  converged them, diverged them, or left them stalled.
- Replication cluster keys: rather than hash(target+relation+
  direction), an LLM rubric decides "are these two hypotheses
  testing the same underlying claim?", because surface forms
  differ ("TREM2 activates microglia" vs "microglial activation is
  driven by TREM2").
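The replication-clustering item can be sketched as a pairwise judge feeding a union-find. In this sketch the `same_claim` stub is a crude token-prefix overlap heuristic purely so the example runs; in the rebuild it would be the LLM rubric call, and the function/threshold names are assumptions.

```python
from itertools import combinations

def same_claim(h1: str, h2: str) -> bool:
    """Stand-in for the LLM rubric ("are these two hypotheses testing
    the same underlying claim?"). Stubbed with a token-prefix overlap
    so the sketch is runnable; the real judge is an LLM call."""
    key = lambda s: {w.lower()[:6] for w in s.split()}  # "microglia"/"microglial" collide
    return len(key(h1) & key(h2)) >= 2

def cluster(hypotheses: list[str]) -> list[set[str]]:
    """Union-find over pairwise judge verdicts → replication clusters."""
    parent = {h: h for h in hypotheses}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in combinations(hypotheses, 2):
        if same_claim(a, b):
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for h in hypotheses:
        groups.setdefault(find(h), set()).add(h)
    return list(groups.values())

h1 = "TREM2 activates microglia"
h2 = "microglial activation is driven by TREM2"
h3 = "APOE4 impairs lipid transport"
clusters = cluster([h1, h2, h3])
```

Pairwise judging is O(n²) LLM calls; a real worker would likely pre-block candidates (e.g. by shared target) before invoking the judge.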
No hardcoded KPI lists
Scripts like backfill_convergence_50.py hardcoded "here are 50
metrics to compute". The rebuild defines KPIs in a convergence_kpi
table: (kpi_id, description, sql_expression, llm_rubric_id,
enabled). Operators add new KPIs via SQL; the snapshot worker picks
them up next cycle.
Each KPI has either a sql_expression (for deterministic metrics) or
an llm_rubric_id (for semantic metrics like "disease coverage"). No
code change is needed to add a KPI.
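One way the snapshot worker could dispatch on the registry is sketched below. The convergence_kpi columns come from the spec; everything else (the stand-in hypothesis table, sample rows, and the `score_rubric` placeholder) is assumed for illustration, with SQLite again standing in for Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE convergence_kpi (
        kpi_id         TEXT PRIMARY KEY,
        description    TEXT,
        sql_expression TEXT,   -- set for deterministic metrics
        llm_rubric_id  TEXT,   -- set for semantic metrics
        enabled        INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE hypothesis (status TEXT);  -- illustrative source table
    INSERT INTO hypothesis VALUES ('active'), ('active'), ('retired');
    INSERT INTO convergence_kpi VALUES
        ('active_hypotheses', 'count of live hypotheses',
         'SELECT COUNT(*) FROM hypothesis WHERE status = ''active''', NULL, 1),
        ('disease_coverage', 'semantic coverage score', NULL, 'rubric-cov-1', 1),
        ('old_metric', 'disabled example', 'SELECT 0', NULL, 0);
""")

def score_rubric(rubric_id: str) -> float:
    # Placeholder for the LLM-rubric scorer; a constant so the sketch runs.
    return 0.7

def compute_enabled_kpis(conn) -> dict[str, float]:
    """One snapshot cycle: run each enabled KPI via SQL or LLM rubric."""
    values = {}
    rows = conn.execute(
        "SELECT kpi_id, sql_expression, llm_rubric_id "
        "FROM convergence_kpi WHERE enabled = 1"
    )
    for kpi_id, sql_expr, rubric_id in rows:
        if sql_expr is not None:
            values[kpi_id] = float(conn.execute(sql_expr).fetchone()[0])
        else:
            values[kpi_id] = score_rubric(rubric_id)
    return values

kpi_values = compute_enabled_kpis(conn)
```

Because the worker reads the registry each cycle, an operator's `INSERT INTO convergence_kpi ...` is picked up automatically, which is exactly the "no code change to add a KPI" property.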
Outcome feedback
- KPI informativeness: if a KPI's value is flat for > 30d, it's not
tracking anything useful — propose disable.
- Narrative quality: LLM narratives are self-scored for grounding
(did each claim cite a snapshot-delta?); low-grounded narratives
downgrade the narrative rubric.
- Replication-cluster confusions: if an LLM-generated cluster is
later split by an operator, the clustering rubric downgrades
confidence for similar shapes.
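The KPI-informativeness check from the first bullet might look like this minimal sketch; the window size and tolerance are assumed knobs, not values from the spec.

```python
def is_flat(daily_values: list[float], min_points: int = 30,
            tol: float = 1e-9) -> bool:
    """Flag a KPI whose daily values haven't moved over the window
    (the "flat for > 30d → propose disable" heuristic). Window size
    and tolerance are illustrative defaults."""
    if len(daily_values) < min_points:
        return False  # not enough history to judge
    window = daily_values[-min_points:]
    return max(window) - min(window) <= tol

flat = is_flat([5.0] * 45)            # unchanged for 45 days
short = is_flat([5.0] * 10)           # too little history
moving = is_flat([float(i) for i in range(45)])
```

The feedback loop would emit a "propose disable" event rather than flipping `enabled` directly, leaving the final call to an operator.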
Acceptance
All template criteria, plus:
☐ KPI registry in PG, not hardcoded list.
☐ Snapshot idempotency: same time-bucket cannot produce two rows.
☐ Narrative generator grounds every claim in a specific
snapshot-delta, auditable.
☐ Replication clustering uses LLM judge, not string hash.
☐ Recurring tasks 05b6876b- and 98c6c423- reassigned.