[Senate] Rebuild theme S8: belief evolution & convergence metrics as a continuous process

Rebuild spec — follow docs/planning/specs/rebuild_theme_template_spec.md first.

Theme anchor

  • Theme: S8 — Belief evolution, convergence metrics, replication tracking
  • Full description: docs/design/retired_scripts_patterns.md → S8

Why this matters now

Deleted scripts under this theme include convergence_monitor.py, convergence_metrics.py, belief_tracker.py, and backfill_convergence_50.py. Open recurring tasks: [Senate] Knowledge growth metrics snapshot (every-12h, 05b6876b-61a9-4a49-8881-17e8db81746c) and the hourly task 98c6c423-6d32-4090-ae73-6d6ddae3d155. Without this theme, the site's longitudinal "are we getting closer to scientific truth?" dashboard breaks.

Template fills

  • {{THEME_ID}} = S8
  • {{THEME_NAME}} = belief evolution + convergence metrics
  • {{LAYER}} = Senate
  • {{LAYER_SLUG}} = senate
  • {{THEME_SLUG}} = convergence
  • {{CADENCE}} = hourly for snapshots, daily for aggregates, weekly
for trend reports.
  • {{CORE_JUDGMENT}} = "given the time-series of hypothesis states
and evidence accumulations, is the system converging on specific
mechanisms, diverging into contradictions, or stalling? Which
disease-areas show the strongest convergence signal?"
  • {{GAP_PREDICATE}} = "has a snapshot been taken for time-bucket T
yet?" → simple if-not-exists insert.

Where LLMs are load-bearing

The snapshot-taking is deterministic SQL aggregation, NOT an LLM job.
LLMs come in for the analysis layer:

  • Narrative generation: given snapshot deltas over the last week, an
    LLM writes "what changed this week and why?" for the dashboard.
  • Convergence-signal detection: given a cluster of hypotheses about
    the same target, an LLM judges whether recent evidence has converged
    them, diverged them, or left them stalled.
  • Replication cluster keys: rather than hash(target+relation+direction),
    an LLM rubric decides "are these two hypotheses testing the same
    underlying claim?" because surface forms differ ("TREM2 activates
    microglia" vs "microglial activation is driven by TREM2"). A
    clustering sketch follows this list.

No hardcoded KPI lists

Scripts like backfill_convergence_50.py hardcoded "here are 50 metrics
to compute". The rebuild defines KPIs in a convergence_kpi table:
(kpi_id, description, sql_expression, llm_rubric_id, enabled).
Operators add new KPIs via SQL; the snapshot worker picks them up next
cycle.

Each KPI has either a sql_expression (for deterministic metrics) or an
llm_rubric_id (for semantic metrics like "disease coverage"). No code
change is needed to add a KPI.
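
A sketch of the registry-driven worker loop; run_llm_rubric is a placeholder for the rebuild's rubric runner, and it assumes each sql_expression returns a single scalar:

```python
# Snapshot worker reading the KPI registry each cycle -- no hardcoded list.
# Adding a row to convergence_kpi is all it takes to track a new metric.
from collections.abc import Callable

def compute_kpis(conn, run_llm_rubric: Callable[[str], float]) -> dict[str, float]:
    """Evaluate every enabled KPI from the convergence_kpi table."""
    values: dict[str, float] = {}
    with conn.cursor() as cur:
        cur.execute(
            "SELECT kpi_id, sql_expression, llm_rubric_id "
            "FROM convergence_kpi WHERE enabled"
        )
        for kpi_id, sql_expression, llm_rubric_id in cur.fetchall():
            if sql_expression is not None:
                # Deterministic metric: the expression must yield one scalar.
                cur.execute(sql_expression)
                values[kpi_id] = cur.fetchone()[0]
            else:
                # Semantic metric: delegate to the named LLM rubric.
                values[kpi_id] = run_llm_rubric(llm_rubric_id)
    return values
```

The output of compute_kpis feeds take_snapshot above, so a KPI added mid-cycle simply appears in the next bucket's rows.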

Outcome feedback

  • KPI informativeness: if a KPI's value is flat for > 30d, it's not
    tracking anything useful; propose disabling it (see the sketch after
    this list).
  • Narrative quality: LLM narratives are self-scored for grounding
    (did each claim cite a snapshot-delta?); low-grounded narratives
    downgrade the narrative rubric.
  • Replication-cluster confusions: if an LLM-generated cluster is
    later split by an operator, the clustering rubric downgrades
    confidence for similar shapes.
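
The informativeness check can be a single query over the snapshot table. A sketch, reading "flat" as zero distinct values across the window, against the same illustrative convergence_snapshot columns used above:

```python
# Flat-KPI detector for the "KPI informativeness" feedback loop.
# Each hit becomes a "propose disable" task rather than an automatic
# disable, keeping an operator in the loop.

def flat_kpis(conn, window_days: int = 30) -> list[str]:
    """Return KPI ids whose value has not changed within the window."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT kpi_id
            FROM convergence_snapshot
            WHERE bucket_start > now() - make_interval(days => %s)
            GROUP BY kpi_id
            HAVING count(DISTINCT value) = 1  -- one distinct value = flat
            """,
            (window_days,),
        )
        return [row[0] for row in cur.fetchall()]
```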

Acceptance

All template criteria, plus:

☐ KPI registry in PG, not hardcoded list.
☐ Snapshot idempotency: same time-bucket cannot produce two rows.
☐ Narrative generator grounds every claim in a specific snapshot-delta,
  auditable.
☐ Replication clustering uses LLM judge, not string hash.
☐ Recurring tasks 05b6876b- and 98c6c423- reassigned.
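
To make the grounding criterion auditable, each narrative claim can carry an explicit delta reference that is checked against the snapshots actually taken. A minimal sketch; the Claim shape and scoring policy are assumptions, not part of the template:

```python
# Grounding audit for generated narratives: every claim must cite a
# snapshot-delta id that actually exists. Narratives below a grounding
# threshold downgrade the narrative rubric, per "Outcome feedback".
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    delta_id: str | None  # id of the snapshot-delta the claim cites

def grounding_score(claims: list[Claim], known_delta_ids: set[str]) -> float:
    """Fraction of claims citing a real snapshot-delta; 1.0 = fully grounded."""
    if not claims:
        return 0.0
    grounded = sum(
        1 for c in claims
        if c.delta_id is not None and c.delta_id in known_delta_ids
    )
    return grounded / len(claims)
```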
