[Senate] World-model improvement detector (driver)

Task

ID: 428c719e-a95a-40ca-8d8c-cba13e2f60cf
Type: recurring
Frequency: every-6h
Layer: Senate
Priority: P96

Goal

Detect concrete improvements to the platform's world model (new hypotheses that raise Elo above a threshold, wiki pages that gain substantive citations, datasets that meaningfully reduce uncertainty, analyses that refute or confirm prior claims) and emit world_model_improvement events. The Economics v2 credit backprop pipeline uses these events as its root reward signal (see project_economics_v2_credit_backprop_2026-04-10).

What it does

Compute deltas since last_wmi_scan_ts across four signal families:

- Hypothesis Elo gains (rank change above noise floor, verified by multiple judges).
- Wiki page quality jumps (citation count delta, reader score delta, structure improvements).
- Dataset uncertainty reduction (variance of downstream predictions before/after merge).
- Analysis confirmations/refutations (debate consensus shifts, benchmark wins/losses).

For each delta that crosses its threshold, write a world_model_improvements row with source_ref, delta_magnitude, signal_family, and contributor_graph (author + reviewers + upstream citations).
Run calibration: compare predicted improvements (from prediction markets) against realized ones and feed the residual into the calibration_slashing driver.
Emit the total daily world-model-improvement score to logs/wmi-latest.json and the /senate/world-model dashboard.
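
The threshold check and row construction described above could be sketched roughly as follows. This is an illustration, not the driver's actual code: the `THRESHOLDS` values, the family keys, the `Delta` class, and the helper names are all hypothetical, and treating "crosses" as `>=` and the residual as realized minus predicted are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-family thresholds; real values live in world_model_thresholds.
THRESHOLDS = {
    "hypothesis_elo": 25.0,
    "wiki_quality": 3.0,
    "dataset_uncertainty": 0.05,
    "analysis_outcome": 0.2,
}


@dataclass
class Delta:
    """One measured change since the last scan (hypothetical shape)."""
    source_ref: str
    signal_family: str
    magnitude: float
    author: str
    reviewers: list = field(default_factory=list)
    upstream_citations: list = field(default_factory=list)


def to_improvement_row(delta: Delta) -> Optional[dict]:
    """Return a world_model_improvements-style row, or None if below threshold."""
    # Assumption: a delta "crosses" its threshold when magnitude >= threshold.
    if delta.magnitude < THRESHOLDS[delta.signal_family]:
        return None
    return {
        "source_ref": delta.source_ref,
        "signal_family": delta.signal_family,
        "delta_magnitude": delta.magnitude,
        "contributor_graph": {
            "author": delta.author,
            "reviewers": list(delta.reviewers),
            "upstream_citations": list(delta.upstream_citations),
        },
    }


def scan(deltas: list) -> dict:
    """Collect rows for qualifying deltas; explain an empty result."""
    rows = [row for d in deltas if (row := to_improvement_row(d)) is not None]
    result = {"events": rows}
    if not rows:
        result["why_quiet"] = "no deltas above threshold in window"
    return result


def calibration_residual(predicted: float, realized: float) -> float:
    """Assumed sign convention: positive means the market under-predicted."""
    return realized - predicted
```

Keeping the row builder separate from the scan loop makes it straightforward to test each signal family's threshold in isolation.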

Success criteria

Every scan produces either >0 improvement events OR a logged why_quiet explanation (e.g. "no new hypotheses above threshold in window").
100% of emitted events carry a non-empty contributor_graph (input to PageRank credit backprop).
No event is emitted twice (UNIQUE(source_ref, signal_family)).
Threshold drift is tracked: if >50% of events cluster at the minimum threshold, Senate is alerted to retune.
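
As a minimal sketch of the last two criteria, the example below enforces the uniqueness rule at the database level and checks for clustering at the minimum threshold. It assumes a SQLite-style store (the actual database is not stated here); the table columns beyond source_ref and signal_family, the `emit` and `clustered_at_floor` helpers, and the tolerance are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE world_model_improvements (
        id INTEGER PRIMARY KEY,
        source_ref TEXT NOT NULL,
        signal_family TEXT NOT NULL,
        delta_magnitude REAL NOT NULL,
        UNIQUE (source_ref, signal_family)
    )
    """
)


def emit(conn, source_ref, signal_family, delta_magnitude):
    """Insert one event; return True only if a new row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO world_model_improvements "
        "(source_ref, signal_family, delta_magnitude) VALUES (?, ?, ?)",
        (source_ref, signal_family, delta_magnitude),
    )
    # A duplicate hits the UNIQUE constraint and is skipped, so rowcount is 0.
    return cur.rowcount == 1


def clustered_at_floor(magnitudes, floor, tolerance=1e-9):
    """True if more than half of the events sit at the minimum threshold."""
    if not magnitudes:
        return False
    at_floor = sum(1 for m in magnitudes if abs(m - floor) <= tolerance)
    return at_floor / len(magnitudes) > 0.5
```

Relying on the constraint rather than an in-memory "seen" set means a repeated scan stays a no-op even after a restart.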

Quality requirements

No stubs: output must be substantive (see the meta-quest quest_quality_standards_spec.md). Events with magnitude = floor and no citation evidence are auto-rejected.
When operating on >=10 items, use 3-5 parallel agents each handling a disjoint slice (shard by signal_family).
Log the total number of items processed and the number that required a retry so we can detect busywork.
Thresholds are stored in world_model_thresholds and versioned; changes require a Senate proposal.
Output feeds discovery-dividend math (44651656, 5531507e), so schema breaks must bump a version number.
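
One way to read the sharding and logging requirements is sketched below. It is illustrative only: `shard_by_family`, `process_shard`, and `run` are hypothetical names, threads stand in for "agents", and the per-shard work is a placeholder. Reaching the lower bound of three agents when fewer than three families have items would need sub-sharding within a family, which is not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5  # upper bound from the "3-5 parallel agents" rule


def shard_by_family(items, max_shards=MAX_WORKERS):
    """Group items by signal_family, then spread families over disjoint shards."""
    by_family = defaultdict(list)
    for item in items:
        by_family[item["signal_family"]].append(item)
    n_shards = min(max_shards, len(by_family))
    shards = [[] for _ in range(n_shards)]
    # Whole families are assigned round-robin, so no family is split.
    for i, family in enumerate(sorted(by_family)):
        shards[i % n_shards].extend(by_family[family])
    return shards


def process_shard(shard):
    """Placeholder for per-agent work; here it only counts items."""
    return {"processed": len(shard), "retried": 0}


def run(items):
    # Below 10 items the rule above does not call for parallel agents.
    shards = shard_by_family(items) if len(items) >= 10 else [list(items)]
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(process_shard, shards))
    return {
        "shards": len(shards),
        "processed": sum(r["processed"] for r in results),
        "retried": sum(r["retried"] for r in results),
    }
```

The returned totals correspond to the processed and retry counts that the logging requirement asks for.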

Work Log

2026-04-23 03:30 PDT — Slot 51

Ran Driver #13 dry-run and then live: emitted 10 pending citation_threshold_medium world-model improvement rows.
Verified idempotency after the live run: follow-up dry-run reported no new citation/gap/promoted events.
Consistency audit found 4 high-confidence hypotheses (confidence_score >= 0.7) missing hypothesis_matured events while the driver reported no-op, caused by the confidence_growth_last_created_at cursor skipping older rows whose confidence crossed the threshold later.
Plan: remove mutable-signal created_at watermarks and rely on world_model_improvements DB-level existence checks for idempotency, then backfill the 4 missed rows.
Implemented the cursor removal for hypothesis_matured and hypothesis_promoted; both now scan for eligible artifacts missing improvement rows.
Ran Driver #13 live after the fix: emitted 4 pending hypothesis_matured rows for the missed high-confidence hypotheses.
Verified final state: follow-up dry-run no-op; 0 high-confidence hypotheses missing hypothesis_matured; 0 promoted hypotheses missing hypothesis_promoted.
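
The cursor-free approach from this entry can be illustrated with a small sketch: instead of filtering on created_at, select every eligible hypothesis that has no matching improvement row yet. The schema below is hypothetical (in particular the `event_type` column and the table layouts), SQLite is assumed only for the example, and `emit_missing_matured` is not the driver's real function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hypotheses (
        id TEXT PRIMARY KEY,
        confidence_score REAL NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE TABLE world_model_improvements (
        source_ref TEXT NOT NULL,
        event_type TEXT NOT NULL,
        UNIQUE (source_ref, event_type)
    );
    """
)

# No created_at filter: eligibility is decided by current state plus the
# absence of an existing improvement row, so late threshold crossings are seen.
MISSING_MATURED_SQL = """
    SELECT h.id
    FROM hypotheses AS h
    WHERE h.confidence_score >= ?
      AND NOT EXISTS (
          SELECT 1 FROM world_model_improvements AS w
          WHERE w.source_ref = h.id
            AND w.event_type = 'hypothesis_matured'
      )
    ORDER BY h.id
"""


def emit_missing_matured(conn, min_confidence=0.7):
    """Write hypothesis_matured rows for eligible hypotheses that lack one."""
    ids = [row[0] for row in conn.execute(MISSING_MATURED_SQL, (min_confidence,))]
    conn.executemany(
        "INSERT OR IGNORE INTO world_model_improvements "
        "(source_ref, event_type) VALUES (?, 'hypothesis_matured')",
        [(i,) for i in ids],
    )
    return ids
```

A second call returns an empty list, which matches the follow-up dry-run no-op recorded above.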

2026-04-23 04:05 PDT — Codex

Reviewed in-progress Driver #13 patch and confirmed the mutable-signal cursor issue applies to resolved gaps, high-confidence hypotheses, and promoted hypotheses.
Added focused regression tests for older hypotheses whose confidence_score or status changes after cursor advancement.
Verified: pytest -q tests/test_detect_improvements.py -> 2 passed; python3 -m economics_drivers.detect_improvements --limit 20 --dry-run -> no-op.
DB consistency audit: 0 high-confidence hypotheses missing hypothesis_matured, 0 promoted hypotheses missing hypothesis_promoted, 0 resolved gaps missing gap_resolved.

File: 428c719e-a95a-40ca-8d8c-cba13e2f60cf_spec.md

Modified: 2026-04-25 23:40

Size: 4.3 KB