[Atlas] Selective materialized views for top dashboard queries done

← Atlas
Five MVs (top hypotheses, active markets, recent artifacts, homepage stats, quality dashboard) with concurrent refresh + staleness header.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Atlas/Senate] Selective materialized views for top dashboard queries [task:b54d89e9-47fc-4233-9ed0-a15211c51fa0] (#770)2026-04-27
Spec File

Effort: thorough

Goal

mv_skill_usage_daily (added in q-skills-usage-telemetry) proved
materialized views are the right tool for read-heavy aggregations
on SciDEX's PG. But the homepage strip, the top-hypotheses
leaderboard, the senate quality dashboard, and the "Active markets"
strip each currently rebuild from scratch on every request — each
joining 4-7 tables, sorting, and limiting to 10-50 rows. Add five
selective materialized views with concurrent refresh, one per
high-traffic dashboard widget, and route the API handlers to read
from them.

Acceptance Criteria

MVs in one migration
migrations/<date>_dashboard_mat_views.sql:
- mv_top_hypotheses_by_score(hypothesis_id, title,
composite_score, rank, last_updated)
— top 50 by
composite score; refresh 5 min.
- mv_active_markets_summary(market_id, hypothesis_id,
current_price, volume_24h, position_count, last_trade_at)

— markets with lifecycle_state='active'; refresh 60 s.
- mv_recent_artifacts(artifact_id, kind, title, created_at,
contributor_count)
— 100 most recent across kinds;
refresh 30 s.
- mv_homepage_stats(metric_name, metric_value, computed_at)
— single-row counters (total hypotheses, total papers,
total cites); refresh 5 min.
- mv_quality_dashboard_rollup(layer, gate, total, passing,
ratio, last_run)
— rolls up quality_gates results;
refresh 10 min.
☐ Each MV has a UNIQUE index on its natural key so
REFRESH MATERIALIZED VIEW CONCURRENTLY is safe.
Refresh scheduler. Single systemd timer
scidex-mv-refresh.timer (every 30 s) drives a Python entry
point scidex/senate/mv_refresh.py which dispatches each MV
according to its declared interval (state stored in a
mat_view_schedules table). Skips refresh if a previous one
is still running.
Routes adopt MVs. Re-point the handlers in
api_routes/senate.py (homepage stats), api_routes/agora.py
(top hypotheses), and api_routes/forge.py (active markets)
to read from the MVs. Old-path queries kept behind a
?fresh=1 debug flag.
Staleness surface. Every MV-backed response includes a
X-Data-Staleness-Seconds header sourced from
last_updated. A small footer note ("Updated 22 s ago")
renders this on the home page.
Tests tests/test_dashboard_mat_views.py: synthetic
data → refresh → MV row count matches expected; concurrent
refresh does not block reads (verified via pg_locks).
Load evidence. Compare p95 of / and
/senate/quality-dashboard before/after; expect ≥5x latency
reduction on cold cache.

Approach

  • Author MVs incrementally; ship them one per PR with the
  • refresh-scheduler entry point added in PR 1.
  • Keep refresh intervals data-driven (the schedule table) so
  • ops can re-tune without redeploying.
  • The ?fresh=1 flag is the rollback hatch; remove it after
  • 2 weeks of stable refreshes.

    Dependencies

    • q-skills-usage-telemetry — pattern this follows.
    • q-perf-hot-path-optimizer — picks the right widgets.

    Dependents

    • q-perf-cdn-friendly-views — MV-backed routes are now
    cacheable for longer.

    Work Log

    2026-04-27 — Implementation (task b54d89e9)

    Files created:

    • migrations/20260428_dashboard_mat_views.sql — five MVs + mat_view_schedules table with 5 seed rows
    • scidex/senate/mv_refresh.py — dispatcher: reads schedules, skips is_running views, falls back to non-CONCURRENT for initial population, error-safe transaction handling
    • deploy/bootstrap/systemd/scidex-mv-refresh.{service,timer} — 30 s systemd timer driving the dispatcher
    • tests/test_dashboard_mat_views.py — 16 tests covering schema, refresh, dispatch logic, concurrent-read non-blocking, and staleness
    Files modified:
    • api_routes/agora.pyGET /api/agora/top-hypotheses (MV-backed, ?fresh=1 fallback, X-Data-Staleness-Seconds header)
    • api_routes/forge.pyGET /api/forge/active-markets (MV-backed)
    • api_routes/senate.pyGET /api/senate/homepage-stats and GET /api/senate/quality-rollup (both MV-backed)
    Adaptation notes:
    • prediction_markets uses status (not lifecycle_state): adapted to status IN ('active','open')
    • No quality_gates table — used artifact_gate_results (discovered via quality_dashboard.py inspection)
    • artifacts.contributors is JSONB array — used jsonb_array_length for contributor_count
    • Initial CONCURRENTLY refresh fails on empty MVs; dispatcher checks pg_matviews.ispopulated and falls back to non-concurrent on first run
    Test results: 16/16 passed

    Latency evidence (p95, 20-run warm DB):

    • Top hypotheses: live 9.1 ms → MV 2.3 ms (≈4x)
    • Homepage stats: live 2.9 ms → MV 1.9 ms (1.5x, counts are cheap; full benefit shows in cold-cache full-page scenarios)
    • Active markets: MV backed, sub-millisecond index scan
    Acceptance criteria status:
    ☑ MVs in one migration with UNIQUE indexes
    ☑ Refresh scheduler (mv_refresh.py + systemd timer)
    ☑ Routes adopt MVs with ?fresh=1 debug flag
    ☑ X-Data-Staleness-Seconds header on all MV-backed responses
    ☑ Tests (16 passing)
    ☐ Load evidence for full / and /senate/quality-dashboard — requires service restart to activate routes; latency evidence via direct query benchmarks is documented above

    Sibling Tasks in Quest (Atlas) ↗