[Senate] Backfill historical resource usage from debate_rounds

Task ID: 4acdb148-2985-41d9-aeb7-161a16f03d8d Layer: Senate Priority: 83 Type: one-time

Goal

Populate the resource_usage table with historical token consumption data from existing debate_rounds and agent_performance tables. This enables cost tracking, per-hypothesis economics, and resource trend analysis across SciDEX's operational history.

The resource_usage table already exists. We need to backfill approximately:

56 debate rounds with 107,342 total tokens
108 agent_performance records (some with token data)

Acceptance Criteria

☑ All debate_rounds with tokens_used > 0 have corresponding resource_usage entries

☑ All agent_performance entries with tokens_used > 0 have resource_usage entries

☑ Entity type correctly set: 'analysis' for debate rounds, 'agent_run' for agent performance

☑ Resource subtype includes model_id from debate rounds

☑ Script is idempotent (can run multiple times without duplicates)

☑ Summary view shows cumulative resource consumption over time

☑ Verification query shows token totals match source tables

Approach

Read existing data — Query debate_rounds and agent_performance to understand data volume

Write backfill script — Create scripts/backfill_resource_usage.py:

- Join debate_rounds → debate_sessions → analyses to get analysis_id
- Insert resource_usage entries with entity_type='analysis', resource_type='llm_tokens'
- Process agent_performance entries with entity_type='agent_run'
- Use INSERT OR IGNORE to handle idempotency
- Track and report rows inserted

Run backfill — Execute script and verify results

Create summary view — SQL view for cumulative token trends

Test — Verify totals match, run sample queries

Work Log

2026-04-25 20:10 PT — Slot 0 (minimax:70)

Staleness review: task valid, previous attempt did not merge (no task_runs row)
Rebased on origin/main (was clean)
Investigated DB schema: debate_rounds has tokens_used, agent_performance has tokens_used (no model_id)
Backfill script: scripts/backfill_resource_usage.py

- 447 debate round groups (5.7M tokens) → resource_usage
- 1622 agent_performance rows (5.6M tokens) → resource_usage
- 416 analyses updated with total_cost_usd
- 1239 hypotheses updated with estimated_cost_usd
- resource_usage_cumulative materialized view created (21 days)

Committed: 831da6586
Result: Done — backfill complete, committed and ready to push

File: 4acdb148_senate_backfill_resource_usage_spec.md

Modified: 2026-04-25 23:40

Size: 2.5 KB