[Senate] Backfill historical resource usage from debate_rounds
Task ID: 4acdb148-2985-41d9-aeb7-161a16f03d8d
Layer: Senate
Priority: 83
Type: one-time
Goal
Populate the resource_usage table with historical token consumption data from existing debate_rounds and agent_performance tables. This enables cost tracking, per-hypothesis economics, and resource trend analysis across SciDEX's operational history.
The resource_usage table already exists. We need to backfill approximately:
- 56 debate rounds with 107,342 total tokens
- 108 agent_performance records (some with token data)
Acceptance Criteria
☑ All debate_rounds with tokens_used > 0 have corresponding resource_usage entries
☑ All agent_performance entries with tokens_used > 0 have resource_usage entries
☑ Entity type correctly set: 'analysis' for debate rounds, 'agent_run' for agent performance
☑ Resource subtype includes model_id from debate rounds
☑ Script is idempotent (can run multiple times without duplicates)
☑ Summary view shows cumulative resource consumption over time
☑ Verification query shows token totals match source tables
Approach
Read existing data — Query debate_rounds and agent_performance to understand data volume
Write backfill script — Create scripts/backfill_resource_usage.py:
- Join debate_rounds → debate_sessions → analyses to get analysis_id
- Insert resource_usage entries with entity_type='analysis', resource_type='llm_tokens'
- Process agent_performance entries with entity_type='agent_run'
- Use INSERT OR IGNORE to handle idempotency
- Track and report rows inserted
Run backfill — Execute script and verify results
Create summary view — SQL view for cumulative token trends
Test — Verify totals match, run sample queriesWork Log
2026-04-25 20:10 PT — Slot 0 (minimax:70)
- Staleness review: task valid, previous attempt did not merge (no task_runs row)
- Rebased on origin/main (was clean)
- Investigated DB schema: debate_rounds has tokens_used, agent_performance has tokens_used (no model_id)
- Backfill script: scripts/backfill_resource_usage.py
- 447 debate round groups (5.7M tokens) → resource_usage
- 1622 agent_performance rows (5.6M tokens) → resource_usage
- 416 analyses updated with total_cost_usd
- 1239 hypotheses updated with estimated_cost_usd
- resource_usage_cumulative materialized view created (21 days)
- Committed: 831da6586
- Result: Done — backfill complete, committed and ready to push