[Senate] Backfill historical resource usage from debate_rounds

← All Specs

[Senate] Backfill historical resource usage from debate_rounds

Task ID: 4acdb148-2985-41d9-aeb7-161a16f03d8d Layer: Senate Priority: 83 Type: one-time

Goal

Populate the resource_usage table with historical token consumption data from existing debate_rounds and agent_performance tables. This enables cost tracking, per-hypothesis economics, and resource trend analysis across SciDEX's operational history.

The resource_usage table already exists. We need to backfill approximately:

  • 56 debate rounds with 107,342 total tokens
  • 108 agent_performance records (some with token data)

Acceptance Criteria

☑ All debate_rounds with tokens_used > 0 have corresponding resource_usage entries
☑ All agent_performance entries with tokens_used > 0 have resource_usage entries
☑ Entity type correctly set: 'analysis' for debate rounds, 'agent_run' for agent performance
☑ Resource subtype includes model_id from debate rounds
☑ Script is idempotent (can run multiple times without duplicates)
☑ Summary view shows cumulative resource consumption over time
☑ Verification query shows token totals match source tables

Approach

  • Read existing data — Query debate_rounds and agent_performance to understand data volume
  • Write backfill script — Create scripts/backfill_resource_usage.py:
  • - Join debate_rounds → debate_sessions → analyses to get analysis_id
    - Insert resource_usage entries with entity_type='analysis', resource_type='llm_tokens'
    - Process agent_performance entries with entity_type='agent_run'
    - Use INSERT OR IGNORE to handle idempotency
    - Track and report rows inserted
  • Run backfill — Execute script and verify results
  • Create summary view — SQL view for cumulative token trends
  • Test — Verify totals match, run sample queries
  • Work Log

    2026-04-25 20:10 PT — Slot 0 (minimax:70)

    • Staleness review: task valid, previous attempt did not merge (no task_runs row)
    • Rebased on origin/main (was clean)
    • Investigated DB schema: debate_rounds has tokens_used, agent_performance has tokens_used (no model_id)
    • Backfill script: scripts/backfill_resource_usage.py
    - 447 debate round groups (5.7M tokens) → resource_usage
    - 1622 agent_performance rows (5.6M tokens) → resource_usage
    - 416 analyses updated with total_cost_usd
    - 1239 hypotheses updated with estimated_cost_usd
    - resource_usage_cumulative materialized view created (21 days)
    • Committed: 831da6586
    • Result: Done — backfill complete, committed and ready to push

    File: 4acdb148_senate_backfill_resource_usage_spec.md
    Modified: 2026-04-25 23:40
    Size: 2.5 KB