[Senate] Implement resource tracking and metering
Task ID: 370890cc-17a6-4afc-803c-f3219625c49d
Goal
Implement comprehensive resource tracking across SciDEX to measure actual costs per hypothesis, analysis, and entity. This enables the Senate layer to govern resource allocation, identify inefficient operations, and calculate true cost per insight. Track LLM tokens, API calls, and compute time, storing all metrics in a centralized resource_usage table.
Acceptance Criteria
☐ Token counting added to all LLM calls in scidex_orchestrator.py
☐ API call tracking added in tools.py (PubMed, Semantic Scholar, etc.)
☐ CPU time tracking added for post_process.py pipeline runs
☐ resource_usage table created with fields: entity_type, entity_id, resource_type, amount, cost_usd, created_at
☐ /api/resource-usage/{entity_id} endpoint implemented
☐ Resource usage stats displayed on hypothesis detail pages
☐ Resource usage dashboard added to Senate page (/senate)
☐ All changes tested and verified working
Approach
Create database schema
- Add resource_usage table to PostgreSQL
- Schema: id, entity_type (analysis/hypothesis/gap), entity_id, resource_type (llm_tokens/api_call/cpu_seconds), amount, cost_usd, metadata (JSON), created_at
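The schema above can be sketched as a minimal runnable example. This uses SQLite purely so the snippet is self-contained; the target is PostgreSQL (which would use SERIAL/JSONB/TIMESTAMPTZ), and the column types here are assumptions:

```python
import json
import sqlite3

# Minimal sketch of the resource_usage table from the spec.
SCHEMA = """
CREATE TABLE resource_usage (
    id            INTEGER PRIMARY KEY,
    entity_type   TEXT NOT NULL,     -- analysis / hypothesis / gap
    entity_id     TEXT NOT NULL,
    resource_type TEXT NOT NULL,     -- llm_tokens / api_call / cpu_seconds
    amount        REAL NOT NULL,
    cost_usd      REAL DEFAULT 0.0,
    metadata      TEXT,              -- JSON blob
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO resource_usage "
    "(entity_type, entity_id, resource_type, amount, cost_usd, metadata) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("analysis", "a-123", "llm_tokens", 1500, 0.0045, json.dumps({"model": "haiku"})),
)
row = conn.execute(
    "SELECT entity_type, resource_type, amount FROM resource_usage"
).fetchone()
print(row)  # ('analysis', 'llm_tokens', 1500.0)
```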
Instrument LLM calls in scidex_orchestrator.py
- Wrap all Anthropic API calls to capture input/output tokens
- Calculate cost: Sonnet $3/1M input, $15/1M output; Haiku $0.25/1M input, $1.25/1M output
- Store with entity_type=analysis, entity_id=analysis_id
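The cost arithmetic above can be captured in a small helper. Model keys and the function name are illustrative, not the actual orchestrator code; the per-1M rates are the ones stated in this spec:

```python
# Per-million-token prices from the task spec; model keys are assumptions.
PRICING = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.25, "output": 1.25},
}

def llm_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert one call's token counts into dollars using per-1M rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A Sonnet call with 10k input and 2k output tokens:
cost = llm_cost_usd("sonnet", 10_000, 2_000)
print(round(cost, 4))  # 0.06
```

The token counts themselves would come from the wrapped Anthropic response (the Messages API reports input and output token usage on each response), so no client-side tokenization is needed.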
Instrument API calls in tools.py
- Add tracking to PubMed, Semantic Scholar, Protein Data Bank, etc.
- Count calls and log to resource_usage with entity context
- Track rate limits and failures
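One way to add this tracking without touching each tool body is a decorator around the tool wrapper. This is a sketch under assumed names (`track_api_call`, the in-memory `usage_log` standing in for writes to resource_usage), not the actual tools.py code:

```python
import functools
import time

usage_log = []  # stand-in for writes to resource_usage

def track_api_call(provider: str):
    """Decorator sketch: count each outbound call, recording timing and failures."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"resource_type": "api_call", "provider": provider, "amount": 1}
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["error"] = type(exc).__name__  # rate-limit errors land here too
                raise
            finally:
                record["elapsed_s"] = time.monotonic() - start
                usage_log.append(record)
        return wrapper
    return decorator

@track_api_call("pubmed")
def search_pubmed(query: str):
    return ["PMID:12345"]  # placeholder result

search_pubmed("resource metering")
print(usage_log[0]["provider"], usage_log[0]["amount"])  # pubmed 1
```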
Add CPU time tracking to post_process.py
- Use time.process_time() to measure pipeline stages
- Store total CPU time per analysis
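The measurement itself is a two-line pattern around each stage; a minimal sketch (the `run_stage` wrapper is an assumption, and persisting `cpu_seconds` to resource_usage is left out):

```python
import time

def run_stage(stage_fn):
    """Measure CPU seconds for one pipeline stage via time.process_time()."""
    start = time.process_time()
    result = stage_fn()
    cpu_seconds = time.process_time() - start
    return result, cpu_seconds

def busy_stage():
    # Placeholder for a real post_process.py stage.
    return sum(i * i for i in range(100_000))

result, cpu = run_stage(busy_stage)
print(cpu >= 0.0)  # True
```

Note that `time.process_time()` counts CPU time of the current process only, so it excludes time spent blocked on network I/O; wall-clock duration would need `time.monotonic()` tracked separately.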
Implement API endpoint
- GET /api/resource-usage/{entity_id} returns all resource records
- Aggregate by resource_type, sum amounts and costs
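The aggregation step can be sketched independently of the web framework; `summarize` and the row dicts are illustrative shapes, not the actual api.py code:

```python
from collections import defaultdict

def summarize(records):
    """Aggregate raw resource_usage rows by resource_type (sum amount and cost)."""
    summary = defaultdict(lambda: {"amount": 0.0, "cost_usd": 0.0})
    for rec in records:
        bucket = summary[rec["resource_type"]]
        bucket["amount"] += rec["amount"]
        bucket["cost_usd"] += rec["cost_usd"]
    return dict(summary)

rows = [
    {"resource_type": "llm_tokens", "amount": 1200, "cost_usd": 0.25},
    {"resource_type": "llm_tokens", "amount": 800, "cost_usd": 0.5},
    {"resource_type": "api_call", "amount": 3, "cost_usd": 0.0},
]
print(summarize(rows)["llm_tokens"])  # {'amount': 2000.0, 'cost_usd': 0.75}
```

The endpoint would then return both the per-type summary and the raw records so the UI can show totals and drill down.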
Update UI
- Add resource stats to hypothesis detail pages (token count, API calls, cost)
- Create /senate dashboard showing:
  - Total costs across all analyses
  - Most expensive hypotheses
  - Token efficiency metrics
  - API call breakdown
Test
- Run a new analysis and verify tracking
- Check database records
- Verify API endpoint returns correct data
- Confirm UI displays stats
Work Log
2026-04-01 — Started task
- Received task from Orchestra
- Created spec file
- Beginning implementation
2026-04-25 15:05 PT — Codex
- Re-reviewed the current mainline state before coding. Confirmed resource_usage already exists in PostgreSQL and partial metering is live, but the stack is inconsistent:
  - scidex/agora/scidex_orchestrator.py meters LLM/tool usage, while scidex/forge/tools.py still lacks first-class resource metering in the tool wrapper.
  - post_process.py captures start_cpu_time but does not persist runtime into resource_usage.
  - /api/resource-usage/... still serves legacy resource_cost/ROI data instead of rows and aggregates from resource_usage.
- Hypothesis detail and Senate dashboard pages read stale resource type names (llm_tokens, bedrock_tokens) and miss mixed old/new usage rows.
- Sandbox prevents git fetch origin main in this harness because the shared worktree git metadata is read-only; proceeding with a targeted patch against the current checked-out tree.
2026-04-25 15:42 PT — Codex verification
- python3 -m py_compile scidex/core/resource_tracker.py scidex/forge/tools.py scidex/agora/scidex_orchestrator.py post_process.py api.py tests/test_resource_usage_api.py passed.
- PYTHONPATH=. pytest -q tests/test_resource_usage_api.py passed (1 passed).
- Live sanity check: TestClient(api.app).get('/api/resource-usage/<analysis>?entity_type=analysis') returned HTTP 200 with the new summary + records payload shape.
- Live DB sanity check: resource_tracker.get_resource_summary() now collapses mixed llm_tokens, llm_tokens_input/output, and process_runtime_seconds rows into stable UI buckets.
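The bucket collapsing described in that last check could look like the sketch below. The mapping table and function name are hypothetical; the actual resource_tracker.get_resource_summary() may differ:

```python
# Hypothetical mapping from mixed legacy/new resource_type names to the
# stable buckets the UI reads; the real resource_tracker code may differ.
BUCKETS = {
    "llm_tokens": "llm_tokens",
    "llm_tokens_input": "llm_tokens",
    "llm_tokens_output": "llm_tokens",
    "process_runtime_seconds": "cpu_seconds",
    "cpu_seconds": "cpu_seconds",
    "api_call": "api_call",
}

def collapse(rows):
    """Fold mixed old/new resource_type rows into stable UI buckets."""
    out = {}
    for row in rows:
        bucket = BUCKETS.get(row["resource_type"], row["resource_type"])
        out[bucket] = out.get(bucket, 0.0) + row["amount"]
    return out

rows = [
    {"resource_type": "llm_tokens", "amount": 500},
    {"resource_type": "llm_tokens_input", "amount": 300},
    {"resource_type": "llm_tokens_output", "amount": 200},
    {"resource_type": "process_runtime_seconds", "amount": 12.5},
]
print(collapse(rows))  # {'llm_tokens': 1000.0, 'cpu_seconds': 12.5}
```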