[Senate] Implement resource tracking and metering
Task ID: 370890cc-17a6-4afc-803c-f3219625c49d
Goal
Implement comprehensive resource tracking across SciDEX to measure actual costs per hypothesis, analysis, and entity. This enables the Senate layer to govern resource allocation, identify inefficient operations, and calculate true cost per insight. Track LLM tokens, API calls, and compute time, storing all metrics in a centralized resource_usage table.
Acceptance Criteria
☐ Token counting added to all LLM calls in scidex_orchestrator.py
☐ API call tracking added in tools.py (PubMed, Semantic Scholar, etc.)
☐ CPU time tracking added for post_process.py pipeline runs
☐ resource_usage table created with fields: entity_type, entity_id, resource_type, amount, cost_usd, created_at
☐ /api/resource-usage/{entity_id} endpoint implemented
☐ Resource usage stats displayed on hypothesis detail pages
☐ Resource usage dashboard added to Senate page (/senate)
☐ All changes tested and verified working
Approach
Create database schema
- Add resource_usage table to PostgreSQL
- Schema: id, entity_type (analysis/hypothesis/gap), entity_id, resource_type (llm_tokens/api_call/cpu_seconds), amount, cost_usd, metadata (JSON), created_at
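The schema above can be sketched as a minimal runnable example. This uses SQLite purely so the snippet is self-contained; the target is PostgreSQL (which would use SERIAL/JSONB/TIMESTAMPTZ), and the column types here are assumptions:

```python
import json
import sqlite3

# Minimal sketch of the resource_usage table from the spec.
SCHEMA = """
CREATE TABLE resource_usage (
    id            INTEGER PRIMARY KEY,
    entity_type   TEXT NOT NULL,     -- analysis / hypothesis / gap
    entity_id     TEXT NOT NULL,
    resource_type TEXT NOT NULL,     -- llm_tokens / api_call / cpu_seconds
    amount        REAL NOT NULL,
    cost_usd      REAL DEFAULT 0.0,
    metadata      TEXT,              -- JSON blob
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO resource_usage "
    "(entity_type, entity_id, resource_type, amount, cost_usd, metadata) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("analysis", "a-123", "llm_tokens", 1500, 0.0045, json.dumps({"model": "haiku"})),
)
row = conn.execute(
    "SELECT entity_type, resource_type, amount FROM resource_usage"
).fetchone()
print(row)  # ('analysis', 'llm_tokens', 1500.0)
```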
Instrument LLM calls in scidex_orchestrator.py
- Wrap all Anthropic API calls to capture input/output tokens
- Calculate cost: Sonnet $3/1M input, $15/1M output; Haiku $0.25/1M input, $1.25/1M output
- Store with entity_type=analysis, entity_id=analysis_id
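The cost arithmetic above can be captured in a small helper. Model keys and the function name are illustrative, not the actual orchestrator code; the per-1M rates are the ones stated in this spec:

```python
# Per-million-token prices from the task spec; model keys are assumptions.
PRICING = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.25, "output": 1.25},
}

def llm_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert one call's token counts into dollars using per-1M rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A Sonnet call with 10k input and 2k output tokens:
cost = llm_cost_usd("sonnet", 10_000, 2_000)
print(round(cost, 4))  # 0.06
```

The token counts themselves would come from the wrapped Anthropic response (the Messages API reports input and output token usage on each response), so no client-side tokenization is needed.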
Instrument API calls in tools.py
- Add tracking to PubMed, Semantic Scholar, Protein Data Bank, etc.
- Count calls and log to resource_usage with entity context
- Track rate limits and failures
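One way to add this tracking without touching each tool body is a decorator around the tool wrapper. This is a sketch under assumed names (`track_api_call`, the in-memory `usage_log` standing in for writes to resource_usage), not the actual tools.py code:

```python
import functools
import time

usage_log = []  # stand-in for writes to resource_usage

def track_api_call(provider: str):
    """Decorator sketch: count each outbound call, recording timing and failures."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"resource_type": "api_call", "provider": provider, "amount": 1}
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["error"] = type(exc).__name__  # rate-limit errors land here too
                raise
            finally:
                record["elapsed_s"] = time.monotonic() - start
                usage_log.append(record)
        return wrapper
    return decorator

@track_api_call("pubmed")
def search_pubmed(query: str):
    return ["PMID:12345"]  # placeholder result

search_pubmed("resource metering")
print(usage_log[0]["provider"], usage_log[0]["amount"])  # pubmed 1
```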
Add CPU time tracking to post_process.py
- Use time.process_time() to measure pipeline stages
- Store total CPU time per analysis
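The measurement itself is a two-line pattern around each stage; a minimal sketch (the `run_stage` wrapper is an assumption, and persisting `cpu_seconds` to resource_usage is left out):

```python
import time

def run_stage(stage_fn):
    """Measure CPU seconds for one pipeline stage via time.process_time()."""
    start = time.process_time()
    result = stage_fn()
    cpu_seconds = time.process_time() - start
    return result, cpu_seconds

def busy_stage():
    # Placeholder for a real post_process.py stage.
    return sum(i * i for i in range(100_000))

result, cpu = run_stage(busy_stage)
print(cpu >= 0.0)  # True
```

Note that `time.process_time()` counts CPU time of the current process only, so it excludes time spent blocked on network I/O; wall-clock duration would need `time.monotonic()` tracked separately.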
Implement API endpoint
- GET /api/resource-usage/{entity_id} returns all resource records
- Aggregate by resource_type, sum amounts and costs
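The aggregation step can be sketched independently of the web framework; `summarize` and the row dicts are illustrative shapes, not the actual api.py code:

```python
from collections import defaultdict

def summarize(records):
    """Aggregate raw resource_usage rows by resource_type (sum amount and cost)."""
    summary = defaultdict(lambda: {"amount": 0.0, "cost_usd": 0.0})
    for rec in records:
        bucket = summary[rec["resource_type"]]
        bucket["amount"] += rec["amount"]
        bucket["cost_usd"] += rec["cost_usd"]
    return dict(summary)

rows = [
    {"resource_type": "llm_tokens", "amount": 1200, "cost_usd": 0.25},
    {"resource_type": "llm_tokens", "amount": 800, "cost_usd": 0.5},
    {"resource_type": "api_call", "amount": 3, "cost_usd": 0.0},
]
print(summarize(rows)["llm_tokens"])  # {'amount': 2000.0, 'cost_usd': 0.75}
```

The endpoint would then return both the per-type summary and the raw records so the UI can show totals and drill down.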
Update UI
- Add resource stats to hypothesis detail pages (token count, API calls, cost)
- Create /senate dashboard showing:
  - Total costs across all analyses
  - Most expensive hypotheses
  - Token efficiency metrics
  - API call breakdown
Test
- Run a new analysis and verify tracking
- Check database records
- Verify API endpoint returns correct data
- Confirm UI displays stats
Work Log
2026-04-01 — Started task
- Received task from Orchestra
- Created spec file
- Beginning implementation
2026-04-25 15:05 PT — Codex
- Re-reviewed the current mainline state before coding. Confirmed resource_usage already exists in PostgreSQL and partial metering is live, but the stack is inconsistent:
  - scidex/agora/scidex_orchestrator.py meters LLM/tool usage, while scidex/forge/tools.py still lacks first-class resource metering in the tool wrapper.
  - post_process.py captures start_cpu_time but does not persist runtime into resource_usage.
  - /api/resource-usage/... still serves legacy resource_cost/ROI data instead of rows and aggregates from resource_usage.
- Hypothesis detail and Senate dashboard pages read stale resource type names (llm_tokens, bedrock_tokens) and miss mixed old/new usage rows.
- Sandbox prevents git fetch origin main in this harness because the shared worktree git metadata is read-only; proceeding with a targeted patch against the current checked-out tree.
2026-04-25 15:42 PT — Codex verification
- python3 -m py_compile scidex/core/resource_tracker.py scidex/forge/tools.py scidex/agora/scidex_orchestrator.py post_process.py api.py tests/test_resource_usage_api.py passed.
- PYTHONPATH=. pytest -q tests/test_resource_usage_api.py passed (1 passed).
- Live sanity check: TestClient(api.app).get('/api/resource-usage/<analysis>?entity_type=analysis') returned HTTP 200 with the new summary + records payload shape.
- Live DB sanity check: resource_tracker.get_resource_summary() now collapses mixed llm_tokens, llm_tokens_input/output, and process_runtime_seconds rows into stable UI buckets.
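The bucket collapsing described in that last check could look like the sketch below. The mapping table and function name are hypothetical; the actual resource_tracker.get_resource_summary() may differ:

```python
# Hypothetical mapping from mixed legacy/new resource_type names to the
# stable buckets the UI reads; the real resource_tracker code may differ.
BUCKETS = {
    "llm_tokens": "llm_tokens",
    "llm_tokens_input": "llm_tokens",
    "llm_tokens_output": "llm_tokens",
    "process_runtime_seconds": "cpu_seconds",
    "cpu_seconds": "cpu_seconds",
    "api_call": "api_call",
}

def collapse(rows):
    """Fold mixed old/new resource_type rows into stable UI buckets."""
    out = {}
    for row in rows:
        bucket = BUCKETS.get(row["resource_type"], row["resource_type"])
        out[bucket] = out.get(bucket, 0.0) + row["amount"]
    return out

rows = [
    {"resource_type": "llm_tokens", "amount": 500},
    {"resource_type": "llm_tokens_input", "amount": 300},
    {"resource_type": "llm_tokens_output", "amount": 200},
    {"resource_type": "process_runtime_seconds", "amount": 12.5},
]
print(collapse(rows))  # {'llm_tokens': 1000.0, 'cpu_seconds': 12.5}
```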