[Economics] World model health metrics

Task ID: q06-b3-18A23BD8 | Layer: Senate | Priority: 85 | Type: one-time

Goal

Build a comprehensive health dashboard that measures how well SciDEX's world model (knowledge graph + wiki + hypotheses + papers) represents scientific reality. The dashboard provides quantitative metrics across four dimensions: coverage (what fraction of known entities/relationships do we capture?), consistency (do hypotheses contradict each other?), freshness (when was each entity last updated?), and depth (edges per entity, citations per hypothesis).

This enables the Senate layer to monitor world model quality, identify gaps, and prioritize improvements. The dashboard should be accessible at /senate or /atlas.

Acceptance Criteria

☑ Implement coverage metrics
  ☑ Count total entities in KG by type (gene, protein, disease, etc.)
  ☑ Calculate relationship coverage (edges per entity)
  ☑ Measure literature coverage (papers linked to entities)
☑ Implement consistency metrics
  ☑ Detect contradicting hypotheses (same target, opposing mechanisms)
  ☑ Flag conflicting evidence within analyses
☑ Implement freshness metrics
  ☑ Track last update time for entities
  ☑ Identify stale entities (>90 days without updates)
  ☑ Measure average recency of evidence citations
☑ Implement depth metrics
  ☑ Calculate edges per entity (mean, median, distribution)
  ☑ Count citations per hypothesis
  ☑ Measure analysis depth (debate rounds, tool invocations)
☑ Create dashboard UI
  ☑ Build /senate route handler in api.py
  ☑ Generate HTML with metrics visualization
  ☑ Add summary statistics and charts
  ☑ Link to detailed entity/hypothesis pages
☑ Test dashboard loads and displays correct metrics
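
Two of the criteria above are simple enough to sketch directly: the 90-day staleness rule and the mean/median/distribution summary of edges per entity. The following is a minimal illustration only, not the project's implementation; the function names and input shapes (`last_updated` as a name-to-timestamp dict, `edge_counts` as a list of integers) are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import mean, median

# Threshold taken from the acceptance criteria (>90 days without updates).
STALE_AFTER = timedelta(days=90)


def stale_entities(last_updated, now=None):
    """Return names whose last update is more than 90 days before `now`.

    `last_updated` maps entity name -> timezone-aware datetime (assumed shape).
    """
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_updated.items() if now - ts > STALE_AFTER)


def depth_summary(edge_counts):
    """Summarize edges per entity as mean, median, and a count histogram."""
    if not edge_counts:
        return {"mean": 0.0, "median": 0.0, "distribution": {}}
    return {
        "mean": mean(edge_counts),
        "median": median(edge_counts),
        # Maps "number of edges" -> "how many entities have that many edges".
        "distribution": dict(Counter(edge_counts)),
    }
```

Passing `now` explicitly keeps the staleness check deterministic in tests.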

Approach

1. Database Queries

Build functions to extract metrics from PostgreSQL:
  • Query knowledge_edges for entity relationship counts
  • Query hypotheses for scoring and evidence data
  • Query analyses for last update timestamps
  • Query papers for citation coverage
  • Join across tables to compute composite metrics
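
As a rough sketch of the first query in this list, edges per entity can be computed by counting each edge once for its source and once for its target. The spec names the knowledge_edges table but not its columns, so `source_id` and `target_id` below are hypothetical. The example uses the standard-library `sqlite3` module with an in-memory table purely so that it runs without a server; the SQL itself is plain enough that it should carry over to PostgreSQL, where, depending on the driver, it may need to be executed through a cursor.

```python
import sqlite3

# Hypothetical column names: the spec does not give the knowledge_edges schema.
EDGES_PER_ENTITY_SQL = """
SELECT entity, COUNT(*) AS edge_count
FROM (
    SELECT source_id AS entity FROM knowledge_edges
    UNION ALL
    SELECT target_id AS entity FROM knowledge_edges
) AS endpoints
GROUP BY entity
ORDER BY edge_count DESC, entity ASC
"""


def query_edges_per_entity(conn):
    """Return (entity, edge_count) rows, most connected entity first."""
    return conn.execute(EDGES_PER_ENTITY_SQL).fetchall()


def demo_connection():
    """Build a throwaway in-memory table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knowledge_edges (source_id TEXT, target_id TEXT)")
    conn.executemany(
        "INSERT INTO knowledge_edges VALUES (?, ?)",
        [("APOE", "AD"), ("TREM2", "AD"), ("APP", "AD")],
    )
    return conn
```

`UNION ALL` (rather than `UNION`) matters here: it keeps duplicate endpoint rows so that every edge contributes to the count.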

2. Metrics Module

Create senate.py with functions:

def compute_coverage_metrics(conn):
    """Coverage: entities, relationships, papers."""
    
def compute_consistency_metrics(conn):
    """Consistency: contradictions, conflicts."""
    
def compute_freshness_metrics(conn):
    """Freshness: last updates, stale entities."""
    
def compute_depth_metrics(conn):
    """Depth: edges per entity, citations per hypothesis."""
    
def get_world_model_health():
    """Aggregate all metrics into dashboard data."""

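One way the aggregation step could produce a single health score is sketched below. The spec does not say how the score is computed, so everything here is an assumption: each dimension is taken to report a score between 0 and 1, the dimensions are weighted equally by default, and the helper names `overall_health_score` and `aggregate_health` are hypothetical.

```python
def overall_health_score(dimension_scores, weights=None):
    """Combine per-dimension scores in [0, 1] into one 0-100 score.

    Equal weighting is an assumption of this sketch, not part of the spec.
    """
    if not dimension_scores:
        return 0.0
    weights = weights or {name: 1.0 for name in dimension_scores}
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight <= 0:
        return 0.0
    weighted = sum(
        min(max(score, 0.0), 1.0) * weights[name]  # clamp each score to [0, 1]
        for name, score in dimension_scores.items()
    )
    return round(100.0 * weighted / total_weight, 1)


def aggregate_health(conn, metric_fns):
    """Run each metric function and attach an overall score.

    `metric_fns` maps a dimension name to a callable taking `conn` and
    returning a dict with a "score" key (assumed shape).
    """
    metrics = {name: fn(conn) for name, fn in metric_fns.items()}
    scores = {name: result["score"] for name, result in metrics.items()}
    return {"metrics": metrics, "health_score": overall_health_score(scores)}
```

Keeping the per-dimension results alongside the overall score lets a dashboard show both the headline number and the breakdown behind it.
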
3. Dashboard Route

Add /senate handler to api.py:
  • Call senate.get_world_model_health()
  • Generate HTML with metrics tables and charts
  • Style consistent with existing SciDEX pages
  • Add navigation links to other pages

4. Testing

  • Run metrics queries against current database
  • Verify counts match expected entity/hypothesis totals
  • Test /senate route returns 200 and renders properly
  • Check performance with large result sets

Work Log

2026-04-02 02:55 PT — Slot 1

  • Started task
  • Created spec file
  • Read existing code (api.py, database schema, nav_html)
  • Examined database: 40 analyses, 142 hypotheses, 7953 KG edges, 1048 wiki entities

2026-04-02 03:15 PT — Slot 1

  • Implemented senate.py metrics module with functions:
    • compute_coverage_metrics(): Entity counts, KG edges, papers, wiki coverage
    • compute_consistency_metrics(): Detect contradicting hypotheses by target gene
    • compute_freshness_metrics(): Track last update times, identify stale entities
    • compute_depth_metrics(): Edges per entity, citations per hypothesis, debate depth
    • get_world_model_health(): Aggregate all metrics with overall health score
  • Tested module: Successfully computes all metrics (501 unique papers, 100% KG coverage, 0 conflicts)
  • Fixed evidence format handling so that both observed formats are parsed (a dict with a "pmid" key and a "PMID:"-prefixed string)
  • Fixed datetime deprecation warnings (now using timezone.utc)
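
The two fixes at the end of this entry can be illustrated as follows. This is a sketch rather than the code that was committed: it assumes evidence items are either dicts carrying a "pmid" key or strings containing a "PMID:" prefix followed by digits, and the helper names `extract_pmid` and `utc_now` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches strings such as "PMID:12345" or "PMID: 12345" (assumed string format).
_PMID_RE = re.compile(r"PMID:\s*(\d+)", re.IGNORECASE)


def extract_pmid(evidence):
    """Return the PMID as a string, or None if the item carries no PMID."""
    if isinstance(evidence, dict):
        pmid = evidence.get("pmid")
        return str(pmid) if pmid not in (None, "") else None
    if isinstance(evidence, str):
        match = _PMID_RE.search(evidence)
        return match.group(1) if match else None
    return None


def utc_now():
    """Timezone-aware current time, avoiding naive-UTC deprecation warnings."""
    return datetime.now(timezone.utc)
```

Normalizing every PMID to a string makes it straightforward to count unique papers with a set.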

2026-04-02 03:30 PT — Slot 1

  • Replaced /senate route in api.py with new World Model Health Dashboard
  • New dashboard displays:
    • Overall health score (0-100) with color-coded gauge
    • Coverage: entity counts by type, KG edges, papers, wiki pages
    • Consistency: genes with multiple hypotheses, conflicting targets
    • Freshness: entities updated in last 30/60/90 days, stale entity list
    • Depth: edges per entity distribution, top connected entities, most cited hypotheses
  • Verified Python syntax: ✓
  • Status: Implementation complete, ready to commit

2026-04-25 20:06 PDT — Codex Slot 53

  • Performed staleness review against current origin/main (7a48a2eef) and current task branch state.
  • Verified the claimed April 2 implementation is not present on current main: /senate exists, but there is no world-model-health module or dashboard covering coverage/consistency/freshness/depth across KG + wiki + hypotheses + papers.
  • Read current Senate/dashboard code in api.py, scidex/senate/metrics.py, and related planning docs to keep the fix aligned with existing patterns.
  • Planned implementation: add a dedicated scidex.senate.world_model_health metrics module, expose a /senate/world-model-health dashboard, and link it from /senate with targeted tests.

2026-04-25 20:22 PDT — Codex Slot 53

  • Implemented scidex/senate/world_model_health.py to compute coverage, consistency, freshness, and depth metrics directly from PostgreSQL tables (knowledge_edges, wiki_pages, hypotheses, papers, debate_sessions, optional tool_invocations).
  • Added /senate/world-model-health in api.py and a summary card/link on the main /senate dashboard.
  • Added regression coverage in tests/test_world_model_health.py for contradiction detection and the new route.
  • Verified with python3 -m py_compile scidex/senate/world_model_health.py api.py tests/test_world_model_health.py, pytest -q tests/test_world_model_health.py, direct wmh.get_world_model_health() against the live DB, and an in-process GET returning HTTP 200 for /senate/world-model-health.
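
For readers unfamiliar with the contradiction check that the regression test exercises, the rule from the acceptance criteria (same target, opposing mechanisms) might look roughly like this. The field names `id`, `target_gene`, and `mechanism`, and the list of opposing pairs, are assumptions for illustration; the actual module may represent hypotheses and mechanisms quite differently.

```python
from collections import defaultdict

# Illustrative pairs of mechanisms treated as opposing (assumed vocabulary).
OPPOSING = {
    frozenset({"activation", "inhibition"}),
    frozenset({"upregulation", "downregulation"}),
}


def find_contradictions(hypotheses):
    """Return (target, id_a, id_b) for hypothesis pairs that oppose each other.

    Each hypothesis is a dict with "id", "target_gene", and "mechanism" keys
    (hypothetical field names).
    """
    by_target = defaultdict(list)
    for hypothesis in hypotheses:
        by_target[hypothesis["target_gene"]].append(hypothesis)

    conflicts = []
    for target, group in by_target.items():
        # Compare every unordered pair of hypotheses sharing this target.
        for index, first in enumerate(group):
            for second in group[index + 1:]:
                pair = frozenset({first["mechanism"], second["mechanism"]})
                if pair in OPPOSING:
                    conflicts.append((target, first["id"], second["id"]))
    return conflicts
```

Grouping by target first keeps the pairwise comparison confined to hypotheses that could actually conflict.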

2026-04-25 20:43 PDT — Codex Slot 53

  • Added a 5-minute in-process TTL cache to scidex.senate.world_model_health.get_world_model_health() so repeated /senate and /senate/world-model-health hits reuse the same aggregated payload instead of re-running the full metric suite each time.
  • Added a regression test covering cache reuse and re-ran live timing checks to confirm the dashboard remains correct while repeated requests get faster.
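
The caching behavior described in this entry can be sketched as a small in-process TTL cache with a `force_refresh` escape hatch. This is an illustration under assumptions, not the committed code: the name `cached_health`, the injectable `compute` and `now` parameters, and the use of a lock are choices made for this example.

```python
import threading
import time

CACHE_TTL_SECONDS = 300  # 5 minutes, as described in the work log

_cache = {"value": None, "expires_at": 0.0}
_cache_lock = threading.Lock()


def cached_health(compute, force_refresh=False, now=time.monotonic):
    """Return the cached payload, recomputing when stale or when forced.

    `compute` is a zero-argument callable producing the payload; `now` is
    injectable so tests can control the clock (both are sketch assumptions).
    """
    with _cache_lock:
        current = now()
        expired = current >= _cache["expires_at"]
        if force_refresh or _cache["value"] is None or expired:
            _cache["value"] = compute()
            _cache["expires_at"] = current + CACHE_TTL_SECONDS
        return _cache["value"]
```

A monotonic clock is used so that wall-clock adjustments cannot extend or shorten the TTL. Because the cache lives in process memory, each worker process would hold its own copy.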

2026-04-25 21:10 PDT — Codex Slot 53

  • Rebased the intended Senate world-model-health feature onto current origin/main after the prior task branch was found to be carrying a stale api.py diff that removed portfolio endpoints.
  • Rebuilt api.py from current main and re-applied only the intended changes: /senate summary card, /senate/world-model-health dashboard route, and Senate landing-page link.
  • Re-verified targeted coverage with python3 -m py_compile api.py scidex/senate/world_model_health.py tests/test_world_model_health.py, pytest -q tests/test_world_model_health.py, live get_world_model_health(force_refresh=True), and in-process GET 200 for /senate/world-model-health.

File: q06-b3-18A23BD8_spec.md
Modified: 2026-04-28 03:24
Size: 7.3 KB