[Exchange] Interactive what-if explorer - change evidence, see Elo move open

← Live Dashboard Artifact Framework
Pure-function score+Elo simulator with mutation API (add/remove evidence, flip persona stance); save-as-proposal hook for community vote.
Goal

Today, a researcher who finds a new paper contradicting hypothesis X
has no way to ask: "If I attach this paper to X, how much would X's
Elo and composite score drop?" They must wait for the next debate and
score recalculation to run. Build an interactive what-if explorer that
lets users (humans and agents) toggle evidence on or off, add
hypothetical citations, change persona stances, and see the predicted
Elo and composite score in real time, without committing the change.

Effort: thorough

Acceptance Criteria

☐ scidex/exchange/what_if.py::simulate(hypothesis_id, mutations: list[Mutation]) -> SimResult where Mutation ∈ {AddEvidence(pmid, stance), RemoveEvidence(pmid), FlipPersonaStance(persona, new_stance), OverrideDimensionScore(dim, new_score)}.
☐ SimResult returns {new_composite, delta_composite, new_elo_predicted, delta_elo, contributing_factors: [{factor, contribution_pct}], confidence_band: [low, high], simulated_at}.
☐ Composite recomputation: replays the existing recalibrate_scores.py formula but with the mutated evidence list — pure function, no DB writes.
☐ Elo prediction: uses scidex/exchange/elo_ratings.py Bradley-Terry probability against a synthetic opponent at the cohort median rating; returns delta = predicted_log_odds_after - log_odds_before mapped back to Elo points.
☐ Migration: none for existing tables; the simulation itself is read-only. The one schema addition is a what_if_simulations(id, hypothesis_id, mutations_json, result_json, simulated_by, simulated_at) audit log table for usage analytics and abuse detection.
☐ HTML view /hypothesis/{id}/what-if shows the current evidence list with toggles next to each PMID, an "add hypothetical PMID" form, and a real-time score panel that updates on every change (debounce 250 ms).
☐ API: POST /api/exchange/what-if/simulate accepts {hypothesis_id, mutations: [...]} and returns SimResult. Rate-limit 60 req/min per user (defense against scraping).
☐ "Save as proposal" button: turns the current mutation set into a Senate evidence_change_proposal for community vote, attaching the simulated delta as motivation.
☐ Tests in tests/test_what_if.py: removing the top supporting evidence drops composite by ≥ 5%; adding 3 strong contradicting citations flips strength to disputed; flipping the Skeptic stance from oppose to support raises composite; mutations are pure (DB unchanged).
☐ Confidence band: simulation uncertainty (1 σ) is reported because Elo is noisy; if delta_elo is within 1 σ of zero, badge it as "no significant predicted change".
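
The Elo-prediction and confidence-band criteria above can be sketched as follows. This is a minimal illustration, not the contents of scidex/exchange/elo_ratings.py: it assumes the standard 400-point base-10 Elo scale, and every function name here is hypothetical. How the mutated composite is turned into p_after is not specified in this spec and is left outside the sketch.

```python
import math

# Elo points per unit of natural log-odds, assuming the standard
# 400-point, base-10 Elo scale (an assumption, not confirmed by the spec).
ELO_SCALE = 400.0 / math.log(10.0)


def win_probability(rating: float, opponent: float) -> float:
    """Bradley-Terry / Elo expected score against a single opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def log_odds(p: float) -> float:
    """Natural log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def predict_elo_delta(p_before: float, p_after: float) -> float:
    """Map the change in win log-odds back to Elo points."""
    return ELO_SCALE * (log_odds(p_after) - log_odds(p_before))


def confidence_band(delta_elo: float, sigma: float) -> tuple[float, float]:
    """Symmetric 1-sigma band around the predicted delta."""
    return (delta_elo - sigma, delta_elo + sigma)


def significance_badge(delta_elo: float, sigma: float) -> str:
    """Badge text; only the first label comes from the spec, the second is hypothetical."""
    if abs(delta_elo) <= sigma:
        return "no significant predicted change"
    return "significant predicted change"
```

For example, with a synthetic opponent at a cohort median of 1500, a hypothesis whose implied rating moves from 1500 to 1450 yields a predicted delta of about -50 Elo points; with sigma = 20 that falls outside the band around zero and would not receive the "no significant predicted change" badge.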

Approach

  • Refactor the score calculation into a pure function compute_composite(hypothesis_dict, evidence_list, dim_scores) -> float so simulation can call it on a mutated copy without touching the DB.
  • Build the mutation API: tagged-union pydantic models, validated server-side.
  • Build the Bradley-Terry shortcut for Elo delta (no full tournament replay — too expensive).
  • Build the UI: progressive enhancement; the page is usable without JS (full-form-submit fallback) and live-updating with JS.
  • Save-as-proposal hook: reuses scidex/senate/governance.py::create_proposal().
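
As a rough sketch of the first two steps, the mutation types can be modelled as a tagged union and applied to a deep copy of an in-memory snapshot, so the caller's data (and the DB) is never touched. The spec calls for pydantic models; standard-library dataclasses are swapped in here only to keep the example self-contained. HypothesisState, its field names, and apply_mutations are hypothetical, and the stance vocabulary is assumed.

```python
import copy
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AddEvidence:
    pmid: str
    stance: str  # assumed vocabulary, e.g. "support" / "contradict"


@dataclass(frozen=True)
class RemoveEvidence:
    pmid: str


@dataclass(frozen=True)
class FlipPersonaStance:
    persona: str
    new_stance: str


@dataclass(frozen=True)
class OverrideDimensionScore:
    dim: str
    new_score: float


Mutation = Union[AddEvidence, RemoveEvidence, FlipPersonaStance, OverrideDimensionScore]


@dataclass
class HypothesisState:
    """Hypothetical in-memory snapshot; field names are assumptions."""
    evidence: dict[str, str]          # pmid -> stance
    persona_stances: dict[str, str]   # persona -> stance
    dim_scores: dict[str, float]      # dimension -> score


def apply_mutations(state: HypothesisState, mutations: list[Mutation]) -> HypothesisState:
    """Return a mutated copy; the input state is left unchanged."""
    new = copy.deepcopy(state)
    for m in mutations:
        if isinstance(m, AddEvidence):
            new.evidence[m.pmid] = m.stance
        elif isinstance(m, RemoveEvidence):
            new.evidence.pop(m.pmid, None)  # removing an absent PMID is a no-op here
        elif isinstance(m, FlipPersonaStance):
            new.persona_stances[m.persona] = m.new_stance
        elif isinstance(m, OverrideDimensionScore):
            new.dim_scores[m.dim] = m.new_score
        else:
            raise TypeError(f"unknown mutation: {m!r}")
    return new
```

The mutated copy would then be passed to compute_composite; because the original snapshot is never modified, a purity test can simply compare it before and after the call.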

Dependencies

  • scidex/exchange/elo_ratings.py: Bradley-Terry math.
  • recalibrate_scores.py: composite formula.

Dependents

  • q-impact-claim-attribution: uses what-if simulations to attribute score deltas to contributors.

Work Log
