Effort: thorough
Today every persona's vote in a debate's synthesis carries equal weight. A
persona who shows up empty-handed counts the same as one who cites three
PMIDs and an Allen ISH measurement. Build an evidence-weighted vote
system: each persona's stance carries weight = f(citation_density,
skill_invocations_for_this_round, prior_calibration). Synthesis verdict
becomes the weight-majority across personas; persona Elo deltas scale by
how much weight a winning persona carried. This rewards grounded
reasoning and demotes hand-waving.
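
A minimal sketch of the weight-majority step, assuming each persona contributes one stance label and one precomputed weight. The function name, the persona names, and the margin definition (winning share minus runner-up share of total weight) are illustrative assumptions; the spec only names a margin field.

```python
from collections import defaultdict


def weighted_majority(stances, weights):
    """Return (verdict, margin) for one debate.

    stances: {persona: stance label}; weights: {persona: weight}.
    The margin definition here is an assumption, not taken from the spec.
    """
    totals = defaultdict(float)
    for persona, stance in stances.items():
        totals[stance] += weights[persona]

    total_weight = sum(totals.values())
    if total_weight == 0:
        return None, 0.0

    # sorted() is stable, so on an exact tie the stance seen first wins.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, (top - runner_up) / total_weight


# Hypothetical personas: two lightly grounded "support" votes against one
# heavily grounded "reject" vote.
stances = {"theorist": "support", "methodologist": "support", "skeptic": "reject"}
weights = {"theorist": 0.5, "methodologist": 0.5, "skeptic": 2.0}
verdict, margin = weighted_majority(stances, weights)
```

Under equal weights the example would resolve to "support" (2 votes to 1); under these weights "reject" carries 2.0 of 3.0 total weight and wins.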
scidex/agora/evidence_weighted_vote.py: compute_weight(round_id, persona_id) -> float and aggregate_weighted_verdict(session_id) -> dict (returns {verdict, weights: {persona: w}, evidence_density: {persona: n}, margin}).
weight = clip(0.2,
              log(1 + citations) * 0.5
              + log(1 + skill_invocations) * 0.3
              + brier_calibration * 0.4,
              2.0)
citations = count of PMIDs/DOIs in the round's content (scidex/atlas/citation_extraction.py); skill_invocations = agent_skill_invocations rows for this persona/round; brier_calibration = 1 - brier_score from agent_calibration (default 0.5 if absent).
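
A minimal sketch of the formula as a pure function, assuming the three inputs have already been fetched. It reads clip(0.2, x, 2.0) as clamping x to [0.2, 2.0] and log as the natural logarithm; both readings, and the name evidence_weight, are assumptions.

```python
import math


def evidence_weight(citations, skill_invocations, brier_calibration=0.5):
    """Clamp the weighted sum of the three evidence signals to [0.2, 2.0].

    brier_calibration defaults to 0.5, matching the spec's fallback when
    no calibration row exists for the persona.
    """
    raw = (
        math.log1p(citations) * 0.5            # log(1 + citations)
        + math.log1p(skill_invocations) * 0.3  # log(1 + skill_invocations)
        + brier_calibration * 0.4
    )
    return min(2.0, max(0.2, raw))


empty_handed = evidence_weight(0, 0)    # only the default calibration term
grounded = evidence_weight(3, 1, 0.8)   # three citations, one skill call
```

With these readings an empty-handed persona with no calibration history sits on the 0.2 floor, while three citations, one skill invocation, and a calibration of 0.8 give roughly 1.22.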
synthesis_engine.synthesize_debate_session writes the weighted verdict to debate_sessions.weighted_verdict_json (JSONB; migrations/20260428_debate_weighted_verdict.sql). The verdict field stays unchanged for back-compat.
scidex/senate/judge_elo.py and scidex/exchange/elo_ratings.py accept an optional weight_multiplier parameter; the persona's Elo K-factor is scaled by weight / mean_round_weight. Defaults preserve the old behavior (weight_multiplier=None).
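
A minimal sketch of the K-factor scaling, assuming a conventional base K of 32 and a fallback to the base K when the mean weight is not positive. All three function names are hypothetical and do not describe the actual judge_elo.py or elo_ratings.py code.

```python
from statistics import fmean


def mean_round_weight(weights):
    """Mean weight across the personas in one round (0.0 for an empty round)."""
    return fmean(weights.values()) if weights else 0.0


def scaled_k_factor(persona_weight, mean_weight, base_k=32.0):
    """Scale base_k by persona_weight / mean_weight.

    Falling back to base_k for a non-positive mean is an assumption.
    """
    if mean_weight <= 0:
        return base_k
    return base_k * persona_weight / mean_weight


def effective_k(base_k, weight_multiplier=None):
    """weight_multiplier=None keeps the unweighted K-factor."""
    return base_k if weight_multiplier is None else base_k * weight_multiplier


# Hypothetical round: one well-grounded persona, two lightly grounded ones.
weights = {"skeptic": 2.0, "theorist": 0.5, "methodologist": 0.5}
mean_w = mean_round_weight(weights)               # 1.0
k_skeptic = scaled_k_factor(2.0, mean_w)          # 64.0
k_theorist = scaled_k_factor(0.5, mean_w)         # 16.0
```

Dividing by the round mean keeps the average K across a round equal to the base K, so weighting redistributes rating movement between personas instead of inflating it overall.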
GET /api/agora/debate/{id}/weighted_verdict returns the weighted_verdict_json.
/debate/{id}: a "Vote weights" mini-table next to the …
tests/test_evidence_weighted_vote.py: …
SCIDEX_DEBATE_WEIGHTED_VERDICT=shadow writes the weighted verdict alongside the classic one; flip to primary once parity is sane.
… weighted_verdict_json for each; assert at least one has a …

… scidex/atlas/citation_extraction.py; reuse it. Add a tiny count_citations(text) helper.
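
A minimal sketch of what a count_citations(text) helper could look like, assuming PMIDs appear as "PMID" followed by up to eight digits and DOIs start with the standard "10." prefix. The regexes and the choice to count distinct identifiers are assumptions, not the existing extractor's behavior.

```python
import re

# "PMID 12345678" or "PMID: 12345678", case-insensitive.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
# DOI prefix "10.<registrant>/" followed by a suffix without whitespace.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def count_citations(text):
    """Count distinct PMIDs and DOIs mentioned in text."""
    pmids = {m.group(1) for m in PMID_RE.finditer(text)}
    # Strip sentence punctuation glued to the end of a DOI; DOIs are
    # case-insensitive, so normalise before deduplicating.
    dois = {m.group(0).rstrip(".,;)").lower() for m in DOI_RE.finditer(text)}
    return len(pmids) + len(dois)


sample = "Supported by PMID: 12345678 and PMID 12345678, plus doi:10.1000/xyz123."
n = count_citations(sample)
```

Deduplicating keeps a persona from inflating its weight by repeating the same reference; drop the set comprehension in favor of a plain count if raw mentions are what the weight should see.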
agent_skill_invocations join is straightforward: one CTE per (artifact_class='debate_round', artifact_id=session_id, persona=persona_id).
agent_calibration table (lives in q-er-calibration-tracker or migrate-create-if-missing as a …

scidex/atlas/citation_extraction.py: citation extractor.
agent_skill_invocations: skill-density signal.
q-er-calibration-tracker: Brier-score source.

q-debate-judge-interruption: interruption decisions consume …
q-debate-persona-ladders: Elo ladder uses weighted-vote outputs.

All acceptance criteria implemented:
New files:
scidex/atlas/citation_extraction.py: PMID/DOI regex extraction; count_citations(text) -> int
scidex/agora/evidence_weighted_vote.py: compute_weight(round_id, persona_id, conn) -> float, aggregate_weighted_verdict(session_id, conn) -> dict, compute_mean_round_weight(), shadow-rollout helper.
migrations/20260428_debate_weighted_verdict.sql: adds weighted_verdict_json JSONB column + … debate_sessions. Classic verdict column untouched.
tests/test_evidence_weighted_vote.py: 16 tests, all passing. Note: spec states weight≈1.84 for …
Modified files:
scidex/agora/synthesis_engine.py: synthesize_debate_session now calls _maybe_write_weighted_verdict, which writes weighted_verdict_json when SCIDEX_DEBATE_WEIGHTED_VERDICT != 'off' (default: shadow).
scidex/senate/judge_elo.py: added compute_evidence_k_factor(persona_weight, mean_round_weight, base_k).
scidex/exchange/elo_ratings.py: record_match gains optional weight_multiplier parameter; when …
api.py: new endpoint GET /api/agora/debate/{id}/weighted_verdict (returns persisted or computed data); GET /debates/{id} renders a sortable "Vote Weights" mini-table from weighted_verdict_json with per-persona bar, stance, weight, and citation count.
Shadow rollout: Active by default (SCIDEX_DEBATE_WEIGHTED_VERDICT=shadow). Weighted verdict is
written alongside classic verdict for every session that goes through synthesize_debate_session.
Flip to primary once parity is validated over one week of production data.
Smoke test: Skipped in this PR — requires live DB with weighted_verdict_json migration applied.
Backfill script can be run manually: iterate the 5 most-recent sessions and call
_maybe_write_weighted_verdict(conn, session_id).