Effort: thorough
scidex/exchange/elo_ratings.py and scidex/senate/judge_elo.py already
maintain Elo for hypotheses and judges; nothing maintains an Elo *for the
personas themselves*. We have ~50 personas under personas/ (9 founding
plus dozens of scientists + philosophers); some are clearly more
informative debaters than others, and we cannot tell which without a
ranking signal. Build a persona ladder: a continuously-updated Elo
rating per persona derived from their pairwise debate-round wins (using
the weighted verdict from q-debate-evidence-weighted-vote), with a
round-robin scheduler that ensures every persona meets every other
persona over a sliding 90-day window. Top of the ladder gets first-pick
into newly-spawned debates; bottom gets retired-pending-review.
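
As a rough illustration of how a weighted verdict could feed a pairwise Elo update, here is a minimal sketch. It is not the code in scidex/exchange/elo_ratings.py. The K-factor of 32, the choice to scale K by weight_multiplier, and the function names expected_score and weighted_elo_update are all assumptions made for this example.

```python
from __future__ import annotations

K_FACTOR = 32.0  # assumed value; the real K lives in the existing Elo machinery


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def weighted_elo_update(
    rating_a: float,
    rating_b: float,
    winner: str,
    weight_multiplier: float = 1.0,
    k: float = K_FACTOR,
) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one pairwise match.

    winner is "a", "b", or "draw". The weight multiplier scales the step
    size (an assumption), so a weakly supported verdict moves ratings less.
    """
    score_a = {"a": 1.0, "b": 0.0, "draw": 0.5}[winner]
    delta = k * weight_multiplier * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(weighted_elo_update(1500.0, 1500.0, "a"))
    print(weighted_elo_update(1500.0, 1500.0, "a", weight_multiplier=0.5))
```

Scaling the step rather than the score keeps the update zero-sum, so the total rating mass on the ladder stays constant regardless of verdict weights.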
personae (note: distinct from the existing judge-meta arena in judge_elo.py:44), implemented directly in persona_ladder.py using PERSONA_ARENA = "personae".
scidex/senate/persona_ladder.py:record_pair_match(session_id, persona_a, persona_b, winner, weight_multiplier=1.0): uses the weighted verdict to score.
leaderboard(window_days: int = 90, limit: int = 100) -> list[dict]
coverage(window_days) -> dict (returns {matched_pairs: n, possible_pairs: m, coverage_ratio: …}).
pick_next_pair(window_days) -> tuple[str, str]: returns the
python -m scidex.senate.persona_ladder backfill walks debate_sessions with weighted_verdict_json populated and C(n, 2) pairwise matches per session.
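
The C(n, 2) expansion and the coverage dictionary can be sketched in memory as follows. This is an illustration under stated assumptions, not the module's implementation: the real coverage(window_days) takes a window and presumably reads stored matches, whereas the hypothetical helpers session_pairs and coverage_from_pairs below take plain lists.

```python
from __future__ import annotations

from itertools import combinations


def session_pairs(persona_ids: list[str]) -> list[tuple[str, str]]:
    """All C(n, 2) unordered pairs for one session, in a stable sorted order."""
    return list(combinations(sorted(set(persona_ids)), 2))


def coverage_from_pairs(
    matched_pairs: list[tuple[str, str]],
    persona_ids: list[str],
) -> dict:
    """Fraction of possible persona pairs that have met at least once.

    matched_pairs is assumed to be already filtered to the sliding window.
    """
    n = len(set(persona_ids))
    possible = n * (n - 1) // 2
    # frozenset makes (a, b) and (b, a) count as the same pairing.
    matched = len({frozenset(pair) for pair in matched_pairs})
    ratio = matched / possible if possible else 0.0
    return {
        "matched_pairs": matched,
        "possible_pairs": possible,
        "coverage_ratio": ratio,
    }


if __name__ == "__main__":
    print(session_pairs(["c", "a", "b"]))
    print(coverage_from_pairs([("a", "b"), ("b", "a")], ["a", "b", "c"]))
```

With roughly 50 personas there are 50 * 49 / 2 = 1225 possible pairs, which gives a sense of how many matches a full round-robin needs inside the window.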
POST /api/agora/persona_ladder/schedule_match accepts a topic
scidex/agora/scidex_orchestrator.py:run_debate with persona_ids=[a, b]) for the next pair.
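
The document describes pick_next_pair as argmin(recent_match_count) with a tiebreak on Elo. One possible reading is sketched below. The direction of the tiebreak is not stated in the source, so preferring the smallest Elo gap is an assumption, as are the in-memory arguments and the default rating of 1500. The real function takes window_days.

```python
from __future__ import annotations

from itertools import combinations


def pick_next_pair(
    persona_ids: list[str],
    recent_match_count: dict[tuple[str, str], int],
    elo: dict[str, float],
    default_elo: float = 1500.0,
) -> tuple[str, str]:
    """Pick the least-recently-matched pair; break ties by closest Elo.

    recent_match_count is keyed by alphabetically sorted (a, b) tuples and
    is assumed to cover only the sliding window. Missing pairs count as 0.
    """
    unique = sorted(set(persona_ids))
    if len(unique) < 2:
        raise ValueError("need at least two personas to form a pair")

    def sort_key(pair: tuple[str, str]) -> tuple[int, float, tuple[str, str]]:
        a, b = pair
        gap = abs(elo.get(a, default_elo) - elo.get(b, default_elo))
        # The pair itself is the final key so the result is deterministic.
        return (recent_match_count.get(pair, 0), gap, pair)

    return min(combinations(unique, 2), key=sort_key)


if __name__ == "__main__":
    counts = {("a", "b"): 2}
    ratings = {"a": 1600.0, "b": 1500.0, "c": 1490.0}
    print(pick_next_pair(["a", "b", "c"], counts, ratings))
```

Always choosing the pair with the fewest recent matches pushes coverage toward a full round-robin, while the Elo tiebreak only decides among equally under-played pairs.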
persona-ladder-scheduler daily picks 3 pairs
pick_next_pair and schedules debates directly via the
/agora/persona-ladder: full leaderboard with
senate_alerts proposing review;
tests/test_persona_ladder.py: 9 tests covering backfill,
elo_ratings.update_match_result: only difference is the new
pick_next_pair is argmin(recent_match_count) + tiebreak on Elo;
templates/agora/base.html.
q-debate-evidence-weighted-vote: supplies the per-round weighted
scidex/exchange/elo_ratings.py: Elo machinery.
scidex.agents.select: persona pool source.
q-persona-disagreement-scoreboard: high-Elo ↔ high-Elo pairs are
q-debate-replay-cross-topic: ladder Elo helps the replay engine
{
"completion_shas": [
"0166407d2"
],
"completion_shas_checked_at": ""
}