SciDEX — Task: [Senate] Persona ladders

Persona Elo arena fed by weighted-verdict pairwise wins; coverage-aware scheduler; daily 3-pair Orchestra tasks.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Senate] Persona ladder — round-robin Elo tournament across personas [task:8545fb83-fccc-44cf-b2e5-a29a7b2b7af3] (#749) 2026-04-27

Spec File

Effort: thorough

Goal

scidex/exchange/elo_ratings.py and scidex/senate/judge_elo.py already
maintain Elo for hypotheses and judges; nothing maintains an Elo *for the
personas themselves*. We have ~50 personas under personas/ (9 founding
plus dozens of scientists + philosophers); some are clearly more
informative debaters than others, and we cannot tell which without a
ranking signal. Build a persona ladder: a continuously-updated Elo
rating per persona derived from their pairwise debate-round wins (using
the weighted verdict from q-debate-evidence-weighted-vote), with a
round-robin scheduler that ensures every persona meets every other
persona over a sliding 90-day window. Top of the ladder gets first-pick
into newly-spawned debates; bottom gets retired-pending-review.

Acceptance Criteria

☑ New Elo arena personae (note: distinct from the existing
judge-meta arena in judge_elo.py:44), implemented directly in
persona_ladder.py using PERSONA_ARENA = "personae".

☑ Module scidex/senate/persona_ladder.py:
- record_pair_match(session_id, persona_a, persona_b, winner,
  weight_multiplier=1.0): uses the weighted verdict to score.
- leaderboard(window_days: int = 90, limit: int = 100) -> list[dict]
- coverage(window_days) -> dict (returns
  {matched_pairs: n, possible_pairs: m, coverage_ratio: …}).
- pick_next_pair(window_days) -> tuple[str, str]: returns the
  pair with the fewest recent matches, breaking ties by the highest
  combined Elo (informativeness × data-need).

☑ Backfill: python -m scidex.senate.persona_ladder backfill walks
every debate_sessions row with weighted_verdict_json populated and
records all C(n, 2) pairwise matches per session.

☑ Scheduler: a new admin endpoint
POST /api/agora/persona_ladder/schedule_match accepts a topic
and creates a 2-persona debate (using
scidex/agora/scidex_orchestrator.py:run_debate with
persona_ids=[a, b]) for the next pair.

☑ Recurring task: persona-ladder-scheduler runs daily, picks 3 pairs
from pick_next_pair, and schedules debates directly via the
orchestrator.

☑ HTML at /agora/persona-ladder: full leaderboard with
Elo trend sparkline per persona, current coverage % bar, and
"Next scheduled pair" widget.

☑ Retirement-pending-review: a persona below 1300 Elo with ≥ 30
matches gets a row written to senate_alerts proposing review;
no auto-action.

☑ Tests tests/test_persona_ladder.py: 9 tests covering backfill,
leaderboard, coverage, pick_next_pair, retirement alerts, and full
workflow.

☑ Smoke: backfill against current DB; leaderboard shows non-trivial
Elo distribution across personas.
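
The backfill, coverage, and retirement criteria above can be sketched
in a few pure functions. This is a minimal illustration under stated
assumptions, not the code in persona_ladder.py: the function names
session_pairs and retirement_candidates, the in-memory inputs, and the
row keys "persona", "elo", and "matches" are hypothetical, and the
window_days filtering and database access are omitted. Only the 1300
Elo and 30-match thresholds and the coverage dict keys come from the
spec.

```python
from itertools import combinations

RETIREMENT_ELO = 1300         # threshold from the retirement criterion
RETIREMENT_MIN_MATCHES = 30   # minimum matches before review is proposed


def session_pairs(persona_ids):
    """All C(n, 2) unordered persona pairs from one debate session."""
    return list(combinations(sorted(set(persona_ids)), 2))


def coverage(matched_pairs, persona_ids):
    """Share of possible persona pairs that have met at least once."""
    n = len(set(persona_ids))
    possible = n * (n - 1) // 2
    # Normalise pair order so (a, b) and (b, a) count once.
    matched = len({tuple(sorted(pair)) for pair in matched_pairs})
    return {
        "matched_pairs": matched,
        "possible_pairs": possible,
        "coverage_ratio": matched / possible if possible else 0.0,
    }


def retirement_candidates(ladder):
    """Personas below the Elo threshold with enough matches for review."""
    return [
        row["persona"]
        for row in ladder
        if row["elo"] < RETIREMENT_ELO
        and row["matches"] >= RETIREMENT_MIN_MATCHES
    ]
```

For example, a session with three (hypothetical) personas yields three
pairs, and against a pool of four personas those three pairs cover
half of the six possible pairings. The retirement helper only lists
candidates, matching the "no auto-action" rule.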

Approach

Reuse elo_ratings.update_match_result; the only difference is the new
arena name.
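
For readers unfamiliar with the update being reused, here is a minimal
sketch of a standard Elo update. It is not the signature or body of
elo_ratings.update_match_result, which this page does not show. The
K-factor of 32 and the choice to apply weight_multiplier as a scale on
K are assumptions made for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0, weight_multiplier=1.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a draw.
    weight_multiplier scales the step size (assumed behaviour).
    """
    delta = k * weight_multiplier * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With two personas at 1500 and a win for the first, the ratings move to
1516 and 1484; halving weight_multiplier halves the step to 8 points.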

pick_next_pair is argmin(recent_match_count) with a tiebreak on Elo;
keep it simple.
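
The selection rule can be sketched as a single min() over all pairs
with a two-part sort key. This is an illustration, not the shipped
function: the name pick_next_pair_sketch and its in-memory arguments
(a ratings dict and a recent-match counter) are hypothetical, whereas
the real pick_next_pair takes window_days and reads from the database.

```python
from itertools import combinations


def pick_next_pair_sketch(ratings, recent_counts):
    """Pick the least-played pair, breaking ties by highest combined Elo.

    ratings: {persona_id: elo}
    recent_counts: {(a, b): matches in the window}, keyed with a < b.
    Raises ValueError if fewer than two personas are supplied.
    """
    def sort_key(pair):
        a, b = pair
        # Fewest recent matches first; then the larger rating sum wins.
        return (recent_counts.get(pair, 0), -(ratings[a] + ratings[b]))

    return min(combinations(sorted(ratings), 2), key=sort_key)
```

Pairs that have never met count as zero recent matches, so unplayed
pairings are always scheduled before any rematch.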

Scheduling matches as Orchestra tasks (rather than firing them
directly) lets the existing fleet pick them up with the appropriate
model effort.

HTML reuses templates/agora/base.html.

Dependencies

q-debate-evidence-weighted-vote: supplies the per-round weighted
winner.

scidex/exchange/elo_ratings.py: Elo machinery.

scidex.agents.select: persona pool source.

Dependents

q-persona-disagreement-scoreboard: high-Elo ↔ high-Elo pairs are
the most informative disagreements; ladder data feeds the scoreboard's
prior.

q-debate-replay-cross-topic: ladder Elo helps the replay engine
pick personas for new topics.

Work Log

Payload JSON

{
  "completion_shas": [
    "0166407d2"
  ],
  "completion_shas_checked_at": ""
}

Sibling Tasks in Quest (Open Debates)

○ [Agora] Agent debate enrollment driver (driver #1) P94
○ [Agora] Multi-participant debate orchestration (driver #6) P94
○ [Agora] Counter-argument bounty market (driver #7) P92
○ [Agora] Dataset row-level debate gateway (driver #29) P91
✓ [Agora] Dynamic debate round count - stop when stability detected P91
✓ [Agora] Add PubMed evidence to 11 hypotheses lacking citations P90
✓ [Agora] Evidence-weighted persona votes - citation density scales conviction P90
✓ [Agora] Add spectator mode and real-time debate streaming to /debates page P88
✓ [Agora] Run debates for 10 analyses without debate sessions P88
✓ [Senate] Real-time judge interruption - halt rounds drifting off-topic P88

[Senate] Persona ladders - round-robin Elo tournament across personas (done)