SciDEX — Task: [Agora] Per-field tournament driver with fairness

Glicko-2-aware pair scheduler: high-RD warmup + information-gain pairs + top-of-field consolidation, cold-start guarantee, 30-day pair lockout.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (4)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717) 2026-04-27

Squash merge: orchestra/task/2efd8ed4-per-field-tournament-driver-with-fairnes (2 commits) (#687) 2026-04-27

[Agora] Update spec work log [task:2efd8ed4-98d7-4838-be65-d9453511e2f0] 2026-04-27

[Agora] Per-field tournament scheduler: 3-strategy Glicko-2 pair selection, 30-day lockout, cold-start guarantee [task:2efd8ed4-98d7-4838-be65-d9453511e2f0] 2026-04-27

Spec File

Goal

scidex/agora/open_question_tournament.py exposes run_tournament_batch(domains, pairs_per_field) and run_single_match(), but pair selection is uniform-random within each field.
With wiki/debate/paper miners (other Q-OPENQ tasks) landing thousands of new
open_questions, uniform sampling will starve high-RD newcomers and
re-grind already-converged top pairs. Build a Glicko-2-aware scheduler that
prioritizes pairs with high uncertainty reduction per match, ensures every
new question gets ≥5 matches in its first 7 days, and avoids re-judging the
same pair within a 30-day window.
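One way to read "high uncertainty reduction per match" is sketched below: rank candidate pairs so that the expected score nearest 0.5 comes first, with ties broken by the highest combined RD. This is an illustration only, not the code in open_question_tournament.py. The function names and the `ratings` shape are hypothetical, and the plain Elo logistic stands in for Glicko-2's expected score, which additionally scales the rating gap by a function of the opponent's RD.

```python
from itertools import combinations


def expected_score(r_a: float, r_b: float) -> float:
    """Elo-style logistic expected score of A against B (a simplification)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_info_gain_pairs(ratings: dict) -> list:
    """Order all pairs: expected score nearest 0.5 first, then highest combined RD.

    ratings maps question id -> (rating, rd); this shape is an assumption.
    """
    def key(pair):
        a, b = pair
        (r_a, rd_a), (r_b, rd_b) = ratings[a], ratings[b]
        return (abs(expected_score(r_a, r_b) - 0.5), -(rd_a + rd_b))

    return sorted(combinations(sorted(ratings), 2), key=key)


# Hypothetical field: two evenly rated uncertain questions and one settled leader.
ratings = {"q1": (1500, 350), "q2": (1500, 300), "q3": (1700, 50)}
print(rank_info_gain_pairs(ratings))
# -> [('q1', 'q2'), ('q1', 'q3'), ('q2', 'q3')]
```

The evenly matched, high-RD pair (q1, q2) ranks first. The two pairs involving q3 have the same expected-score gap, so the one with the larger combined RD wins the tie.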

Acceptance Criteria

☐ Replace run_tournament_batch pair-selection with a scheduler that uses three weighted strategies:
1. High-RD warmup (40% of budget): sample one question with the highest RD in field, pair with a random opponent within ±200 Elo.
2. Information-gain pair (40%): pick pairs where expected score is closest to 0.5 (most informative under Glicko-2) and combined RD is highest.
3. Top-of-field consolidation (20%): pair the top-10 by Elo against each other to settle the leaderboard head.

☐ No pair (a,b) may be re-judged within 30 days unless either side's RD has grown ≥10 since the last match. Persist last-match timestamps per pair in a new table open_question_match_history(question_a_id, question_b_id, judged_at, winner, judge_persona).

☐ Cold-start guarantee: any open_question with created_at > NOW() - INTERVAL '7 days' AND n_matches<5 is force-included in the next batch until satisfied.

☐ New CLI python -m scidex.agora.open_question_tournament --schedule-batch --field neurodegeneration --pairs 50 runs the new scheduler.

☐ Tournament status endpoint at api.py:79192 (get_tournament_status) gains fields: cold_start_questions_count, next_high_info_pairs[], field_rd_distribution.

☐ Pytest: scheduler emits the right strategy mix; cold-start guarantee; 30-day pair lockout (and the RD-grew-by-10 escape hatch); reproducible with seeded RNG.

☐ Migration migrations/openq_match_history.sql creates the new table with (question_a_id, question_b_id) symmetric uniqueness via sorted-ids invariant.
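The sorted-ids invariant in the migration criterion can be illustrated with a small self-contained sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the project's actual database, column types, and DDL are not shown on this page, so the TEXT columns, the `record_match` helper, and the "skeptic" persona value are all assumptions.

```python
import sqlite3

# Assumed DDL: column names come from the spec, types are guesses.
DDL = """
CREATE TABLE open_question_match_history (
    question_a_id TEXT NOT NULL,
    question_b_id TEXT NOT NULL,
    judged_at     TEXT NOT NULL,
    winner        TEXT,
    judge_persona TEXT,
    CHECK (question_a_id < question_b_id),   -- sorted-ids invariant
    UNIQUE (question_a_id, question_b_id)    -- one row per unordered pair
)
"""


def record_match(conn, a, b, judged_at, winner, judge_persona):
    """Store the latest match for a pair, normalizing (a, b) to sorted order."""
    lo, hi = sorted((a, b))
    # INSERT OR REPLACE overwrites the existing row for this pair, if any.
    conn.execute(
        "INSERT OR REPLACE INTO open_question_match_history VALUES (?, ?, ?, ?, ?)",
        (lo, hi, judged_at, winner, judge_persona),
    )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# The same pair recorded in both orders collapses to a single row.
record_match(conn, "q9", "q2", "2026-04-01T00:00:00Z", "q9", "skeptic")
record_match(conn, "q2", "q9", "2026-04-27T00:00:00Z", "q2", "skeptic")
rows = conn.execute(
    "SELECT question_a_id, question_b_id, judged_at FROM open_question_match_history"
).fetchall()
print(rows)  # -> [('q2', 'q9', '2026-04-27T00:00:00Z')]
```

Because the CHECK constraint rejects any row whose ids are not in sorted order, a plain UNIQUE on the two columns is enough to make (a, b) and (b, a) the same key.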

Approach

Read existing run_tournament_batch and field_leaderboard in scidex/agora/open_question_tournament.py to understand current loop.

Reuse Glicko-2 RD reads from the existing judge_elo_ratings table (or whatever the actual table name is; derive from _get_or_init_rating() in open_question_tournament.py:189).

Implement scheduler as pure function over (field, candidate_questions, match_history) so it's deterministically testable.

Existing nightly cron entry that invokes run_tournament_batch is left in place; it now calls the new scheduler internally.
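A minimal sketch of what a pure, seeded scheduler over (field, candidate_questions, match_history) might look like follows. It is not the `_fairness_scheduler()` recorded in the work log. The names `split_budget` and `schedule_pairs`, the input shapes, and the simplifications are assumptions: warmup walks questions in descending RD rather than sampling a single one, and information gain is ranked by smallest rating gap, which gives the same order as "expected score closest to 0.5" under a logistic expected score.

```python
import random
from itertools import combinations


def split_budget(pairs, percents=(40, 40, 20)):
    """Largest-remainder split of the pair budget, in exact integer arithmetic."""
    counts = [pairs * p // 100 for p in percents]
    remainders = [pairs * p % 100 for p in percents]
    order = sorted(range(len(percents)), key=lambda i: remainders[i], reverse=True)
    for i in order[:pairs - sum(counts)]:
        counts[i] += 1
    return counts


def schedule_pairs(field, candidate_questions, match_history, pairs, seed=0):
    """Pure function: identical inputs and seed give identical output.

    candidate_questions: question id -> (rating, rd)           (assumed shape)
    match_history: set of sorted (id, id) tuples locked out    (assumed shape)
    """
    rng = random.Random(seed)          # local RNG, no global state touched
    ids = sorted(candidate_questions)  # fixed order so dict ordering cannot leak in
    rating = lambda q: candidate_questions[q][0]
    rd = lambda q: candidate_questions[q][1]
    chosen, seen = [], set(match_history)

    def take(a, b, strategy):
        key = (a, b) if a < b else (b, a)
        if a == b or key in seen:
            return False
        seen.add(key)
        chosen.append((key[0], key[1], strategy))
        return True

    def fill(candidates, quota, strategy):
        taken = 0
        for a, b in candidates:
            if taken == quota:
                break
            taken += take(a, b, strategy)

    n_warmup, n_info, n_top = split_budget(pairs)

    # 1. High-RD warmup: highest RD first, random opponent within 200 rating points.
    taken = 0
    for q in sorted(ids, key=lambda x: (-rd(x), x)):
        if taken == n_warmup:
            break
        near = [o for o in ids if o != q and abs(rating(o) - rating(q)) <= 200]
        rng.shuffle(near)
        for o in near:
            if take(q, o, "warmup"):
                taken += 1
                break

    # 2. Information gain: smallest rating gap first, then highest combined RD.
    ranked = sorted(
        combinations(ids, 2),
        key=lambda p: (abs(rating(p[0]) - rating(p[1])), -(rd(p[0]) + rd(p[1]))),
    )
    fill(ranked, n_info, "info_gain")

    # 3. Top-of-field consolidation: random pairs among the ten highest rated.
    head = list(combinations(sorted(ids, key=lambda x: (-rating(x), x))[:10], 2))
    rng.shuffle(head)
    fill(head, n_top, "top")

    return {"field": field, "pairs": chosen}
```

Keeping the RNG local and sorting ids before any sampling is what makes the output depend only on the arguments, which is the property the seeded-RNG test in the acceptance criteria relies on.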

Dependencies

47ee9103-ccc0: base Elo tournament (consumer)
q-openq-mine-from-wiki-pages and siblings: provide the population that needs cold-start matches

Work Log

2026-04-27 12:30 UTC — Slot 0

Implemented _fairness_scheduler() with three weighted strategies: High-RD warmup (40%), Information-gain (40%), Top consolidation (20%)
Added match history helpers: _record_match_history, _is_pair_locked, _get_last_match_for_pair, _get_match_count, _is_cold_start, _get_cold_start_questions
30-day pair lockout with RD-growth escape hatch (≥10 RD increase)
Cold-start guarantee: questions <7 days old with <5 matches are force-included
Updated run_single_match to call _record_match_history after each match
Updated run_tournament_batch with seed parameter for reproducibility
Updated get_tournament_status with new fields: cold_start_questions_count, next_high_info_pairs, field_rd_distribution
Added CLI entry point: python -m scidex.agora.open_question_tournament run-batch|status
Created migration 036: open_question_match_history table with symmetric UNIQUE
Created tests/test_openq_tournament_scheduler.py with pytest suite
Verified: reproducibility (seeded RNG), cold-start force-inclusion, 30-day lockout, RD escape hatch, get_tournament_status fields
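
The lockout and cold-start checks listed in this work log can be expressed as small predicates. The sketch below is hypothetical and is not the `_is_pair_locked` or `_is_cold_start` helpers themselves. In particular, it assumes the RD of each question at its last match is available alongside the timestamp, which the open_question_match_history columns in the spec do not include.

```python
from datetime import datetime, timedelta, timezone

LOCKOUT = timedelta(days=30)
COLD_START_WINDOW = timedelta(days=7)
RD_ESCAPE = 10.0   # RD growth that lifts the lockout
MIN_MATCHES = 5


def pair_key(a, b):
    """Order-independent key for a pair of question ids."""
    return (a, b) if a <= b else (b, a)


def pair_is_locked(a, b, last_match, current_rd, now):
    """last_match maps pair_key -> (judged_at, {question id: RD at that match})."""
    entry = last_match.get(pair_key(a, b))
    if entry is None:
        return False                      # never judged: free to schedule
    judged_at, rd_then = entry
    if now - judged_at >= LOCKOUT:
        return False                      # lockout window has expired
    # Escape hatch: either side's RD grew by at least RD_ESCAPE since the match.
    rd_grew = any(current_rd[q] - rd_then[q] >= RD_ESCAPE for q in (a, b))
    return not rd_grew


def needs_cold_start(created_at, n_matches, now):
    """True while a question is under 7 days old and has fewer than 5 matches."""
    return created_at > now - COLD_START_WINDOW and n_matches < MIN_MATCHES


now = datetime(2026, 4, 27, 12, 30, tzinfo=timezone.utc)
last_match = {pair_key("q2", "q1"): (now - timedelta(days=3), {"q1": 80.0, "q2": 90.0})}

print(pair_is_locked("q1", "q2", last_match, {"q1": 82.0, "q2": 91.0}, now))  # -> True
print(pair_is_locked("q1", "q2", last_match, {"q1": 95.0, "q2": 91.0}, now))  # -> False
print(needs_cold_start(now - timedelta(days=2), 3, now))                      # -> True
```

Passing `now` explicitly, rather than reading the clock inside the predicates, keeps them deterministic under test.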