[Agora] Per-field tournament driver with fairness sampling + RD-aware scheduling done

Glicko-2-aware pair scheduler: high-RD warmup + information-gain pairs + top-of-field consolidation, cold-start guarantee, 30-day pair lockout.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (4)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717) 2026-04-27
Squash merge: orchestra/task/2efd8ed4-per-field-tournament-driver-with-fairnes (2 commits) (#687) 2026-04-27
[Agora] Update spec work log [task:2efd8ed4-98d7-4838-be65-d9453511e2f0] 2026-04-27
[Agora] Per-field tournament scheduler: 3-strategy Glicko-2 pair selection, 30-day lockout, cold-start guarantee [task:2efd8ed4-98d7-4838-be65-d9453511e2f0] 2026-04-27
Spec File

Goal

scidex/agora/open_question_tournament.py exposes run_tournament_batch(domains, pairs_per_field) and run_single_match(), but pair selection is currently uniform-random within each field.
With wiki/debate/paper miners landing thousands of new open_questions
(other Q-OPENQ tasks), uniform sampling will starve high-RD newcomers and
re-grind already-converged top pairs. Build a Glicko-2-aware scheduler that
prioritizes pairs with high uncertainty reduction per match, ensures every
new question gets ≥5 matches in its first 7 days, and avoids re-judging the
same pair within a 30-day window.
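
To make "uncertainty reduction per match" concrete, here is a minimal sketch of the standard Glicko-2 expected-score formula, plus one possible informativeness proxy. The function names and the proxy itself (E·(1−E) scaled by combined RD) are assumptions for illustration; the spec does not say which metric the scheduler actually uses.

```python
import math

GLICKO2_SCALE = 173.7178  # standard Glicko-2 rating-scale conversion constant


def g(phi: float) -> float:
    """Glicko-2 attenuation factor for an opponent with deviation phi."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating_a: float, rating_b: float, rd_b: float) -> float:
    """Expected score of A against B, with ratings and RD on the Glicko scale."""
    mu_a = (rating_a - 1500.0) / GLICKO2_SCALE
    mu_b = (rating_b - 1500.0) / GLICKO2_SCALE
    phi_b = rd_b / GLICKO2_SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_b) * (mu_a - mu_b)))


def pair_informativeness(rating_a: float, rd_a: float,
                         rating_b: float, rd_b: float) -> float:
    """Hypothetical proxy: E*(1-E) peaks at E = 0.5; weight by combined RD."""
    e = expected_score(rating_a, rating_b, rd_b)
    return e * (1.0 - e) * (rd_a + rd_b)
```

Under this proxy, an evenly matched pair of uncertain newcomers scores higher than an evenly matched pair of converged questions, and both score higher than a lopsided pair, which is the ordering that uniform sampling ignores.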

Acceptance Criteria

☐ Replace run_tournament_batch pair-selection with a scheduler that
uses three weighted strategies:
1. High-RD warmup (40% of budget): sample one question with the
highest RD in field, pair with a random opponent within ±200 Elo.
2. Information-gain pair (40%): pick pairs where expected score
is closest to 0.5 (most informative under Glicko-2) and combined
RD is highest.
3. Top-of-field consolidation (20%): pair the top-10 by Elo
against each other to settle the leaderboard head.
☐ No pair (a,b) may be re-judged within 30 days unless either side's
RD has grown ≥10 since the last match. Persist last-match timestamps
per pair in a new table open_question_match_history(question_a_id,
question_b_id, judged_at, winner, judge_persona).
☐ Cold-start guarantee: any open_question with
created_at > NOW() - INTERVAL '7 days' AND n_matches<5 is
force-included in the next batch until satisfied.
☐ New CLI python -m scidex.agora.open_question_tournament
--schedule-batch --field neurodegeneration --pairs 50 runs the new
scheduler.
☐ Tournament status endpoint at api.py:79192 (get_tournament_status)
gains fields: cold_start_questions_count, next_high_info_pairs[],
field_rd_distribution.
☐ Pytest: scheduler emits the right strategy mix; cold-start guarantee;
30-day pair lockout (and the RD-grew-by-10 escape hatch); reproducible
with seeded RNG.
☐ Migration migrations/openq_match_history.sql creates the new table
with (question_a_id, question_b_id) symmetric uniqueness via
sorted-ids invariant.
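
The lockout, escape-hatch, and cold-start criteria above can be sketched as pure predicates. This is a sketch under stated assumptions, not the shipped helpers: the function names, signatures, and the choice to treat a match exactly 30 days old as no longer locked are all hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional

LOCKOUT = timedelta(days=30)
RD_ESCAPE = 10.0                 # RD growth that lifts the lockout
COLD_START_WINDOW = timedelta(days=7)
COLD_START_MIN_MATCHES = 5


def pair_key(a_id: str, b_id: str) -> tuple[str, str]:
    """Sorted-ids invariant: (a, b) and (b, a) map to the same key."""
    return (a_id, b_id) if a_id <= b_id else (b_id, a_id)


def is_pair_locked(last_judged_at: Optional[datetime],
                   rd_growth_a: float, rd_growth_b: float,
                   now: datetime) -> bool:
    """True if the pair was judged within 30 days and neither RD grew by >= 10."""
    if last_judged_at is None:
        return False  # never judged, so nothing to lock
    if now - last_judged_at >= LOCKOUT:
        return False  # assumption: the boundary day itself is unlocked
    return max(rd_growth_a, rd_growth_b) < RD_ESCAPE


def is_cold_start(created_at: datetime, n_matches: int, now: datetime) -> bool:
    """Mirrors: created_at > NOW() - INTERVAL '7 days' AND n_matches < 5."""
    return (created_at > now - COLD_START_WINDOW
            and n_matches < COLD_START_MIN_MATCHES)
```

Storing rows under pair_key means a single UNIQUE constraint on (question_a_id, question_b_id) covers both orderings, provided every writer applies the same ordering before insert.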

Approach

  • Read existing run_tournament_batch and field_leaderboard in
    scidex/agora/open_question_tournament.py to understand current loop.
  • Reuse Glicko-2 RD reads from the existing judge_elo_ratings table
    (or whatever the actual table name is; derive it from
    _get_or_init_rating() in open_question_tournament.py:189).
  • Implement scheduler as pure function over (field, candidate_questions,
    match_history) so it's deterministically testable.
  • Existing nightly cron entry that invokes run_tournament_batch is left
    in place; it now calls the new scheduler internally.
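
A pure, seeded scheduler along these lines might look like the sketch below. It is illustrative only: Candidate, split_budget, and schedule_pairs are hypothetical names, the rating gap stands in for "expected score closest to 0.5", and the warmup step walks down the RD ordering one question per slot, which is one reading of the spec rather than the confirmed behavior.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    qid: str
    rating: float
    rd: float


def split_budget(n_pairs: int) -> dict[str, int]:
    """40/40/20 split; integer remainders go to top-of-field consolidation."""
    warmup = n_pairs * 2 // 5
    info_gain = n_pairs * 2 // 5
    return {"warmup": warmup, "info_gain": info_gain,
            "top": n_pairs - warmup - info_gain}


def schedule_pairs(candidates, locked, n_pairs, seed):
    """Return [(qid_a, qid_b, strategy)]; no I/O, so the same inputs and seed
    always yield the same schedule. `locked` holds sorted-id pairs to skip."""
    rng = random.Random(seed)
    budget = split_budget(n_pairs)
    used = set(locked)
    chosen = []

    def take(a, b, strategy):
        key = (min(a.qid, b.qid), max(a.qid, b.qid))
        if a.qid == b.qid or key in used:
            return False
        used.add(key)
        chosen.append((key[0], key[1], strategy))
        return True

    # 1. High-RD warmup: highest RD first, random opponent within 200 points.
    remaining = budget["warmup"]
    for c in sorted(candidates, key=lambda q: (-q.rd, q.qid)):
        if remaining == 0:
            break
        nearby = [o for o in candidates if abs(o.rating - c.rating) <= 200]
        rng.shuffle(nearby)
        if any(take(c, o, "warmup") for o in nearby):  # stops at first success
            remaining -= 1

    # 2. Information gain: smallest rating gap, then highest combined RD.
    ranked = sorted(
        combinations(candidates, 2),
        key=lambda p: (abs(p[0].rating - p[1].rating),
                       -(p[0].rd + p[1].rd), p[0].qid, p[1].qid),
    )
    remaining = budget["info_gain"]
    for a, b in ranked:
        if remaining == 0:
            break
        if take(a, b, "info_gain"):
            remaining -= 1

    # 3. Top-of-field consolidation: random pairs among the top 10 by rating.
    top10 = sorted(candidates, key=lambda q: (-q.rating, q.qid))[:10]
    top_pairs = list(combinations(top10, 2))
    rng.shuffle(top_pairs)
    remaining = budget["top"]
    for a, b in top_pairs:
        if remaining == 0:
            break
        if take(a, b, "top"):
            remaining -= 1

    return chosen
```

Ranking every pair in step 2 is quadratic in field size, so a field with thousands of questions would likely need a pre-filter (for example, only neighbors in rating order); the sketch omits that for brevity.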

Dependencies

  • 47ee9103-ccc0: base Elo tournament (consumer)
  • q-openq-mine-from-wiki-pages and siblings: provide the population that
    needs cold-start matches

Work Log

2026-04-27 12:30 UTC, Slot 0

  • Implemented _fairness_scheduler() with three weighted strategies:
    High-RD warmup (40%), Information-gain (40%), Top consolidation (20%)
  • Added match history helpers: _record_match_history, _is_pair_locked,
    _get_last_match_for_pair, _get_match_count, _is_cold_start,
    _get_cold_start_questions
  • 30-day pair lockout with RD-growth escape hatch (≥10 RD increase)
  • Cold-start guarantee: questions <7 days old with <5 matches are force-included
  • Updated run_single_match to call _record_match_history after each match
  • Updated run_tournament_batch with seed parameter for reproducibility
  • Updated get_tournament_status with new fields: cold_start_questions_count,
    next_high_info_pairs, field_rd_distribution
  • Added CLI entry point: python -m scidex.agora.open_question_tournament run-batch|status
  • Created migration 036: open_question_match_history table with symmetric UNIQUE
  • Created tests/test_openq_tournament_scheduler.py with pytest suite
  • Verified: reproducibility (seeded RNG), cold-start force-inclusion, 30-day lockout,
    RD escape hatch, get_tournament_status fields
  • Committed and pushed: 20832ece2
