[Agora] Dynamic debate round count - stop when stability detected
Status: done

RoundController decides each round whether to stop, continue, or escalate, using semantic similarity, verdict-vote stability, and a claim-novelty floor.

Completion Notes

Auto-release: work already on origin/main

Git Commits (1)

[Agora] Dynamic debate round controller — stability-aware termination [task:623c630c-1adc-422e-9df6-9b8cdec3e2a5] (#721) 2026-04-27
Spec File

Effort: thorough

Goal

SciDEXOrchestrator.run_debate (agent.py:1420 + scidex/agora/scidex_orchestrator.py) and the artifact-debate fan-outs
all run a fixed number of rounds (commonly 4: theorist → skeptic →
domain_expert → synthesizer). Many debates converge by round 2; others
genuinely need 6+. A fixed N over-spends compute on easy questions and
under-resolves the hard ones. Build a stability-aware round controller
that watches the inter-round content for convergence (semantic similarity,
verdict-vote stability, no new claims emerging) and decides each round
whether to stop, continue, or escalate to a fresh persona — bounded by min_rounds and max_rounds.

Acceptance Criteria

☐ New module scidex/agora/round_controller.py:
class RoundController with should_continue(state) -> Decision,
where Decision ∈ {stop_converged, continue_baseline,
escalate_new_persona, stop_max_rounds, stop_safety}.
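The decision set above could be modelled as a small enum. This is a minimal sketch, not the shipped module: the five values come from the criterion, while the `is_stop` helper and the `DebateState` fields are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(str, Enum):
    """The five outcomes named in the acceptance criteria."""

    STOP_CONVERGED = "stop_converged"
    CONTINUE_BASELINE = "continue_baseline"
    ESCALATE_NEW_PERSONA = "escalate_new_persona"
    STOP_MAX_ROUNDS = "stop_max_rounds"
    STOP_SAFETY = "stop_safety"

    @property
    def is_stop(self) -> bool:
        # Hypothetical helper: every terminal decision shares the "stop_" prefix.
        return self.value.startswith("stop_")


@dataclass
class DebateState:
    # Hypothetical state container; the spec does not define these fields.
    rounds_completed: int = 0
    round_texts: list = field(default_factory=list)
    round_verdicts: list = field(default_factory=list)
    tokens_used: int = 0
    token_budget: int = 0
```

Because the enum subclasses `str`, a value such as `Decision.STOP_CONVERGED` can be written straight into a TEXT column and parsed back with `Decision("stop_converged")`.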
☐ Convergence signal = three orthogonal checks ANDed for stop:
1. Semantic stability: cosine sim of last-round content vs
prior round ≥ 0.9 (use existing scidex/core/embedding.py).
2. Verdict-vote stability: extracted per-round verdict (re-uses
aggregate_debate_consensus_dissent's consensus parser) is
identical to the previous round's verdict.
3. Claim-novelty floor: the latest round introduces no new claims
of at least one line (tokenised diff vs prior round; a single new
claim is enough to block the stop).
Escalation triggers when 2 consecutive rounds show high cosine sim
but the verdict is split. That is stable disagreement, and the
controller picks one new persona via the existing specialist
matcher (scidex.agents.select) to break the tie.
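The three checks can be sketched as pure functions. This is an illustration under stated assumptions: embedding vectors are passed in as plain lists because the API of scidex/core/embedding.py is not shown here, verdicts are opaque strings standing in for the consensus parser's output, and "one claim per non-empty line" is one possible reading of the claim-novelty rule. All function names are hypothetical.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_claims(text):
    """Assumption: each non-empty line is one claim, compared case- and punctuation-insensitively."""
    claims = set()
    for line in text.splitlines():
        normalised = " ".join(re.findall(r"\w+", line.lower()))
        if normalised:
            claims.add(normalised)
    return claims


def new_claims(current_text, previous_text):
    """Tokenised set difference: claims present now that were absent last round."""
    return split_claims(current_text) - split_claims(previous_text)


def convergence_signals(prev_text, curr_text, prev_vec, curr_vec,
                        prev_verdict, curr_verdict,
                        sim_threshold=0.9, novelty_threshold=1):
    """Evaluate the three checks for one pair of consecutive rounds."""
    sim = cosine_similarity(prev_vec, curr_vec)
    novel = new_claims(curr_text, prev_text)
    return {
        "semantic_sim": sim,
        "semantic_stable": sim >= sim_threshold,
        "verdict_stable": curr_verdict == prev_verdict,
        "new_claim_count": len(novel),
        # Floor is met only when fewer than `novelty_threshold` new claims appear.
        "novelty_floor_met": len(novel) < novelty_threshold,
    }
```

A dictionary of this shape could also serve as the per-round signal payload, since it records both the raw similarity and the boolean outcome of each check.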
☐ Bounds: min_rounds = 2 (always), max_rounds = 8 (default;
configurable per-debate-type). Hard safety: stop once more than 80%
of the token budget has been consumed.
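One way the bounds and signals might combine into a single decision is sketched below. The precedence (safety, then the min_rounds floor, then convergence, then the max_rounds ceiling, then escalation) is an assumption; the spec does not state which rule wins when several apply. `RoundSignals`, `ControllerState`, and the `verdict_split` flag are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RoundSignals:
    semantic_stable: bool
    verdict_stable: bool
    novelty_floor_met: bool
    verdict_split: bool  # assumption: the verdict parser can report a split vote


@dataclass
class ControllerState:
    rounds_completed: int
    tokens_used: int
    token_budget: int
    history: list = field(default_factory=list)  # one RoundSignals per compared round


def should_continue(state, min_rounds=2, max_rounds=8, budget_fraction=0.8):
    # 1. Hard safety: more than 80% of the token budget consumed.
    if state.token_budget > 0 and state.tokens_used > budget_fraction * state.token_budget:
        return "stop_safety"
    # 2. Floor: never stop on convergence before min_rounds.
    if state.rounds_completed < min_rounds:
        return "continue_baseline"
    latest = state.history[-1] if state.history else None
    # 3. Convergence: all three checks ANDed, and the verdict is not a stable split.
    if (latest is not None and latest.semantic_stable and latest.verdict_stable
            and latest.novelty_floor_met and not latest.verdict_split):
        return "stop_converged"
    # 4. Ceiling.
    if state.rounds_completed >= max_rounds:
        return "stop_max_rounds"
    # 5. Stable disagreement: two consecutive high-similarity rounds with a split verdict.
    last_two = state.history[-2:]
    if len(last_two) == 2 and all(s.semantic_stable and s.verdict_split for s in last_two):
        return "escalate_new_persona"
    return "continue_baseline"
```

This sketch would request escalation again on every later round that still looks like stable disagreement; since the criterion asks for one new persona, a fuller controller would presumably remember that it has already escalated.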
☐ Integration: agent.py:run_debate and
scidex/agora/scidex_orchestrator.py:1843 replace their fixed
for round_idx in range(num_rounds): loop with a loop that keeps
running rounds until controller.should_continue(state) returns one
of the stop_* decisions.
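The loop swap might look like the sketch below. Since `!= stop_*` is shorthand rather than valid Python, the sketch tests for the `stop_` prefix instead. The callables `run_round` and `add_persona` and the dict-shaped state are hypothetical stand-ins for the orchestrator's real plumbing.

```python
def run_rounds(should_continue, run_round, add_persona, state):
    """Run debate rounds until the controller returns a stop_* decision.

    Returns (rounds_completed, final_decision).
    """
    while True:
        decision = should_continue(state)
        if decision.startswith("stop_"):
            return state["rounds_completed"], decision
        if decision == "escalate_new_persona":
            # Bring in the tie-breaking persona before running the next round.
            add_persona(state)
        run_round(state)


# Usage with scripted stand-ins for the controller and the round runner.
script = iter(["continue_baseline", "continue_baseline",
               "escalate_new_persona", "stop_converged"])
state = {"rounds_completed": 0, "personas": ["theorist", "skeptic"]}


def fake_should_continue(s):
    return next(script)


def fake_run_round(s):
    s["rounds_completed"] += 1


def fake_add_persona(s):
    s["personas"].append("specialist")


rounds, final = run_rounds(fake_should_continue, fake_run_round, fake_add_persona, state)
```

Returning the completed round count alongside the final decision keeps it easy to write debate_sessions.num_rounds at finish, as the fixed loop did.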
☐ New table debate_round_decisions(session_id TEXT, round_index INT,
decision TEXT, signal_json JSONB, decided_at TIMESTAMPTZ,
PRIMARY KEY (session_id, round_index)). Migration:
migrations/20260428_debate_round_decisions.sql.
debate_sessions.num_rounds continues to be the final round count
written at finish, so the interface is preserved.
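To illustrate the shape of the trace table, the sketch below uses an in-memory SQLite database as a stand-in. This is an assumption made so the example runs anywhere: the real migration uses JSONB and TIMESTAMPTZ, which SQLite lacks, so here both are stored as TEXT. `record_decision` and `load_trace` are hypothetical helper names.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE debate_round_decisions (
    session_id  TEXT,
    round_index INTEGER,
    decision    TEXT,
    signal_json TEXT,   -- JSONB in the spec; TEXT in this SQLite stand-in
    decided_at  TEXT,   -- TIMESTAMPTZ in the spec; ISO-8601 string here
    PRIMARY KEY (session_id, round_index)
)
"""


def record_decision(conn, session_id, round_index, decision, signals):
    """Insert one controller decision; the primary key allows one row per round."""
    conn.execute(
        "INSERT INTO debate_round_decisions VALUES (?, ?, ?, ?, ?)",
        (session_id, round_index, decision,
         json.dumps(signals, sort_keys=True),
         datetime.now(timezone.utc).isoformat()),
    )


def load_trace(conn, session_id):
    """Return the decisions for one debate, ordered by round."""
    rows = conn.execute(
        "SELECT round_index, decision, signal_json FROM debate_round_decisions "
        "WHERE session_id = ? ORDER BY round_index",
        (session_id,),
    ).fetchall()
    return [{"round_index": r, "decision": d, "signals": json.loads(s)}
            for r, d, s in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_decision(conn, "sess-1", 2, "continue_baseline", {"semantic_sim": 0.71})
record_decision(conn, "sess-1", 3, "stop_converged", {"semantic_sim": 0.94})
trace = load_trace(conn, "sess-1")
```

With the composite primary key, a second insert for the same (session_id, round_index) raises an integrity error, so each round carries exactly one recorded decision.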
☐ API: GET /api/agora/debate/{id}/round_decisions — full controller
trace for one debate; surfaced as a "Why did this debate stop at
round X?" widget on /debate/{id}.
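The widget's one-line answer could be derived from the trace roughly as follows. This is a framework-free sketch: the trace entries are assumed to be dicts with `round_index` and `decision` keys, and the wording of each reason is invented here, not taken from the shipped UI.

```python
# Hypothetical human-readable reasons for each terminal decision.
STOP_REASONS = {
    "stop_converged": "all three convergence signals held",
    "stop_max_rounds": "the max_rounds ceiling was reached",
    "stop_safety": "the token-budget safety limit was exceeded",
}


def explain_stop(trace):
    """Return a one-sentence explanation of the first stop decision in a trace."""
    for entry in sorted(trace, key=lambda e: e["round_index"]):
        reason = STOP_REASONS.get(entry["decision"])
        if reason is not None:
            return f"Stopped at round {entry['round_index']}: {reason}."
    return "No stop decision recorded yet."
```

For a trace that ends in stop_converged at round 3, this yields "Stopped at round 3: all three convergence signals held."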
☐ Tests tests/test_round_controller.py:
(a) all three signals satisfied at round 3 → stop_converged,
num_rounds=3.
(b) min_rounds floor: signals satisfied at round 1 → still goes
to round 2.
(c) escalation: split verdict + high sim → returns
escalate_new_persona, controller integrates the new persona.
(d) max_rounds ceiling: never converges → stops at 8.
(e) token-budget safety: ceiling fires before max_rounds.
☐ Shadow rollout: env var SCIDEX_DEBATE_DYNAMIC_ROUNDS=shadow
runs the controller alongside the fixed loop and writes its
counterfactual decision to debate_round_decisions without
changing actual round count. After 1 week, flip to enabled.
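The shadow switch might be read and applied as in the sketch below. Only the variable name SCIDEX_DEBATE_DYNAMIC_ROUNDS and the values shadow and enabled come from the criterion; the "off" fallback, the "stop_fixed" label, and both function names are assumptions for illustration.

```python
import os


def rollout_mode(environ=None):
    """Return 'shadow', 'enabled', or 'off' (assumed default for unset or unknown values)."""
    env = os.environ if environ is None else environ
    raw = env.get("SCIDEX_DEBATE_DYNAMIC_ROUNDS", "").strip().lower()
    return raw if raw in {"shadow", "enabled"} else "off"


def plan_round(mode, controller_decision, round_idx, num_rounds):
    """Return (decision_to_act_on, decision_to_log).

    round_idx counts rounds already completed; num_rounds is the fixed-loop length.
    """
    fixed = "stop_fixed" if round_idx >= num_rounds else "continue_baseline"
    if mode == "enabled":
        # The controller drives the loop and its decision is also the one logged.
        return controller_decision, controller_decision
    if mode == "shadow":
        # The fixed loop still drives; the controller's counterfactual is only logged.
        return fixed, controller_decision
    return fixed, None
```

Separating the decision acted on from the decision logged is what lets shadow mode fill debate_round_decisions without changing the actual round count.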
☐ Smoke: run 3 debates with the controller in shadow; verify each
has a debate_round_decisions trace; spot-check one decision
against the round content.

Approach

  • Read scidex/agora/scidex_orchestrator.py:1843 for the canonical
    round-loop shape; introduce the controller as a thin object so the
    loop reads as while controller.next() == continue: ....
  • Embedding helper exists; semantic sim is one call. Verdict parser
    already exists in synthesis_engine.py. Claim-novelty diff is
    tokenised set difference; keep it simple.
  • Persistence first; integration after; rollout in shadow mode.
  • Add a cost_savings_estimate rollup to the dashboard so we can
    prove the controller pays for itself.
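A cost_savings_estimate rollup could be computed from shadow-mode data along these lines. The input fields (`fixed_rounds`, `controller_rounds`, `avg_tokens_per_round`) are hypothetical, and pricing saved rounds at each debate's average tokens per round is a simplifying assumption, not the dashboard's actual formula.

```python
def cost_savings_estimate(debates):
    """Roll up rounds and estimated tokens saved by the controller.

    Each debate is a dict with the rounds the fixed loop ran, the round at
    which the controller would have stopped, and an average token cost per
    round. A negative tokens_saved means the controller would have spent more.
    """
    rounds_fixed = sum(d["fixed_rounds"] for d in debates)
    rounds_controller = sum(d["controller_rounds"] for d in debates)
    tokens_saved = sum(
        (d["fixed_rounds"] - d["controller_rounds"]) * d["avg_tokens_per_round"]
        for d in debates
    )
    return {
        "debates": len(debates),
        "rounds_fixed": rounds_fixed,
        "rounds_controller": rounds_controller,
        "tokens_saved": tokens_saved,
    }
```

The estimate is signed on purpose: debates that the controller would have extended past the fixed count subtract from the total, so the rollup reflects both over-spend avoided and extra rounds bought for hard questions.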

Dependencies

  • scidex/agora/synthesis_engine.py: verdict parser.
  • scidex/core/embedding.py: semantic sim.
  • scidex.agents.select: escalation persona picker.

Dependents

  • q-debate-evidence-weighted-vote: voting layer benefits from
    stable-round signal as a confidence weight.
  • q-debate-judge-interruption: the judge's mid-round halt API and
    this controller share state plumbing.

Work Log

Payload JSON
    {
      "completion_shas": [
        "ebb12f65f44046b50d986be9a6067956779b1ae1"
      ],
      "completion_shas_checked_at": ""
    }
