SciDEX — Task: [Agora] Dynamic debate round count

RoundController decides each round whether to stop, continue, or escalate, using semantic similarity, verdict-vote stability, and a claim-novelty floor.

Completion Notes

Auto-release: work already on origin/main

Git Commits (1)

[Agora] Dynamic debate round controller — stability-aware termination [task:623c630c-1adc-422e-9df6-9b8cdec3e2a5] (#721) 2026-04-27

Spec File

Effort: thorough

Goal

SciDEXOrchestrator.run_debate (agent.py:1420 + scidex/agora/scidex_orchestrator.py) and the artifact-debate fan-outs
all run a fixed number of rounds (commonly 4: theorist → skeptic →
domain_expert → synthesizer). Many debates converge by round 2; others
genuinely need 6+. A fixed N over-spends compute on easy questions and
under-resolves the hard ones. Build a stability-aware round controller
that watches the inter-round content for convergence (semantic similarity,
verdict-vote stability, no new claims emerging) and decides each round
whether to stop, continue, or escalate to a fresh persona — bounded by min_rounds and max_rounds.

Acceptance Criteria

☐ New module scidex/agora/round_controller.py:
class RoundController with should_continue(state) -> Decision,
where Decision ∈ {stop_converged, continue_baseline,
escalate_new_persona, stop_max_rounds, stop_safety}.
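
A minimal sketch of the Decision type named above, assuming it is modelled as a string-valued Enum. The is_stop helper and every field of DebateState are hypothetical; the spec only fixes the five decision names and the should_continue(state) -> Decision signature.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(str, Enum):
    """The five outcomes listed in the acceptance criteria."""
    STOP_CONVERGED = "stop_converged"
    CONTINUE_BASELINE = "continue_baseline"
    ESCALATE_NEW_PERSONA = "escalate_new_persona"
    STOP_MAX_ROUNDS = "stop_max_rounds"
    STOP_SAFETY = "stop_safety"

    @property
    def is_stop(self) -> bool:
        # Hypothetical helper: True for any of the stop_* decisions.
        return self.value.startswith("stop_")


@dataclass
class DebateState:
    """Hypothetical state object; field names are assumptions, not the real schema."""
    round_index: int                       # rounds completed so far
    round_texts: list[str] = field(default_factory=list)
    verdicts: list[str] = field(default_factory=list)
    tokens_used: int = 0
    token_budget: int = 0
```

Storing the string values keeps Decision directly serialisable into a TEXT column, and Decision("stop_converged") round-trips a stored value back to the enum member.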

☐ Convergence signal = three orthogonal checks ANDed for stop:
1. Semantic stability: cosine sim of last-round content vs
prior round ≥ 0.9 (use existing scidex/core/embedding.py).
2. Verdict-vote stability: extracted per-round verdict (re-uses
aggregate_debate_consensus_dissent's consensus parser) is
identical to the previous round's verdict.
3. Claim-novelty floor: no new claims of one line or longer emerged
(tokenised diff vs prior round; a single new claim fails the check).

☐ Escalation triggered when 2 consecutive rounds show high
cosine sim BUT the verdict is split: that is stable disagreement,
and the controller picks one new persona via the existing
specialist matcher (scidex.agents.select) to break the tie.
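
The stable-disagreement test can be sketched as follows. This is an illustration under assumptions: each round is taken to carry a list of per-persona verdict labels, "split" is read as more than one distinct label in a round, and "high" cosine sim reuses the 0.9 threshold from the convergence check. The persona pick through scidex.agents.select is not shown because its signature is not given here.

```python
def is_split(votes):
    """Assumed reading of 'split': the round's votes contain more than one label."""
    return len(set(votes)) > 1


def is_stable_disagreement(similarities, votes_per_round,
                           sim_threshold=0.9, window=2):
    """True when the last `window` rounds are all similar to their predecessor
    yet each still has a split verdict.

    similarities[i]    -- cosine sim of round i vs. the round before it
    votes_per_round[i] -- per-persona verdict labels for round i (hypothetical shape)
    """
    if len(similarities) < window or len(votes_per_round) < window:
        return False  # not enough history to call it stable
    recent_sims = similarities[-window:]
    recent_votes = votes_per_round[-window:]
    return (all(s >= sim_threshold for s in recent_sims)
            and all(is_split(v) for v in recent_votes))
```

When this returns True the controller would return escalate_new_persona rather than stop_converged, since the content has stopped moving but the verdict has not settled.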

☐ Bounds: min_rounds = 2 (always), max_rounds = 8 (default;
configurable per-debate-type). Hard safety: stop when tokens
consumed exceed 80% of the token budget.
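
One way to layer these bounds ahead of the convergence signals is sketched below. The precedence (safety first, then max_rounds, then the min_rounds floor) is an assumption, since the spec does not say whether the safety stop may fire before min_rounds is reached. The function name and parameters are hypothetical.

```python
def check_bounds(rounds_done, tokens_used, token_budget,
                 min_rounds=2, max_rounds=8, safety_fraction=0.8):
    """Return a decision name forced by the bounds, or None to defer to the
    convergence signals. Precedence here is an assumption, not from the spec."""
    # Hard safety: more than 80% of the token budget already consumed.
    if token_budget > 0 and tokens_used > safety_fraction * token_budget:
        return "stop_safety"
    # Ceiling: never run past max_rounds.
    if rounds_done >= max_rounds:
        return "stop_max_rounds"
    # Floor: always run at least min_rounds, whatever the signals say.
    if rounds_done < min_rounds:
        return "continue_baseline"
    return None
```

Returning None for the in-bounds case keeps the bound logic separate from the three convergence checks, so each can be tested on its own.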

☐ Integration: agent.py:run_debate and
scidex/agora/scidex_orchestrator.py:1843 replace their fixed
for round_idx in range(num_rounds): loop with
while controller.should_continue(state) != stop_*:
(stop_* standing for any of the three stop_ decisions).

☐ New table debate_round_decisions(session_id TEXT, round_index INT,
      decision TEXT, signal_json JSONB, decided_at TIMESTAMPTZ,
      PRIMARY KEY (session_id, round_index)).
Migration: migrations/20260428_debate_round_decisions.sql.
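
To illustrate how one controller trace row per (session_id, round_index) behaves, here is a sketch that uses an in-memory SQLite database as a stand-in. It is not the migration itself: JSONB and TIMESTAMPTZ are replaced by TEXT because SQLite has no such types, and record_decision plus the keys inside the signal payload are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stand-in for the spec's table; JSONB/TIMESTAMPTZ become TEXT here.
DDL = """
CREATE TABLE debate_round_decisions (
    session_id  TEXT,
    round_index INT,
    decision    TEXT,
    signal_json TEXT,
    decided_at  TEXT,
    PRIMARY KEY (session_id, round_index)
)
"""


def record_decision(conn, session_id, round_index, decision, signals):
    """Hypothetical helper: persist one controller decision with its signals."""
    conn.execute(
        "INSERT INTO debate_round_decisions VALUES (?, ?, ?, ?, ?)",
        (session_id, round_index, decision,
         json.dumps(signals, sort_keys=True),
         datetime.now(timezone.utc).isoformat()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
record_decision(conn, "sess-1", 2, "continue_baseline",
                {"cosine_sim": 0.71, "verdict_stable": False, "new_claims": 3})
record_decision(conn, "sess-1", 3, "stop_converged",
                {"cosine_sim": 0.94, "verdict_stable": True, "new_claims": 0})

# Full trace for one debate, in round order.
trace = conn.execute(
    "SELECT round_index, decision FROM debate_round_decisions "
    "WHERE session_id = ? ORDER BY round_index",
    ("sess-1",),
).fetchall()
```

The composite primary key means a second write for the same round raises an integrity error instead of silently duplicating the trace.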

☐ debate_sessions.num_rounds continues to be the final round count
written at finish, so the interface is preserved.

☐ API: GET /api/agora/debate/{id}/round_decisions returns the full
controller trace for one debate; surfaced as a "Why did this debate
stop at round X?" widget on /debate/{id}.

☐ Tests tests/test_round_controller.py:
(a) all three signals satisfied at round 3 → stop_converged,
num_rounds=3.
(b) min_rounds floor: signals satisfied at round 1 → still goes
to round 2.
(c) escalation: split verdict + high sim → returns
escalate_new_persona, controller integrates the new persona.
(d) max_rounds ceiling: never converges → stops at 8.
(e) token-budget safety: ceiling fires before max_rounds.

☐ Shadow rollout: env var SCIDEX_DEBATE_DYNAMIC_ROUNDS=shadow
runs the controller alongside the fixed loop and writes its
counterfactual decision to debate_round_decisions without
changing actual round count. After 1 week, flip to enabled.
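
A sketch of how the shadow switch could gate the controller. The values shadow and enabled come from the text above; treating anything else as "off", and the names rollout_mode and should_stop, are assumptions made for illustration. The trace list stands in for writes to debate_round_decisions.

```python
import os

ENV_VAR = "SCIDEX_DEBATE_DYNAMIC_ROUNDS"


def rollout_mode(env=None):
    """Read the rollout mode; unknown or missing values fall back to 'off' (assumed)."""
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "").strip().lower()
    return value if value in ("shadow", "enabled") else "off"


def should_stop(mode, controller_decision, fixed_loop_done, trace):
    """Decide whether the debate stops after this round.

    In shadow mode the controller's decision is only recorded as a
    counterfactual; the fixed loop still determines the real round count.
    """
    if mode in ("shadow", "enabled"):
        trace.append(controller_decision)  # stand-in for the DB write
    if mode == "enabled":
        return controller_decision.startswith("stop_")
    return fixed_loop_done
```

Keeping the recording step identical in both modes means the shadow week produces the same trace shape that the enabled mode will later serve through the API.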

☐ Smoke: run 3 debates with the controller in shadow; verify each
has a debate_round_decisions trace; spot-check one decision
against the round content.

Approach

Read scidex/agora/scidex_orchestrator.py:1843 for the canonical
round-loop shape; introduce the controller as a thin object so the
loop reads as while controller.next() == continue: ....
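
The loop shape described above might look like the sketch below. It uses the should_continue(state) name from the Acceptance Criteria, runs a round before consulting the controller, and represents decisions as plain strings. ScriptedController, run_round, and add_persona are hypothetical stand-ins, not the orchestrator's real interfaces.

```python
class ScriptedController:
    """Stand-in for RoundController that replays a fixed list of decisions."""

    def __init__(self, decisions):
        self._decisions = iter(decisions)

    def should_continue(self, state):
        return next(self._decisions)


def run_debate(controller, run_round, add_persona):
    """Run rounds until the controller returns any stop_* decision.

    run_round(i) produces the content of round i; add_persona(state) is
    called when the controller asks for escalation. Both are hypothetical hooks.
    """
    state = {"rounds": []}
    while True:
        state["rounds"].append(run_round(len(state["rounds"])))
        decision = controller.should_continue(state)
        if decision.startswith("stop_"):
            break
        if decision == "escalate_new_persona":
            add_persona(state)
    # The round count returned here is what would be written to num_rounds.
    return len(state["rounds"]), decision


added_at = []
rounds, final = run_debate(
    ScriptedController(["continue_baseline", "escalate_new_persona", "stop_converged"]),
    run_round=lambda i: f"round {i} content",
    add_persona=lambda state: added_at.append(len(state["rounds"])),
)
```

Because the controller is an injected object, the same loop can be exercised in tests with a scripted sequence and in production with the real signals.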

Embedding helper exists; semantic sim is one call. Verdict parser
already exists in synthesis_engine.py. Claim-novelty diff is
tokenised set difference — keep simple.
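
A sketch of the three checks under simplifying assumptions. The embedding vectors and verdict labels are passed in directly because the APIs of scidex/core/embedding.py and the parser in synthesis_engine.py are not shown here. extract_claims is a hypothetical splitter that treats each non-empty line as one claim.

```python
import math
import re

SIM_THRESHOLD = 0.9  # cosine-sim threshold from the acceptance criteria


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_claims(text):
    """Hypothetical claim splitter: one lower-cased, punctuation-free claim per line."""
    claims = set()
    for line in text.splitlines():
        tokens = re.findall(r"\w+", line.lower())
        if tokens:
            claims.add(" ".join(tokens))
    return claims


def new_claims(prev_text, last_text):
    """Tokenised set difference: claims in the last round absent from the prior one."""
    return extract_claims(last_text) - extract_claims(prev_text)


def is_converged(prev_vec, last_vec, prev_verdict, last_verdict,
                 prev_text, last_text):
    """All three signals must hold for a stop_converged decision."""
    semantically_stable = cosine_similarity(prev_vec, last_vec) >= SIM_THRESHOLD
    verdict_stable = prev_verdict == last_verdict
    nothing_new = not new_claims(prev_text, last_text)
    return semantically_stable and verdict_stable and nothing_new
```

Normalising case and punctuation before the set difference keeps a restated claim from counting as new; a paraphrase with different wording would still register, which is the price of keeping the diff simple.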

Persistence first; integration after; rollout in shadow mode.

Add a cost_savings_estimate rollup to the dashboard so we can
prove the controller pays for itself.

Dependencies

scidex/agora/synthesis_engine.py — verdict parser.
scidex/core/embedding.py — semantic sim.
scidex.agents.select — escalation persona picker.

Dependents

q-debate-evidence-weighted-vote: voting layer benefits from
stable-round signal as a confidence weight.

q-debate-judge-interruption: the judge's mid-round halt API and
this controller share state plumbing.

Work Log

Payload JSON

{
  "completion_shas": [
    "ebb12f65f44046b50d986be9a6067956779b1ae1"
  ],
  "completion_shas_checked_at": ""
}

Sibling Tasks in Quest (Open Debates)

○ [Agora] Agent debate enrollment driver (driver #1) P94
○ [Agora] Multi-participant debate orchestration (driver #6) P94
○ [Agora] Counter-argument bounty market (driver #7) P92
○ [Agora] Dataset row-level debate gateway (driver #29) P91
✓ [Agora] Add PubMed evidence to 11 hypotheses lacking citations P90
✓ [Agora] Evidence-weighted persona votes - citation density scales conviction P90
✓ [Senate] Persona ladders - round-robin Elo tournament across personas P89
✓ [Agora] Add spectator mode and real-time debate streaming to /debates page P88
✓ [Agora] Run debates for 10 analyses without debate sessions P88
✓ [Senate] Real-time judge interruption - halt rounds drifting off-topic P88

Task Dependencies

↓ Referenced by (downstream)

✓ [Senate] Triage: [Agora] Dynamic debate round count - stop when stability detected P70 Senate

[Agora] Dynamic debate round count - stop when stability detected done