> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
>    atlas. Every principle is load-bearing. In particular:
>    - LLMs for semantic judgment; rules for syntactic validation.
>    - Gap-predicate driven, not calendar-driven.
>    - Idempotent + version-stamped + observable.
>    - No hardcoded entity lists, keyword lists, or canonical-name tables.
>    - Three surfaces: FastAPI + orchestra + MCP.
>    - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
>    AG4 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
>    Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
>    docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
>    BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.

Quest: Agora | Priority: P90 | Status: open

Continuously score and triage debate quality so that low-value, placeholder, and weak debates are repaired, deprioritized, or excluded from downstream pricing and ranking loops.

This task is part of the Agora quest (Agora layer). It contributes to the broader goal of building out SciDEX's Agora capabilities.

- `e4cb29bc-dc8b-45d0-b499-333d4d9037e4` for debate quality backfill.
- `postgresql://scidex` before run: 1 unscored session (`quality_score IS NULL OR quality_score = 0`) out of 16 total debate sessions.
- `timeout 300 python3 backfill_debate_quality.py`.
- `sess_SDA-2026-04-01-gap-001` as 0.0 (low-quality placeholder transcript), flagged low quality by evaluator.
- NULL quality scores = 0; non-NULL quality scores = 16; legacy query (`NULL OR 0`) still returns 1 because this session is now explicitly scored 0.0.
- `backfill_debate_quality.py`: confirmed score is 0.0 (Claude Haiku confirmed no real content).
- `timeout 300 curl -s http://localhost:8000/api/status | python3 -m json.tool` returned valid JSON (analyses/hypotheses/edges/gaps counts).
- /=302, /analyses/=200, /exchange=200, /graph=200, /atlas.html=200, /how.html=301.
- `scidex status` shows API and nginx active.
- `backfill_debate_quality.py`: re-confirmed 0.0 for placeholder session (Claude Haiku: no real scientific content).
- /analyses/ = 200 OK
- `timeout 300 python3 backfill_debate_quality.py`: scored 1 session, confirmed 0.0 for placeholder (Claude Haiku: no real scientific content).
- /=302, /analyses/=200, /exchange=200, /graph=200, /atlas.html=200, /how.html=301
- `python3 backfill_debate_quality.py`: scored 1 session, confirmed 0.0 for placeholder (Claude Haiku: no real scientific content).
- /=302, /analyses/=200, /exchange=200, /graph=200, /atlas.html=200, /how.html=301
- `timeout 300 python3 backfill_debate_quality.py`: scored 1 session, confirmed 0.0 for placeholder (Claude Haiku: no real scientific content).
- /=302, /analyses/=200
- `timeout 300 python3 backfill_debate_quality.py`: scored 1 session, confirmed 0.0 for placeholder (Claude Haiku: no real scientific content).
- /=302, /analyses/=200, /exchange=200, /graph=200, /atlas.html=200, /how.html=301
- `timeout 300 python3 backfill_debate_quality.py`: no new sessions to score (all 71 already scored).
- `timeout 300 python3 backfill_debate_quality.py`: all sessions already scored, nothing to do.
- `timeout 120 python3 backfill_debate_quality.py`: "All debate sessions already have quality scores."
- `timeout 300 python3 backfill_debate_quality.py`: "All debate sessions already have quality scores."
- /analyses/ = 200 OK
- `timeout 300 python3 backfill/backfill_debate_quality.py`: "All debate sessions already have quality scores."
- /analyses/=200, /exchange=200
- /=302, /exchange=200, /gaps=200, /graph=200, /analyses/=200, /atlas.html=200, /how.html=301
- `backfill_debate_quality.py` from `archive/oneoff_scripts/` to `scripts/` for proper code deliverable.
- /analyses/, /exchange, /gaps, /graph, /atlas.html = 200
- `quality_score = 0` as unscored (reprocesses sessions forever), branch included unrelated Atlas wiki-spec commits, no weak-debate triage.
- `backfill_debate_quality.py`:
  - `quality_score IS NULL` only (0.0 is a legitimate low score)
  - `WEAK_SCORE_THRESHOLD = 0.3` detection for already-scored-but-weak debates
- `wiki-citation-governance-spec.md` to origin/main state (removed Atlas work log entries).
- /=302, /exchange/gaps/graph/analyses/atlas.html = 200
- `scripts/backfill_debate_quality.py` instead of fixing the in-use `backfill/backfill_debate_quality.py`; forge spec work log entries were removed.
- `backfill/backfill_debate_quality.py` (the actual in-use script): query now uses `quality_score IS NULL` only (0.0 is a legitimate low score, not a reprocess trigger).
- `WEAK_SCORE_THRESHOLD = 0.3` and weak-debate triage query to surface already-scored-but-weak debates as RERUN candidates.
- `a88f4944_cb09_forge_reduce_pubmed_metadata_backlog_spec.md` (added back removed 3rd/4th execution work log entries).
- `scripts/backfill_debate_quality.py` (only `backfill/backfill_debate_quality.py` is authoritative).
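
The distinction recorded above, where `quality_score IS NULL` marks a session as unscored while an explicit 0.0 is a real (weak) score, can be sketched as two separate queries. This is a minimal illustration, not the contents of `backfill/backfill_debate_quality.py`: the `debate_sessions` table name, the `id` column, the strict `<` comparison, and the `triage` and `demo` helpers are all assumptions, and an in-memory SQLite database stands in for the PostgreSQL connection so the example runs on its own. Only `WEAK_SCORE_THRESHOLD = 0.3` comes from the log.

```python
import sqlite3

WEAK_SCORE_THRESHOLD = 0.3  # value taken from the work log

# Hypothetical table and column names; the real schema may differ.
UNSCORED_SQL = (
    "SELECT id FROM debate_sessions "
    "WHERE quality_score IS NULL ORDER BY id"
)
WEAK_SQL = (
    "SELECT id FROM debate_sessions "
    "WHERE quality_score IS NOT NULL AND quality_score < ? ORDER BY id"
)


def triage(conn):
    """Return (unscored_ids, weak_ids) for one pass over the table."""
    # Gap predicate: only NULL means "never scored".
    unscored = [row[0] for row in conn.execute(UNSCORED_SQL)]
    # Already scored but below the threshold: candidates for a rerun.
    weak = [row[0] for row in conn.execute(WEAK_SQL, (WEAK_SCORE_THRESHOLD,))]
    return unscored, weak


def demo():
    """Build a tiny in-memory table and run one triage pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE debate_sessions (id TEXT PRIMARY KEY, quality_score REAL)"
    )
    conn.executemany(
        "INSERT INTO debate_sessions VALUES (?, ?)",
        [("sess_a", None), ("sess_b", 0.0), ("sess_c", 0.25), ("sess_d", 0.8)],
    )
    return triage(conn)
```

With this split, a placeholder session scored 0.0 is never re-queued for scoring, yet it still shows up in the weak list as a rerun candidate.
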
- `backfill/backfill_debate_quality.py`, forge spec restored, weak-debate triage active.
- `backfill/backfill_debate_quality.py` (and two other backfill scripts) could not be run directly via `python3 backfill/backfill_debate_quality.py`: Python added `backfill/` to `sys.path` but not the repo root, causing `ModuleNotFoundError: No module named 'db_writes'`.
- `sys.path.insert(0, str(Path(__file__).resolve().parent.parent))` to:
  - `backfill/backfill_debate_quality.py`
  - `backfill/backfill_page_exists.py`
  - `backfill/backfill_wiki_infoboxes.py`
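
The `sys.path` fix above works because running `python3 backfill/some_script.py` puts only the script's own directory on the import path. A minimal sketch of the same idea follows; the `ensure_repo_root_on_path` helper is hypothetical (the log records a single inline `sys.path.insert` call, not a helper), and it assumes the script sits exactly one directory below the repo root.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(script_file):
    """Put the directory two levels above script_file at the front of sys.path.

    For a script at <repo>/backfill/x.py this is <repo>, so modules that live
    at the repo root become importable when the script is run directly.
    """
    repo_root = str(Path(script_file).resolve().parent.parent)
    if repo_root not in sys.path:  # avoid stacking duplicates on repeated calls
        sys.path.insert(0, repo_root)
    return repo_root


# In a backfill script this would run before any repo-root imports:
#     ensure_repo_root_on_path(__file__)
```
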
- `timeout 120 python3 backfill/backfill_debate_quality.py`:
- /=302, /exchange=200, /gaps=200, /graph=200, /analyses/=200, /atlas.html=200, /how.html=301
- `timeout 300 python3 backfill/backfill_debate_quality.py`: all sessions already scored, no new unscored sessions.
- `timeout 120 python3 backfill/backfill_debate_quality.py`:
- `apply_judge_elo_weight(raw_score, judge_id)` to `backfill/backfill_debate_quality.py`.
- `ci-debate-quality-scorer` tracks its own Elo via `judge_elo.py`.
- `weighted = 0.5 + (raw - 0.5) * min(1.0, k_weight)` where `k_weight` is from `judge_elo.compute_k_weight(elo)`.
- `timeout 60 python3 backfill/backfill_debate_quality.py`:
- `PRAGMA journal_mode=WAL` and `sqlite3.Row` factory.
- `try/except ValueError` for robustness.
- `substring(... from ... for ...)` instead of `substr()`.
- `get_db()` from `scidex.core.database` for PostgreSQL connections.

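
The weighting formula in the log, `weighted = 0.5 + (raw - 0.5) * min(1.0, k_weight)`, pulls a raw score toward the neutral value 0.5 when the judge's weight is below 1 and leaves it unchanged otherwise. The sketch below shows only that arithmetic. The `shrink_toward_neutral` name is hypothetical, and `k_weight` is passed in directly because the signature and behavior of `judge_elo.compute_k_weight` are not shown in this spec.

```python
def shrink_toward_neutral(raw_score: float, k_weight: float) -> float:
    """Apply weighted = 0.5 + (raw - 0.5) * min(1.0, k_weight).

    k_weight is assumed to be a non-negative number computed elsewhere;
    values of 1.0 or more leave the raw score unchanged.
    """
    return 0.5 + (raw_score - 0.5) * min(1.0, k_weight)


# A fully trusted judge keeps its raw score; a half-weight judge moves
# a score only half as far from 0.5.
full_weight = shrink_toward_neutral(0.0, 1.5)
half_weight = shrink_toward_neutral(0.0, 0.5)
```

Under this formula a low-weight judge can still flag a debate as weak, but its extreme scores carry less force in downstream ranking.
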
{
"requirements": {
"analysis": 5,
"safety": 9
},
"completion_shas": [
"2749842acf496b510f47dd161957bae036dc6c0e",
"4a901031704e7cdffc57100eb047c7454cd5f81e",
"fe5cd345365552c4dfea3143cd979fe384f7b856"
],
"completion_shas_checked_at": "2026-04-12T21:27:10.953300+00:00",
"completion_shas_missing": [
"1c95ef66407798d4bca5e416426923072e5c6995",
"9f4e73d51295bc514c4b087ae59397f4742e6e78",
"f2998a9bf22c66053e4f2e1f968bdef82da20a49",
"c8731968c22225156f83a46730a6f39ca01981c5",
"ebf0a81b0582e93218ee924fbfb7c4ad36cbfc90",
"f8e4d3beb9fa91e03783695718100f9e5f31fb0c",
"56bde7aabe5d26c6765c80275eadb61c26594697",
"2afe436df082247e007a98ccfd5ea83c0d487ca9",
"d67005e8b8081b54a433f347bccc7d2e3a28d5ac",
"f44e8b2a6a6db259cb516d7b58f45534ea2bb0f5",
"2af0bd7e3d02f05a9399f25ad134375277a3f26f",
"b34a8ae472daab631df591a14eb85ef63247c303",
"bded46e087dc22fa115c9600593d0cbf8efa37c0",
"2b602d43640566f58c13f7ab47a903da61e55d85",
"7e7cd02795f5355c537679753c19152376a98658",
"421535306fcc8925060bd3d32c24049388ee0a6e",
"3a7fd2313f50d655f47c78ef6fda8492e35f3cfd",
"a097247007db48f356c920e08c0d5d12bb7b1aae",
"3dac301378658d222b5e4b35405b2052c59a3d60",
"028aa740d8adb6cb6e52e4c73cf8d7de7a66e83c",
"3fb7f6154b42103413751c60df8ab4b86ca0dd7d",
"29599c686d187343ae6e0d6b13ab7dc16040e5f7",
"a8189206b2986aa93424d286ab8de90727aa4a02",
"fb6fe4c5c462ccf7b58b8fa3473ac52aa9d5d42a",
"428864b80e1e335076908ba3cf0f9efdc927cc0d",
"1be44bae1d253f7f3f90fdebebc45e2d9297f262",
"ca9df93c740da1c7e72fe47f961f58f46383e3e0",
"0141fb71094f73812050a25adb854521cda069d6",
"e6e4a123696384219436ee0567ddfffe9c22559a",
"a313ed83009afa4b380c04430c7584b3f767c4bf",
"62a7a6d1c638f73b3d2558d2d13e22c43c3270f6",
"3b4be205440b0a1ef217cc87d66ed8efa1befa99"
]
}