[Agora] Analysis debate + market wrapper (WS5)


> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
> atlas — every principle is load-bearing. In particular:
> - LLMs for semantic judgment; rules for syntactic validation.
> - Gap-predicate driven, not calendar-driven.
> - Idempotent + version-stamped + observable.
> - No hardcoded entity lists, keyword lists, or canonical-name tables.
> - Three surfaces: FastAPI + orchestra + MCP.
> - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
> AG4 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
> Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
> docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
> BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.

Task

  • ID: task-id-pending
  • Type: recurring
  • Frequency: every-6h
  • Layer: Agora (with Exchange + Forge side-effects)

Goal

Make SciDEX's epistemic + market + resource layers systemic rather than
per-analysis. For every new or recently-updated substantive analysis
(≥50KB artifact) that lacks an associated debate + price update + resource
ledger entry, this driver auto-wraps it: spins up a debate, updates the
sponsoring hypothesis's price, debits the sponsoring agent's wallet, and
opens a follow-up gap if the analysis exposed one. This is the
differentiator-in-a-loop: Biomni and K-Dense run the analysis; we run the
analysis and keep running after it's done.

What it does

  • Queries artifacts joined with analyses for candidates that are:
- Created / updated in the last 24h, AND
- Size ≥50KB, AND
- Missing any of: (a) an associated debate_sessions row with
quality_score ≥ 0.6, (b) a price_history row on the sponsoring
hypothesis with event_source pointing at the artifact, or
(c) a cost_ledger entry for the compute that produced it.
  • Selects the top 10 candidates per cycle (cap to avoid flooding).
  • For each candidate:
1. Debate wrap. Spawns a debate_sessions row seeded with the
analysis conclusion. Enrolls Theorist + Skeptic + (when the analysis
is clinical / genetic / chemical) a matching domain expert. Runs ≥3
rounds. Records quality_score via backfill_debate_quality.py.
If quality_score < 0.6, flags for re-debate rather than promoting.
2. Market update. Writes a price_history row on the sponsoring
hypothesis with event_type = 'analysis_complete' and
event_source = artifact path. Price delta is computed by
market_dynamics.py from the analysis's reported effect size /
confidence.
3. Cost ledger debit. Writes a cost_ledger entry against the
sponsoring agent's wallet for the compute cost of the analysis.
Reconciles against resource_allocations if the analysis was
pre-allocated budget.
4. Follow-up gap. If the analysis's conclusion references a
downstream unknown ("further work needed to determine X"), opens a
knowledge_gaps row so the world model captures what was exposed.
  • Emits agent_contributions (type=analysis_wrap) per wrapped analysis
crediting the driver's actor persona.
  • Releases as a no-op when no unwrapped analyses exist.
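
The candidate gap predicate described above can be sketched as a single query. This is a minimal sketch, not the driver's actual SQL. Only the table names debate_sessions, price_history, and cost_ledger and the 24h / 50KB / 0.6 / top-10 thresholds come from this spec. The column names (artifact_id, analysis_id, hypothesis_id, size_bytes, updated_at), the omission of the analyses join, and the newest-first ordering are assumptions made for illustration.

```python
import sqlite3

# Hypothetical minimal schema; the real SciDEX tables have more columns.
SCHEMA = """
CREATE TABLE artifacts (artifact_id TEXT PRIMARY KEY, analysis_id TEXT,
                        hypothesis_id TEXT, size_bytes INTEGER, updated_at TEXT);
CREATE TABLE debate_sessions (analysis_id TEXT, quality_score REAL);
CREATE TABLE price_history (hypothesis_id TEXT, event_source TEXT);
CREATE TABLE cost_ledger (artifact_id TEXT, amount REAL);
"""

# Recent AND large AND missing at least one of the three attachments.
CANDIDATES_SQL = """
SELECT a.artifact_id
FROM artifacts a
WHERE a.updated_at >= datetime('now', '-24 hours')
  AND a.size_bytes >= 50 * 1024
  AND (
    NOT EXISTS (SELECT 1 FROM debate_sessions d
                WHERE d.analysis_id = a.analysis_id
                  AND d.quality_score >= 0.6)
    OR NOT EXISTS (SELECT 1 FROM price_history p
                   WHERE p.hypothesis_id = a.hypothesis_id
                     AND p.event_source = a.artifact_id)
    OR NOT EXISTS (SELECT 1 FROM cost_ledger c
                   WHERE c.artifact_id = a.artifact_id)
  )
ORDER BY a.updated_at DESC  -- assumed ranking; the spec only says "top 10"
LIMIT ?
"""

def select_candidates(conn, limit=10):
    """Return artifact ids that still need wrapping, capped per cycle."""
    return [row[0] for row in conn.execute(CANDIDATES_SQL, (limit,))]

# Tiny demo: one unwrapped, one too small, one too old, one fully wrapped.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO artifacts VALUES "
             "('art-new', 'an-1', 'hyp-1', 60000, datetime('now'))")
conn.execute("INSERT INTO artifacts VALUES "
             "('art-small', 'an-2', 'hyp-2', 1000, datetime('now'))")
conn.execute("INSERT INTO artifacts VALUES "
             "('art-old', 'an-3', 'hyp-3', 60000, datetime('now', '-2 days'))")
conn.execute("INSERT INTO artifacts VALUES "
             "('art-wrapped', 'an-4', 'hyp-4', 60000, datetime('now'))")
conn.execute("INSERT INTO debate_sessions VALUES ('an-4', 0.7)")
conn.execute("INSERT INTO price_history VALUES ('hyp-4', 'art-wrapped')")
conn.execute("INSERT INTO cost_ledger VALUES ('art-wrapped', 1.5)")

candidates = select_candidates(conn)
print(candidates)
```

An empty result is the no-op case: the driver has nothing to wrap and can release immediately.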

Success criteria

  • 100% of analyses ≥50KB in the trailing 30 days carry the three
required attachments (debate ≥0.6 quality, price update, cost ledger
entry), plus a follow-up gap where the analysis exposed one (SQL audit:
zero "orphan" analyses).
  • Mean debate quality score on WS5-wrapped analyses ≥ 0.65 (20% above
the current all-analysis baseline).
  • Every wrapped analysis has exactly one (not duplicate) attachment of
each type (measurable via uniqueness constraint on
(artifact_id, attachment_type)).
  • Run log: candidates scanned, wrapped, skipped (already wrapped),
re-debated (quality failed), retries.
  • Follow-up gaps opened: tracked over time; expect ≥5 per month. If
identically zero for ≥2 months, surface a Senate review — analyses
should expose new gaps at a non-zero rate.
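
The uniqueness and orphan-audit criteria above could be checked along these lines. This is a sketch under stated assumptions: the analysis_attachments table and the attachment type names are hypothetical, since the spec names only the (artifact_id, attachment_type) uniqueness constraint and not where it lives.

```python
import sqlite3

# The follow-up gap is optional, so it is not part of the orphan check.
REQUIRED = ("debate", "price_update", "cost_ledger")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysis_attachments (      -- hypothetical table name
    artifact_id     TEXT NOT NULL,
    attachment_type TEXT NOT NULL,
    UNIQUE (artifact_id, attachment_type)
);
""")

def attach(conn, artifact_id, attachment_type):
    """Record an attachment; return True if new, False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO analysis_attachments VALUES (?, ?)",
        (artifact_id, attachment_type),
    )
    return cur.rowcount == 1

def orphans(conn, artifact_ids):
    """Return the artifact ids missing at least one required attachment."""
    missing = []
    for aid in artifact_ids:
        have = {row[0] for row in conn.execute(
            "SELECT attachment_type FROM analysis_attachments "
            "WHERE artifact_id = ?", (aid,))}
        if not set(REQUIRED) <= have:
            missing.append(aid)
    return missing

# Demo: art-1 is fully wrapped, art-2 only has a debate.
for kind in REQUIRED:
    attach(conn, "art-1", kind)
attach(conn, "art-2", "debate")
duplicate_written = attach(conn, "art-1", "debate")  # blocked by UNIQUE

print(duplicate_written, orphans(conn, ["art-1", "art-2"]))
```

The success criterion is met when the orphan list is empty for every analysis ≥50KB in the trailing 30 days.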

Quality requirements

  • No stubs: a debate seeded with only the analysis title is rejected.
The seed prompt must include the conclusion, the evidence summary, and
at least one counter-claim to attack. Link to
quest_quality_standards_spec.md.
  • When wrapping ≥10 analyses in a cycle (rare but possible after a WS2
burst), spawn 3–5 parallel sub-agents each handling a disjoint slice.
  • Do not retry forever. A candidate that fails the debate quality gate
twice is escalated to a Senate review task rather than looping.
  • Log total items processed + retries so we can detect busywork (the
same candidate re-wrapped every cycle → surface a bug).
  • Use sqlite3.connect() with 30s timeout and
PRAGMA busy_timeout=30000, consistent with existing drivers (see
c6bd6dc1-8b6e-4fd7-a47b-b802ba5026a7_wiki_edit_market_driver_spec.md).
  • Cost ledger writes are idempotent; re-running the driver on the same
artifact must not double-debit.
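
The connection settings, the idempotent debit, and the two-strike quality gate above can be sketched together. This is an illustration, not the driver's code: the cost_ledger columns, the UNIQUE constraint on artifact_id, and the helper names connect, debit_once, and gate_action are assumptions. Only sqlite3.connect(), the 30s timeout, PRAGMA busy_timeout=30000, the 0.6 gate, and the escalate-after-two-failures rule come from this spec.

```python
import sqlite3

QUALITY_GATE = 0.6  # debate quality threshold from this spec
MAX_FAILURES = 2    # the second failure escalates to a Senate review

def connect(path):
    """Open the DB with the 30s timeout and busy_timeout the spec asks for."""
    conn = sqlite3.connect(path, timeout=30.0)
    conn.execute("PRAGMA busy_timeout=30000")
    return conn

def debit_once(conn, artifact_id, agent_id, amount):
    """Debit an agent for an artifact at most once.

    Returns True if a ledger row was written, False if one already existed.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO cost_ledger (artifact_id, agent_id, amount) "
        "VALUES (?, ?, ?)",
        (artifact_id, agent_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1

def gate_action(quality_score, prior_failures):
    """Decide what to do with a candidate after a debate is scored."""
    if quality_score >= QUALITY_GATE:
        return "promote"
    if prior_failures + 1 >= MAX_FAILURES:
        return "escalate"  # hand off to Senate review instead of looping
    return "re-debate"

# Demo against an in-memory DB with a hypothetical ledger schema.
conn = connect(":memory:")
conn.execute("""
CREATE TABLE cost_ledger (
    artifact_id TEXT NOT NULL UNIQUE,  -- one debit per artifact (assumed)
    agent_id    TEXT NOT NULL,
    amount      REAL NOT NULL
)
""")
first = debit_once(conn, "art-1", "agent-a", 2.5)
second = debit_once(conn, "art-1", "agent-a", 2.5)  # re-run: no double-debit
total = conn.execute("SELECT SUM(amount) FROM cost_ledger").fetchone()[0]
print(first, second, total)
```

Pushing idempotency into a uniqueness constraint keeps the guarantee in the database, so a crashed or concurrent driver run cannot double-debit even if its in-memory bookkeeping is lost.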

Related tools / packages

  • SciDEX internal (primary): debate_sessions, debate_rounds,
debate_participants tables; backfill_debate_quality.py;
market_dynamics.py (LMSR price update); cost_ledger;
resource_allocations; knowledge_gaps; agent_contributions.
  • Scoring: backfill_debate_quality.py for the 4-dim debate quality
score; belief_tracker.py for updating the hypothesis's score
snapshot.
  • Relation to WS2 / WS4: every analysis produced by the WS2 15-port
and the WS4 GPU pilot flows through this wrapper. This is how those
workstreams' deliverables clear the
quest_competitive_biotools_spec.md success gate.
  • Sibling drivers: pattern follows
economics_participation_drivers_spec.md — small batch per cycle,
frequent cadence, idempotent writes, credit emission.

Work Log

2026-04-18 14:20 PT — Slot minimax:61 (this run)

  • Ran driver in dry-run: no-op (0 candidates, all 126 previously-backfilled analyses already wrapped)
  • Branch had diverged from upstream: rebase against gh/main succeeded
  • Found rebase-created indentation error (stray import sqlite3 at line 47 after _conn refactoring)
  • Fixed indentation error, committed as c8fe9ef66, force-pushed to gh remote
  • Driver healthy: exits cleanly with no-op when nothing to wrap

  • Fixed critical bug: candidate selection used filesystem artifact size check (≥50KB)
but artifact files live in ephemeral worktrees that get cleaned up after tasks complete,
so the driver ALWAYS returned zero candidates
  • Fix: replaced filesystem-only check with DB-based "substantive" signal:
- Has quality debate (quality_score ≥ 0.6) OR ≥3 hypotheses OR transcript >10KB OR
filesystem artifact ≥50KB (when available)
  • Added --backfill flag for trailing 90-day scan (vs default 24h)
  • Fixed _already_has_price_for_analysis to return True when analysis has no hypotheses
(price update is N/A, not "missing")
  • Backfill results: 126 analyses wrapped, 451 price_history entries created,
109 impact_ledger entries, 126 agent_contribution credits
  • Coverage: 94.8% of analyses with hypotheses now have analysis_complete price_history
(73/77; remaining 4 need new debates with LLM calls)
  • 5 analyses still need new debates (LLM-required, deferred to next cycle)

2026-04-16 15:05 PT — Slot minimax:76 (retry attempt 1/10)

  • Review feedback: previous commit inadvertently reverted AlphaFold model_v6→v4 fix from 9d58579e0
  • Root cause: api.py was modified to use model_v4 in HEAD; fixed by rebasing against gh main
  • After rebase, verified all 7 AlphaFold URLs now use model_v6 (lines 22251, 31545, 34195, 34385, 47616, 47699, 58876)
  • Confirmed diff vs origin/main is clean — only economics_drivers/analysis_debate_wrapper_driver.py (+936 lines)
  • API status: 366 analyses, 625 hypotheses, 700K+ edges — system healthy
  • No-op: driver code was not re-executed (would require LLM calls); fix is a pure code correction

2026-04-16 14:46 PT — Slot minimax:76 (initial run)

  • Created economics_drivers/analysis_debate_wrapper_driver.py (936 lines)
  • Driver: candidate selection (≥50KB, recent, missing attachments), 4-persona debate
(Theorist→Skeptic→Expert→Synthesizer), quality scoring, price update via market_dynamics,
cost ledger debit, follow-up gap creation, agent contribution credit
  • Idempotent: skips analyses already wrapped (debate ≥0.6, price history exists, cost ledger exists)
  • Committed as 905f34620; later found to have api.py model_v4 regression during review gate

File: task-id-pending_analysis_debate_wrapper_spec.md
Modified: 2026-04-24 07:15
Size: 9.1 KB