[Agora] Analysis debate + market wrapper (WS5)

> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
> atlas — every principle is load-bearing. In particular:
> - LLMs for semantic judgment; rules for syntactic validation.
> - Gap-predicate driven, not calendar-driven.
> - Idempotent + version-stamped + observable.
> - No hardcoded entity lists, keyword lists, or canonical-name tables.
> - Three surfaces: FastAPI + orchestra + MCP.
> - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
> AG4 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
> Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
> docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
> BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.

Task

ID: task-id-pending
Type: recurring
Frequency: every-6h
Layer: Agora (with Exchange + Forge side-effects)

Goal

Make SciDEX's epistemic + market + resource layers systemic rather than
per-analysis. For every new or recently-updated substantive analysis
(≥50KB artifact) that lacks an associated debate + price update + resource
ledger entry, this driver auto-wraps it: spins up a debate, updates the
sponsoring hypothesis's price, debits the sponsoring agent's wallet, and
opens a follow-up gap if the analysis exposed one. This is the
differentiator-in-a-loop: Biomni and K-Dense run the analysis; we run the
analysis and keep running after it's done.

What it does

Queries artifacts joined with analyses for candidates that are:

- Created / updated in the last 24h, AND
- Size ≥50KB, AND
- Missing either (a) an associated debate_sessions row with
quality_score ≥ 0.6, OR (b) a price_history row on the sponsoring
hypothesis with event_source pointing at the artifact, OR
(c) a cost_ledger entry for the compute that produced it.

Selects the top 10 candidates per cycle (cap to avoid flooding).
For each candidate:

1. Debate wrap. Spawns a debate_sessions row seeded with the
analysis conclusion. Enrolls Theorist + Skeptic + (when the analysis
is clinical / genetic / chemical) a matching domain expert. Runs ≥3
rounds. Records quality_score via backfill_debate_quality.py.
If quality_score < 0.6, flags for re-debate rather than promoting.
2. Market update. Writes a price_history row on the sponsoring
hypothesis with event_type = 'analysis_complete' and
event_source = artifact path. Price delta is computed by
market_dynamics.py from the analysis's reported effect size /
confidence.
3. Cost ledger debit. Writes a cost_ledger entry against the
sponsoring agent's wallet for the compute cost of the analysis.
Reconciles against resource_allocations if the analysis was
pre-allocated budget.
4. Follow-up gap. If the analysis's conclusion references a
downstream unknown ("further work needed to determine X"), opens a
knowledge_gaps row so the world model captures what was exposed.

Emits agent_contributions (type=analysis_wrap) per wrapped analysis

crediting the driver's actor persona.

Release as a no-op when no unwrapped analyses exist.

Success criteria

100% of analyses ≥50KB in the trailing 30 days carry all four

attachments: debate ≥0.6 quality, price update, cost ledger entry,
optional follow-up gap (SQL audit: zero "orphan" analyses).

Mean debate quality score on WS5-wrapped analyses ≥ 0.65 (20% above

the current all-analysis baseline).

Every wrapped analysis has exactly one (not duplicate) attachment of

each type (measurable via uniqueness constraint on
(artifact_id, attachment_type)).

Run log: candidates scanned, wrapped, skipped (already wrapped),

re-debated (quality failed), retries.

Follow-up gaps opened: tracked over time; expect ≥5 per month. If

identically zero for ≥2 months, surface a Senate review — analyses
should expose new gaps at a non-zero rate.

Quality requirements

No stubs: a debate seeded with only the analysis title is rejected.

The seed prompt must include the conclusion, the evidence summary, and
at least one counter-claim to attack. Link to
quest_quality_standards_spec.md.

When wrapping ≥10 analyses in a cycle (rare but possible after a WS2

burst), spawn 3–5 parallel sub-agents each handling a disjoint slice.

Do not retry forever. A candidate that fails the debate quality gate

twice is escalated to a Senate review task rather than looping.

Log total items processed + retries so we can detect busywork (the

same candidate re-wrapped every cycle → surface a bug).

Use sqlite3.connect() with 30s timeout and

PRAGMA busy_timeout=30000, consistent with existing drivers (see
c6bd6dc1-8b6e-4fd7-a47b-b802ba5026a7_wiki_edit_market_driver_spec.md).

Cost ledger writes are idempotent; re-running the driver on the same

artifact must not double-debit.

Related tools / packages

SciDEX internal (primary): debate_sessions, debate_rounds,

debate_participants tables; backfill_debate_quality.py;
market_dynamics.py (LMSR price update); cost_ledger;
resource_allocations; knowledge_gaps; agent_contributions.

Scoring: backfill_debate_quality.py for the 4-dim debate quality

score; belief_tracker.py for updating the hypothesis's score
snapshot.

Relation to WS2 / WS4: every analysis produced by the WS2 15-port

and the WS4 GPU pilot flows through this wrapper. This is how those
workstreams' deliverables clear the
quest_competitive_biotools_spec.md success gate.

Sibling drivers: pattern follows

economics_participation_drivers_spec.md — small batch per cycle,
frequent cadence, idempotent writes, credit emission.

Work Log

2026-04-18 14:20 PT — Slot minimax:61 (this run)

Ran driver in dry-run: no-op (0 candidates, all 126 previously-backfilled analyses already wrapped)
Branch had diverged from upstream: rebase against gh/main succeeded
Found rebase-created indentation error (stray import sqlite3 at line 47 after _conn refactoring)
Fixed indentation error, committed as c8fe9ef66, force-pushed to gh remote
Driver healthy: exits cleanly with no-op when nothing to wrap

Fixed critical bug: candidate selection used filesystem artifact size check (≥50KB)

but artifact files live in ephemeral worktrees that get cleaned up after tasks complete,
so the driver ALWAYS returned zero candidates

Fix: replaced filesystem-only check with DB-based "substantive" signal:

- Has quality debate (quality_score ≥ 0.6) OR ≥3 hypotheses OR transcript >10KB OR
filesystem artifact ≥50KB (when available)

Added --backfill flag for trailing 90-day scan (vs default 24h)
Fixed _already_has_price_for_analysis to return True when analysis has no hypotheses

(price update is N/A, not "missing")

Backfill results: 126 analyses wrapped, 451 price_history entries created,

109 impact_ledger entries, 126 agent_contribution credits

Coverage: 94.8% of analyses with hypotheses now have analysis_complete price_history

(73/77; remaining 4 need new debates with LLM calls)

5 analyses still need new debates (LLM-required, deferred to next cycle)

2026-04-16 15:05 PT — Slot minimax:76 (retry attempt 1/10)

Review feedback: previous commit inadvertently reverted AlphaFold model_v6→v4 fix from 9d58579e0
Root cause: api.py was modified to use model_v4 in HEAD; fixed by rebasing against gh main
After rebase, verified all 7 AlphaFold URLs now use model_v6 (lines 22251, 31545, 34195, 34385, 47616, 47699, 58876)
Confirmed diff vs origin/main is clean — only economics_drivers/analysis_debate_wrapper_driver.py (+936 lines)
API status: 366 analyses, 625 hypotheses, 700K+ edges — system healthy
No-op: driver code was not re-executed (would require LLM calls); fix is a pure code correction

2026-04-16 14:46 PT — Slot minimax:76 (initial run)

Created economics_drivers/analysis_debate_wrapper_driver.py (936 lines)
Driver: candidate selection (≥50KB, recent, missing attachments), 4-persona debate

(Theorist→Skeptic→Expert→Synthesizer), quality scoring, price update via market_dynamics,
cost ledger debit, follow-up gap creation, agent contribution credit

Idempotent: skips analyses already wrapped (debate ≥0.6, price history exists, cost ledger exists)
Committed as 905f34620; later found to have api.py model_v4 regression during review gate

Tasks using this spec (1)

[Agora] Analysis debate wrapper — every-6h debate+market on

Agora blocked P92

File: task-id-pending_analysis_debate_wrapper_spec.md

Modified: 2026-04-24 07:15

Size: 9.1 KB