> Purpose. Unify invention / experiment / gap / landscape / discovery / hypothesis / paper / target artifacts under one generation-and-valuation pipeline so the agent fleet does measurable, directional work instead of shipping plausibly-helpful-but-unranked output. This is the umbrella spec; each concrete quest spec (quest_inventions, quest_experiments, quest_gaps, quest_landscape_analyses, showcase UI) references this doc for shared mechanics.
The design is motivated by two observations from the 2026-04-24 audit:
The correction is an economy — artifacts have explicit value signals, the quests compete for capacity, and the output is a ranked set of showcase artifacts with traceable provenance.
---
SciDEX produces seven artifact classes. Each is a discrete node in the world-model graph.
"Paper" is the public-facing composition of the other six. A showcase paper is the canonical demo of end-to-end value.

---
Every artifact gets a composite value computed from six underlying signals. Each signal is independently produced; the composition is a learned weighted sum whose weights are themselves artifacts (meta-inventions tuned by epistemic rigor).
- gap signal: quest_gaps + Atlas world-model graph edges.
- landscape signal: quest_landscape_analyses.
- market signal: Exchange quest + Market Participants quest (existing).
- adversarial signal: Adversarial Science quest (existing).
- arena signal: Evolutionary Arenas quest (existing — Elo over pairwise judgments).
- utility signal: quest_experiments (runs a planned utility test), Forge benchmarks.

Composite value V(artifact) = Σ w_i · normalize(signal_i), where the weights are model artifacts owned by Epistemic Rigor. The weights themselves compete on a meta-arena (Elo among weight-vectors based on which vectors best predict long-horizon utility). This is what makes the system self-improving rather than hand-tuned.
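A minimal sketch of the composition, assuming per-signal min-max normalization and a weight vector supplied as data (the signal names follow this section; the normalization bounds are illustrative assumptions):

```python
# Signal names from this spec; the weight vector is itself an artifact,
# so it arrives as data rather than hard-coded constants.
SIGNALS = ["gap", "landscape", "market", "adversarial", "arena", "utility"]

def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize a raw signal into [0, 1]; clamps out-of-range values."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def composite_value(raw: dict, weights: dict, bounds: dict) -> float:
    """V(artifact) = sum_i w_i * normalize(signal_i), weights renormalized to sum to 1
    so V stays probability-like in [0, 1]."""
    total_w = sum(weights[s] for s in SIGNALS)
    return sum(
        (weights[s] / total_w) * normalize(raw[s], *bounds[s])
        for s in SIGNALS
    )
```

Renormalizing the weight vector inside the function keeps V in [0, 1] even when a meta-arena promotes a weight-vector artifact whose raw weights do not sum to one.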
V is probability-like — it lives in [0, 1] and measures belief / quality / resolution-likelihood. It does NOT measure how much is at stake. A gap with V=0.9 and a gap with V=0.9 that is a thousand times more important look identical under V alone. §2a fixes that.
---
V answers "how valid / probable is this artifact?". It says nothing about magnitude. To distinguish a plausible footnote from a plausible paradigm shift we add three class-calibrated dimensions to every artifact:
S(artifact) ∈ [0, ∞) — a scalar in class-appropriate units that estimates what's at stake if the artifact resolves positively. Size is independent of whether it WILL resolve (that's V's job). Size is a pure upside question.
Per-class definition + units:
Size is computed once at admission time and recomputed on weekly meta-arena cycles (so S drifts as the field evolves). The estimator for each class is itself a model artifact under Epistemic Rigor; competing estimators face off in a size-meta-arena just like the composite-weight vectors.

MarketCap(artifact) = V(artifact) × S(artifact).
This is the expected impact-weighted value — probability × magnitude. It's the single scalar that answers "which artifacts should get the most agent-capacity?" more faithfully than V alone. The showcase UI's default ranking switches from V to MarketCap once size estimators exist for every class. V remains visible as the confidence component.
Two artifacts with identical V:
- V = 0.85, S = 200 epy → MC = 170
- V = 0.85, S = 3 epy → MC = 2.55

The first dominates on MarketCap even though they tie on V.

OpenInterest(artifact) = total tokens committed across open market participant positions (sum of stakes on both YES and NO sides). This is the conviction dimension — it measures how much capital the market has bet for or against this artifact. High open interest + low V means "the market strongly disagrees with this artifact" rather than "nobody has paid attention".
OI grows when new participants enter; decays when positions close at resolution. Stored per-artifact in the existing exch-qm-01-MEXT_extend_market_pricing_spec.md market rows.
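A sketch of the open-interest bookkeeping under these rules: stakes on both sides count, positions enter when a participant opens them and leave at resolution. The class and method names are assumptions, not the exch-qm-01 schema.

```python
class OpenInterestBook:
    """Tracks OpenInterest for one artifact's market.

    OI = sum of stakes across all open positions, YES and NO sides alike.
    OI grows when new participants enter and decays as positions close
    at resolution.
    """

    def __init__(self):
        self.positions = {}  # position_id -> (side, stake)
        self.next_id = 0

    def open_position(self, side: str, stake: float) -> int:
        """Register a new position; returns its id for later closing."""
        pid = self.next_id
        self.next_id += 1
        self.positions[pid] = (side, stake)
        return pid

    def close_position(self, pid: int) -> None:
        """Called at market resolution; removing the position shrinks OI."""
        self.positions.pop(pid, None)

    @property
    def open_interest(self) -> float:
        return sum(stake for _side, stake in self.positions.values())
```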
- Volume_24h(artifact) = total tokens exchanged in bids/asks in the last 24 hours. Measures attention independent of conviction — an artifact can have high OI with zero recent volume (stable consensus) or low OI with high volume (new + thinly-traded).
- Liquidity(artifact) = effective depth — the LMSR-b parameter for this artifact's market × pool-tokens. Proxy for "how much can be bet before the price moves materially". Low-liquidity artifacts' prices are noisy; the scheduler SHOULD NOT treat them as well-calibrated until liquidity exceeds a class floor.

The showcase UI and the quest scheduler consume the following rankings, each answering a different question:
The default/showcase tab sorts by market_cap; alternate tabs expose the others. The scheduler's Phase A seeding in §3 is switched to market_cap × inverse_stock × capacity_available — previously it was urgency × novelty × capacity, which conflated size and probability.

Each size estimator is tested once per week against realized outcomes (paper citations actually accrued, experiments whose IIG was measurable in hindsight, etc.). An estimator whose S-predictions diverge ≥2σ from realizations across a rolling window gets deprecated and the second-place estimator in its meta-arena gets promoted. This is the same self-improving pattern as the composite-weight vector in §2.
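The new Phase A seeding score and the ranking tabs reduce to a few lines. `inverse_stock` is not defined in this spec, so the 1/(1 + stock) form below is an assumption (cells that already hold admitted artifacts get proportionally less new capacity):

```python
def seeding_priority(market_cap: float, stock: int, capacity_available: float) -> float:
    """Phase A priority = market_cap x inverse_stock x capacity_available."""
    inverse_stock = 1.0 / (1 + stock)  # ASSUMED definition of inverse_stock
    return market_cap * inverse_stock * capacity_available

def ranking(artifacts: list, field: str) -> list:
    """One ranking tab: sort artifact rows descending by a single economy
    field (market_cap, open_interest, volume_24h, liquidity, or V)."""
    return sorted(artifacts, key=lambda a: a[field], reverse=True)
```

With this form, a cell holding one incumbent artifact gets half the seeding priority of an empty cell at equal MarketCap, which matches the intent of steering capacity toward unaddressed gaps.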
Four guardrails keep the market-cap axis from being manipulable:
- Size is set by Epistemic Rigor, not the artifact's originator. You can't inflate your own artifact's size field.
- OpenInterest is weighted by participant believability (existing exch-qm-02-PART — market participant accuracy track). A novice's shares count less than a proven-accurate participant's.
- Volume_24h.
- If V < 0.3 for 4 consecutive weekly windows without new evidence, S is damped by 0.8 per window; prevents forever-unresolved grandiose claims from hoarding capacity.

Storage:

- S + MarketCap + OpenInterest + Volume_24h + Liquidity → new columns on the artifact row (or JSON within payload_json for artifact classes that don't have dedicated tables yet).
- Market rows (exch-qm-01-MEXT_extend_market_pricing_spec.md) now carry open_interest, volume_24h, liquidity_b, and the new size_estimate + market_cap fields derived from the linked artifact.
- Size estimators are minted with artifact_class = "size_estimator" (a sub-class of invention — they're literally inventions about measurement) so they get their own market pricing / meta-arena loop.

---

Every artifact is produced by one four-phase loop. The phases are the same regardless of artifact class; the inputs and acceptance criteria differ.
Phase A — SEEDING
Select (gap, landscape cell) pair to work on.
Priority = market_cap × inverse_stock × capacity_available (per §2a;
formerly urgency_from_gap × novelty_from_landscape × capacity_available)
Emits: proposal prompt + context bundle.
Phase B — MULTI-AGENT DEBATE
N agents with differentiated roles: Proposer, Critic, Synthesizer, Red-Teamer.
Constrained rounds (4 default; see agora_debate_coverage specs).
World-model context (Atlas) threaded into every round.
Emits: candidate artifact + debate transcript + confidence.
Phase C — ADVERSARIAL + MARKET
Senate red-team runs standardized challenges against the candidate.
Market participants bid on composite value.
Arena tournament if there are ≥2 candidates in the same cell.
Emits: adversarial_score, market_bid, arena_elo.
Phase D — ITERATE OR RETIRE
If V(artifact) exceeds the cell's current floor, it replaces the
incumbent and becomes the new floor.
If it's within the retry budget and below floor, feed the critique
back into Phase A for a second iteration.
If all budget burned and still below floor, retire to the archive
(still indexed, still citable).

The loop explicitly requires multiple agents and multiple iterations before an artifact is admitted. Tasks generated for this loop use task_type=multi_iter with fields max_iterations, required_participants, debate_rounds — see multi_iter_debate_tasks_spec.md (new).
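The Phase D decision can be sketched as a pure function over the Phase C outputs; the return labels and the `max_iterations` field come from this spec, while the signature itself is illustrative:

```python
def phase_d(v: float, cell_floor: float, iteration: int, max_iterations: int) -> str:
    """Decide an artifact's fate after Phase C scoring.

    - above the cell floor: admit (it replaces the incumbent and
      becomes the new floor)
    - below floor with retry budget left: iterate (critique feeds
      back into Phase A)
    - budget burned: retire to the archive (still indexed, citable)
    """
    if v > cell_floor:
        return "admit"
    if iteration < max_iterations:
        return "iterate"
    return "retire"
```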
---
The quests compose as follows; each arrow represents data or task-generation flowing downstream.
quest_landscape_analyses
↓ (cells + empty regions)
quest_gaps
↓ (gap queue, prioritized)
┌─────┴──────┐
↓ ↓
quest_ quest_
inventions experiments
↓ ↓
└──► Artifact ◄─── market_participants (bid)
│ ◄─── adversarial_science (red-team)
│ ◄─── evolutionary_arenas (Elo)
↓
composite value V
↓
showcase / retire

quest_gaps is the funnel. Gap quality is the most important input quality gate in the system — garbage-in yields garbage artifacts. The existing quest_gap_factory, gap_quality_scoring, gap_priority_debate_tasks, gap_governance_review_tasks, gap_prediction_markets specs are all load-bearing and stay; this spec adds the wiring that says gaps MUST be tagged with (domain, layer, confidence, expected_value) before they're dequeued by a downstream quest.
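The dequeue gate reduces to a small check; the dict shape is an assumption about how gap rows are represented:

```python
# The four required tags come from this spec's funnel requirement.
REQUIRED_GAP_TAGS = ("domain", "layer", "confidence", "expected_value")

def ready_to_dequeue(gap: dict) -> bool:
    """A gap may leave the quest_gaps funnel only when all four
    required tags are present and non-null."""
    return all(gap.get(tag) is not None for tag in REQUIRED_GAP_TAGS)
```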
quest_landscape_analyses is new. It scans corpora by field (Atlas literature index) and emits a living map. It is the ONLY gap-source that is allowed to manufacture truly novel gaps — the other gap-generators (debate-triggered, watchdog-triggered) reinforce existing gaps rather than discovering new territory. Without landscape, the system pattern-matches on what it already knows.
quest_inventions and quest_experiments are new downstream quests. They are the most capacity-hungry and get the most agent slots once unpaused.
---
CI-style "one-shot script runs once" tasks are banned from the four downstream quests. Their task rows carry:
- task_type = multi_iter
- max_iterations (default 3)
- required_roles = ["proposer", "critic", "synthesizer", "red_teamer"]
- artifact_class one of {invention, experiment, hypothesis, target, discovery, paper, gap, landscape}
- target_cell = (domain, gap_id) the task is working in
- acceptance_criteria = list of measurable thresholds (arena Elo ≥ baseline+50, adversarial ≥ 0.6, market bid ≥ median, etc.)
- max_iterations reached → retire-and-archive

This replaces the one_shot default where a worker runs once, produces whatever, and closes. The new shape forces convergence.

---
Per the 2026-04-24 directive: only quest task generation runs as a CI/cron job. All other recurring task-generators (watchdog auto-repair, CI checks, broken-link scanners, CI self-maintenance, stub audit) are paused. They can be re-enabled later; not now.
The sole survivor is quest_engine.py (or equivalent) which:
- dequeues tagged gaps from quest_gaps
- routes each gap to quest_inventions / quest_experiments based on gap tag
- emits one multi_iter task per gap-cell per capacity slot
- re-prices V(expected) whenever the composite-value model changes
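One cron cycle of that generator might look like the following sketch; the routing rule (experiments for gaps tagged testable, inventions otherwise) and the gap dict shape are assumptions:

```python
def generate_tasks(gaps: list, capacity_slots: int) -> list:
    """One quest_engine cycle: emit one multi_iter task per gap-cell
    per available capacity slot, routed by gap tag."""
    tasks = []
    for gap in gaps:
        if len(tasks) >= capacity_slots:
            break  # respect the capacity budget
        # ASSUMED routing rule; the spec only says "based on gap tag"
        quest = "quest_experiments" if gap.get("testable") else "quest_inventions"
        tasks.append({
            "task_type": "multi_iter",
            "quest": quest,
            "target_cell": (gap["domain"], gap["id"]),
        })
    return tasks
```

---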
For each of the seven artifact classes, the system maintains ≥2 showcase artifacts at all times. A showcase artifact has:
- V above the floor for that class, stable across ≥3 consecutive weekly meta-arena runs
- (papers class) explaining why a non-expert should care

A subset of showcase artifacts are model artifacts — ones whose utility is so clearly measurable that we mint them as references. The weight-vector for the composite-value function is one such model artifact. So are: the best-of-class invention that closed the biggest-impact gap, the experiment whose information-gain-per-dollar is highest, the landscape analysis most cited by other quests. Model artifacts get their own badge in the UI and are exempt from retirement as long as their signals hold.
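The stability criterion is mechanical enough to sketch; the history format (newest-last list of weekly V readings) is assumed:

```python
def is_showcase_stable(weekly_v: list, class_floor: float, runs: int = 3) -> bool:
    """Showcase gate: V above the class floor across the last `runs`
    consecutive weekly meta-arena cycles."""
    return len(weekly_v) >= runs and all(v > class_floor for v in weekly_v[-runs:])
```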
---
Scope for the companion spec showcase_artifact_ui_spec.md (new). Summary here for context:
- /showcase with one tab per artifact class plus a cross-class "Model artifacts" tab.
- V with bar, an icon strip for the six signals (gap / landscape / market / adversarial / arena / utility) each green/amber/red.
- /showcase/economy: plots the weight-vector artifact's current weights, the floor values by class, the top-N rising artifacts per class, and the list of open gaps that have no artifact addressing them yet (prioritized).

This spec references the existing specs; it does not replace them. Specific integration points:
- exch-qm-03-LIFE_artifact_lifecycle_spec.md — artifact states (draft/debate/admitted/showcased/retired) align with this spec's Phase A-D outputs.
- exch-qm-01-MEXT_extend_market_pricing_spec.md — the market signal in §2 feeds that pricing implementation.
- quest_gap_factory_spec.md + siblings — the quest_gaps funnel is the union of those specs; no new spec needed there, just the task-shape requirements (gap must carry (domain, layer, confidence, expected_value) before it leaves the funnel).
- q-ai-tools-landscape_spec.md — existing landscape spec is scoped to AI tools; quest_landscape_analyses generalizes the pattern to all scientific fields. That spec becomes a specialized case.
- agora_debate_coverage + debate_quality_scoring specs — the Phase B loop in §3 uses these directly.
- evolutionary_arenas quest — the arena signal in §2 is what that quest already produces.

---

Milestones:

- quest_landscape_analyses emits its first 3 landscape analyses (molecular biology, neuroscience, clinical genetics). Each surfaces ≥10 tagged gaps into quest_gaps.
- quest_inventions + quest_experiments each run ≥20 multi-iter tasks against the surfaced gaps. At least 2 artifacts per class cross the admission floor.
- /showcase lands with ≥2 showcase artifacts per class, provenance chain drillable. Composite-value weight-vector artifact pinned as a model artifact.

If any milestone slips, the retrospective output is itself a landscape analysis + gap in the agent_ecosystem quest — the system uses its own machinery to improve itself.
---
Open questions:

- How is the priority score computed in practice, and where does it live? (Likely a view over tasks + gaps + artifacts.)
- quest_market_participants_spec_v2.md.)
- target? (Proposed: sub-class. Revisit if benchmark valuation diverges meaningfully from target valuation.)