> Purpose. Unify invention / experiment / gap / landscape / discovery / hypothesis / paper / target artifacts under one generation-and-valuation pipeline so the agent fleet does measurable, directional work instead of shipping plausibly-helpful-but-unranked output. This is the umbrella spec; each concrete quest spec (quest_inventions, quest_experiments, quest_gaps, quest_landscape_analyses, showcase UI) references this doc for shared mechanics.
The design is motivated by two observations from the 2026-04-24 audit:
The correction is an economy — artifacts have explicit value signals, the quests compete for capacity, and the output is a ranked set of showcase artifacts with traceable provenance.
---
SciDEX produces seven artifact classes. Each is a discrete node in the world-model graph.
"Paper" is the public-facing composition of the other six. A showcase paper is the canonical demo of end-to-end value.

---
Every artifact gets a composite value computed from six underlying signals. Each signal is independently produced; the composition is a learned weighted sum whose weights are themselves artifacts (meta-inventions tuned by epistemic rigor).
- gap signal: quest_gaps + Atlas world-model graph edges.
- landscape signal: quest_landscape_analyses.
- market signal: Exchange quest + Market Participants quest (existing).
- adversarial signal: Adversarial Science quest (existing).
- arena signal: Evolutionary Arenas quest (existing — Elo over pairwise judgments).
- utility signal: quest_experiments (runs a planned utility test), Forge benchmarks.

Composite value V(artifact) = Σ w_i · normalize(signal_i), where the weights are model artifacts owned by Epistemic Rigor. The weights themselves compete on a meta-arena (Elo among weight-vectors based on which vectors best predict long-horizon utility). This is what makes the system self-improving rather than hand-tuned.
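A minimal sketch of the composition, assuming per-signal min-max normalization and a weight vector supplied as data (the signal names follow this section; the normalization bounds are illustrative assumptions):

```python
# Signal names from this spec; the weight vector is itself an artifact,
# so it arrives as data rather than hard-coded constants.
SIGNALS = ["gap", "landscape", "market", "adversarial", "arena", "utility"]

def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize a raw signal into [0, 1]; clamps out-of-range values."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def composite_value(raw: dict, weights: dict, bounds: dict) -> float:
    """V(artifact) = sum_i w_i * normalize(signal_i), weights renormalized to sum to 1
    so V stays probability-like in [0, 1]."""
    total_w = sum(weights[s] for s in SIGNALS)
    return sum(
        (weights[s] / total_w) * normalize(raw[s], *bounds[s])
        for s in SIGNALS
    )
```

Renormalizing the weight vector inside the function keeps V in [0, 1] even when a meta-arena promotes a weight-vector artifact whose raw weights do not sum to one.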
V is probability-like — it lives in [0, 1] and measures belief / quality / resolution-likelihood. It does NOT measure how much is at stake. A gap with V=0.9 and a gap with V=0.9 that is a thousand times more important look identical under V alone. §2a fixes that.
---
V answers "how valid / probable is this artifact?". It says nothing about magnitude. To distinguish a plausible footnote from a plausible paradigm shift we add three class-calibrated dimensions to every artifact:
S(artifact) ∈ [0, ∞) — a scalar in class-appropriate units that estimates what's at stake if the artifact resolves positively. Size is independent of whether it WILL resolve (that's V's job). Size is a pure upside question.
Per-class definition + units:
Size is computed once at admission time and recomputed on weekly meta-arena cycles (so S drifts as the field evolves). The estimator for each class is itself a model artifact under Epistemic Rigor; competing estimators face off in a size-meta-arena just like the composite-weight vectors.

MarketCap(artifact) = V(artifact) × S(artifact).
This is the expected impact-weighted value — probability × magnitude. It's the single scalar that answers "which artifacts should get the most agent-capacity?" more faithfully than V alone. The showcase UI's default ranking switches from V to MarketCap once size estimators exist for every class. V remains visible as the confidence component.
Two artifacts with identical V:
- V = 0.85, S = 200 epy → MC = 170
- V = 0.85, S = 3 epy → MC = 2.55

The first dominates on MarketCap even though they tie on V.

OpenInterest(artifact) = total tokens committed across open market participant positions (sum of stakes on both YES and NO sides). This is the conviction dimension — it measures how much capital the market has bet for or against this artifact. High open interest + low V means "the market strongly disagrees with this artifact" rather than "nobody has paid attention".
OI grows when new participants enter; decays when positions close at resolution. Stored per-artifact in the existing exch-qm-01-MEXT_extend_market_pricing_spec.md market rows.
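A sketch of the open-interest bookkeeping under these rules: stakes on both sides count, positions enter when a participant opens them and leave at resolution. The class and method names are assumptions, not the exch-qm-01 schema.

```python
class OpenInterestBook:
    """Tracks OpenInterest for one artifact's market.

    OI = sum of stakes across all open positions, YES and NO sides alike.
    OI grows when new participants enter and decays as positions close
    at resolution.
    """

    def __init__(self):
        self.positions = {}  # position_id -> (side, stake)
        self.next_id = 0

    def open_position(self, side: str, stake: float) -> int:
        """Register a new position; returns its id for later closing."""
        pid = self.next_id
        self.next_id += 1
        self.positions[pid] = (side, stake)
        return pid

    def close_position(self, pid: int) -> None:
        """Called at market resolution; removing the position shrinks OI."""
        self.positions.pop(pid, None)

    @property
    def open_interest(self) -> float:
        return sum(stake for _side, stake in self.positions.values())
```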
- Volume_24h(artifact) = total tokens exchanged in bids/asks in the last 24 hours. Measures attention independent of conviction — an artifact can have high OI with zero recent volume (stable consensus) or low OI with high volume (new + thinly-traded).
- Liquidity(artifact) = effective depth — the LMSR-b parameter for this artifact's market × pool-tokens. Proxy for "how much can be bet before the price moves materially". Low-liquidity artifacts' prices are noisy; the scheduler SHOULD NOT treat them as well-calibrated until liquidity exceeds a class floor.

The showcase UI and the quest scheduler consume the following rankings, each answering a different question:
The default/showcase tab sorts by market_cap; alternate tabs expose the others. The scheduler's Phase A seeding in §3 is switched to market_cap × inverse_stock × capacity_available — previously it was urgency × novelty × capacity, which conflated size and probability.

Each size estimator is tested once per week against realized outcomes (paper citations actually accrued, experiments whose IIG was measurable in hindsight, etc.). An estimator whose S-predictions diverge ≥2σ from realizations across a rolling window gets deprecated and the second-place estimator in its meta-arena gets promoted. This is the same self-improving pattern as the composite-weight vector in §2.
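The new Phase A seeding score and the ranking tabs reduce to a few lines. `inverse_stock` is not defined in this spec, so the 1/(1 + stock) form below is an assumption (cells that already hold admitted artifacts get proportionally less new capacity):

```python
def seeding_priority(market_cap: float, stock: int, capacity_available: float) -> float:
    """Phase A priority = market_cap x inverse_stock x capacity_available."""
    inverse_stock = 1.0 / (1 + stock)  # ASSUMED definition of inverse_stock
    return market_cap * inverse_stock * capacity_available

def ranking(artifacts: list, field: str) -> list:
    """One ranking tab: sort artifact rows descending by a single economy
    field (market_cap, open_interest, volume_24h, liquidity, or V)."""
    return sorted(artifacts, key=lambda a: a[field], reverse=True)
```

With this form, a cell holding one incumbent artifact gets half the seeding priority of an empty cell at equal MarketCap, which matches the intent of steering capacity toward unaddressed gaps.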
Four guardrails keep the market-cap axis from being manipulable:
- Size is set by Epistemic Rigor, not the artifact's originator. You can't inflate your own artifact's size field.
- OpenInterest is weighted by participant believability (existing exch-qm-02-PART — market participant accuracy track). A novice's shares count less than a proven-accurate participant's.
- Volume_24h.
- If V < 0.3 for 4 consecutive weekly windows without new evidence, S is damped by 0.8 per window; prevents forever-unresolved grandiose claims from hoarding capacity.

Storage:

- S + MarketCap + OpenInterest + Volume_24h + Liquidity → new columns on the artifact row (or JSON within payload_json for artifact classes that don't have dedicated tables yet).
- Market rows (exch-qm-01-MEXT_extend_market_pricing_spec.md) now carry open_interest, volume_24h, liquidity_b, and the new size_estimate + market_cap fields derived from the linked artifact.
- Size estimators are minted with artifact_class = "size_estimator" (a sub-class of invention — they're literally inventions about measurement) so they get their own market pricing / meta-arena loop.

---

Every artifact is produced by one four-phase loop. The phases are the same regardless of artifact class; the inputs and acceptance criteria differ.
Phase A — SEEDING
Select (gap, landscape cell) pair to work on.
Priority = market_cap × inverse_stock × capacity_available (per §2a;
formerly urgency_from_gap × novelty_from_landscape × capacity_available)
Emits: proposal prompt + context bundle.
Phase B — MULTI-AGENT DEBATE
N agents with differentiated roles: Proposer, Critic, Synthesizer, Red-Teamer.
Constrained rounds (4 default; see agora_debate_coverage specs).
World-model context (Atlas) threaded into every round.
Emits: candidate artifact + debate transcript + confidence.
Phase C — ADVERSARIAL + MARKET
Senate red-team runs standardized challenges against the candidate.
Market participants bid on composite value.
Arena tournament if there are ≥2 candidates in the same cell.
Emits: adversarial_score, market_bid, arena_elo.
Phase D — ITERATE OR RETIRE
If V(artifact) exceeds the cell's current floor, it replaces the
incumbent and becomes the new floor.
If it's within the retry budget and below floor, feed the critique
back into Phase A for a second iteration.
If all budget burned and still below floor, retire to the archive
(still indexed, still citable).

The loop explicitly requires multiple agents and multiple iterations before an artifact is admitted. Tasks generated for this loop use task_type=multi_iter with fields max_iterations, required_participants, debate_rounds — see multi_iter_debate_tasks_spec.md (new).
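The Phase D decision can be sketched as a pure function over the Phase C outputs; the return labels and the `max_iterations` field come from this spec, while the signature itself is illustrative:

```python
def phase_d(v: float, cell_floor: float, iteration: int, max_iterations: int) -> str:
    """Decide an artifact's fate after Phase C scoring.

    - above the cell floor: admit (it replaces the incumbent and
      becomes the new floor)
    - below floor with retry budget left: iterate (critique feeds
      back into Phase A)
    - budget burned: retire to the archive (still indexed, citable)
    """
    if v > cell_floor:
        return "admit"
    if iteration < max_iterations:
        return "iterate"
    return "retire"
```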
---
The quests compose as follows; each arrow represents data or task-generation flowing downstream.
quest_landscape_analyses
↓ (cells + empty regions)
quest_gaps
↓ (gap queue, prioritized)
┌─────┴──────┐
↓ ↓
quest_ quest_
inventions experiments
↓ ↓
└──► Artifact ◄─── market_participants (bid)
│ ◄─── adversarial_science (red-team)
│ ◄─── evolutionary_arenas (Elo)
↓
composite value V
↓
showcase / retire

quest_gaps is the funnel. Gap quality is the most important input quality gate in the system — garbage-in yields garbage artifacts. The existing quest_gap_factory, gap_quality_scoring, gap_priority_debate_tasks, gap_governance_review_tasks, gap_prediction_markets specs are all load-bearing and stay; this spec adds the wiring that says gaps MUST be tagged with (domain, layer, confidence, expected_value) before they're dequeued by a downstream quest.
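The dequeue gate reduces to a small check; the dict shape is an assumption about how gap rows are represented:

```python
# The four required tags come from this spec's funnel requirement.
REQUIRED_GAP_TAGS = ("domain", "layer", "confidence", "expected_value")

def ready_to_dequeue(gap: dict) -> bool:
    """A gap may leave the quest_gaps funnel only when all four
    required tags are present and non-null."""
    return all(gap.get(tag) is not None for tag in REQUIRED_GAP_TAGS)
```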
quest_landscape_analyses is new. It scans corpora by field (Atlas literature index) and emits a living map. It is the ONLY gap-source that is allowed to manufacture truly novel gaps — the other gap-generators (debate-triggered, watchdog-triggered) reinforce existing gaps rather than discovering new territory. Without landscape, the system pattern-matches on what it already knows.
quest_inventions and quest_experiments are new downstream quests. They are the most capacity-hungry and get the most agent slots once unpaused.
---
CI-style "one-shot script runs once" tasks are banned from the four downstream quests. Their task rows carry:
- task_type = multi_iter
- max_iterations (default 3)
- required_roles = ["proposer", "critic", "synthesizer", "red_teamer"]
- artifact_class one of {invention, experiment, hypothesis, target, discovery, paper, gap, landscape}
- target_cell = (domain, gap_id) the task is working in
- acceptance_criteria = list of measurable thresholds (arena Elo ≥ baseline+50, adversarial ≥ 0.6, market bid ≥ median, etc.)
- max_iterations reached → retire-and-archive

This replaces the one_shot default where a worker runs once, produces whatever, and closes. The new shape forces convergence.

---
Per the 2026-04-24 directive: only quest task generation runs as a CI/cron job. All other recurring task-generators (watchdog auto-repair, CI checks, broken-link scanners, CI self-maintenance, stub audit) are paused. They can be re-enabled later; not now.
The sole survivor is quest_engine.py (or equivalent) which:
- dequeues tagged gaps from quest_gaps
- routes each gap to quest_inventions / quest_experiments based on gap tag
- emits one multi_iter task per gap-cell per capacity slot
- re-prices V(expected) whenever the composite-value model changes
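One cron cycle of that generator might look like the following sketch; the routing rule (experiments for gaps tagged testable, inventions otherwise) and the gap dict shape are assumptions:

```python
def generate_tasks(gaps: list, capacity_slots: int) -> list:
    """One quest_engine cycle: emit one multi_iter task per gap-cell
    per available capacity slot, routed by gap tag."""
    tasks = []
    for gap in gaps:
        if len(tasks) >= capacity_slots:
            break  # respect the capacity budget
        # ASSUMED routing rule; the spec only says "based on gap tag"
        quest = "quest_experiments" if gap.get("testable") else "quest_inventions"
        tasks.append({
            "task_type": "multi_iter",
            "quest": quest,
            "target_cell": (gap["domain"], gap["id"]),
        })
    return tasks
```

---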
For each of the seven artifact classes, the system maintains ≥2 showcase artifacts at all times. A showcase artifact has:
- V above the floor for that class, stable across ≥3 consecutive weekly meta-arena runs
- (papers class) explaining why a non-expert should care

A subset of showcase artifacts are model artifacts — ones whose utility is so clearly measurable that we mint them as references. The weight-vector for the composite-value function is one such model artifact. So are: the best-of-class invention that closed the biggest-impact gap, the experiment whose information-gain-per-dollar is highest, the landscape analysis most cited by other quests. Model artifacts get their own badge in the UI and are exempt from retirement as long as their signals hold.
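The stability criterion is mechanical enough to sketch; the history format (newest-last list of weekly V readings) is assumed:

```python
def is_showcase_stable(weekly_v: list, class_floor: float, runs: int = 3) -> bool:
    """Showcase gate: V above the class floor across the last `runs`
    consecutive weekly meta-arena cycles."""
    return len(weekly_v) >= runs and all(v > class_floor for v in weekly_v[-runs:])
```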
---
Scope for the companion spec showcase_artifact_ui_spec.md (new). Summary here for context:
- /showcase with one tab per artifact class plus a cross-class "Model artifacts" tab.
- V with bar, an icon strip for the six signals (gap / landscape / market / adversarial / arena / utility) each green/amber/red.
- /showcase/economy: plots the weight-vector artifact's current weights, the floor values by class, the top-N rising artifacts per class, and the list of open gaps that have no artifact addressing them yet (prioritized).

This spec references the existing specs; it does not replace them. Specific integration points:
- exch-qm-03-LIFE_artifact_lifecycle_spec.md — artifact states (draft/debate/admitted/showcased/retired) align with this spec's Phase A-D outputs.
- exch-qm-01-MEXT_extend_market_pricing_spec.md — the market signal in §2 feeds that pricing implementation.
- quest_gap_factory_spec.md + siblings — the quest_gaps funnel is the union of those specs; no new spec needed there, just the task-shape requirements (gap must carry (domain, layer, confidence, expected_value) before it leaves the funnel).
- q-ai-tools-landscape_spec.md — existing landscape spec is scoped to AI tools; quest_landscape_analyses generalizes the pattern to all scientific fields. That spec becomes a specialized case.
- agora_debate_coverage + debate_quality_scoring specs — the Phase B loop in §3 uses these directly.
- evolutionary_arenas quest — the arena signal in §2 is what that quest already produces.

---

Milestones:

- quest_landscape_analyses emits its first 3 landscape analyses (molecular biology, neuroscience, clinical genetics). Each surfaces ≥10 tagged gaps into quest_gaps.
- quest_inventions + quest_experiments each run ≥20 multi-iter tasks against the surfaced gaps. At least 2 artifacts per class cross the admission floor.
- /showcase lands with ≥2 showcase artifacts per class, provenance chain drillable. Composite-value weight-vector artifact pinned as a model artifact.

If any milestone slips, the retrospective output is itself a landscape analysis + gap in the agent_ecosystem quest — the system uses its own machinery to improve itself.
---
Open questions:

- How is the priority score computed in practice, and where does it live? (Likely a view over tasks + gaps + artifacts.)
- quest_market_participants_spec_v2.md.)
- target? (Proposed: sub-class. Revisit if benchmark valuation diverges meaningfully from target valuation.)