Quest: Inventions

> Goal. Continuously produce novel, plausibly-testable inventions that close gaps identified by quest_gaps, with a measurable composite value above the admission floor. An "invention" is a concept, mechanism, method, protocol, or design — not an experiment, not a hypothesis, not a paper. It is the idea that an experiment or hypothesis could then probe.
>
> This quest is the most capacity-hungry downstream quest once unpaused. It generates net-new scientific proposals that are the inputs to experiments, hypotheses, targets, and papers.

Parent economy design: [scidex_economy_design_spec.md](scidex_economy_design_spec.md).

---

Inputs

  • A prioritized gap queue from quest_gaps. Each gap carries (domain, layer, confidence, expected_value) tags per §4 of the parent spec.
  • A current landscape analysis for the gap's domain from quest_landscape_analyses. The landscape tells us what's trodden and what's virgin territory.
  • The world model graph from Atlas (world_model_framework_spec.md) for context threading.
  • The current composite-value weights (a model artifact owned by Epistemic Rigor).

Outputs

Each admitted invention is an artifact row with:

  • artifact_class = "invention"
  • target_gap_id, target_landscape_cell
  • composite_value_V, signal_breakdown (six sub-scores)
  • provenance.debate_session_id, provenance.iteration_count
  • admission_state ∈ {draft, debated, admitted, showcased, retired}

Rejected candidates still persist as archive rows (admission_state = retired). They are searchable but not recommended; their critique is folded back into the gap row as "attempted and failed because X" so future iterations don't retread.
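As a rough illustration, the artifact row described above could be modeled as follows. This is a minimal sketch, not the actual schema: the class names (`InventionArtifact`, `Provenance`, `AdmissionState`) and the choice of a dict for `signal_breakdown` are assumptions, and the names of the six sub-scores are left open because this spec does not list them.

```python
from dataclasses import dataclass
from enum import Enum


class AdmissionState(str, Enum):
    DRAFT = "draft"
    DEBATED = "debated"
    ADMITTED = "admitted"
    SHOWCASED = "showcased"
    RETIRED = "retired"


@dataclass
class Provenance:
    debate_session_id: str
    iteration_count: int


@dataclass
class InventionArtifact:
    target_gap_id: str
    target_landscape_cell: str
    composite_value_V: float
    # Six sub-scores; the key names are placeholders, not defined by this spec.
    signal_breakdown: dict[str, float]
    provenance: Provenance
    admission_state: AdmissionState = AdmissionState.DRAFT
    artifact_class: str = "invention"

    def __post_init__(self) -> None:
        if len(self.signal_breakdown) != 6:
            raise ValueError("signal_breakdown must carry six sub-scores")


# Hypothetical example row; all values are illustrative.
row = InventionArtifact(
    target_gap_id="gap-001",
    target_landscape_cell="cell-A",
    composite_value_V=0.72,
    signal_breakdown={f"signal_{i}": 0.5 for i in range(1, 7)},
    provenance=Provenance(debate_session_id="session-1", iteration_count=1),
)
```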

---

Task shape

All tasks in this quest use task_type = multi_iter with:

  • artifact_class = "invention"
  • max_iterations = 3 (tunable per gap)
  • required_roles = ["proposer", "critic", "synthesizer", "red_teamer"]
  • debate_rounds = 4 (matches agora 4-round default from 54db5e28_agora_debate_coverage_spec.md)
  • target_cell = (domain, gap_id)
  • acceptance_criteria per §3 below

See [multi_iter_debate_tasks_spec.md](multi_iter_debate_tasks_spec.md) for the task-type schema.
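For orientation, a task payload with the fields listed above might look like the sketch below. The helper `make_invention_task`, the dict layout, and the example domain and gap values are assumptions; the authoritative schema is in the linked task-type spec.

```python
def make_invention_task(domain: str, gap_id: str, max_iterations: int = 3) -> dict:
    """Build a hypothetical multi_iter task payload for one gap."""
    return {
        "task_type": "multi_iter",
        "artifact_class": "invention",
        "max_iterations": max_iterations,  # tunable per gap
        "required_roles": ["proposer", "critic", "synthesizer", "red_teamer"],
        "debate_rounds": 4,
        "target_cell": (domain, gap_id),
    }


# Illustrative values only.
task = make_invention_task("neuro", "gap-001")
```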

---

1. Seeding — which gap to work on next

Every time capacity opens, the quest scheduler picks one open gap to seed. Priority is:

P(gap) = expected_value(gap) × novelty(landscape_cell) × tractability(gap) × inverse_stock(cell)

  • expected_value(gap) from 6681bce36b39_atlas_add_gap_prioritization_scoring_ba_spec.md (existing 5-dim scorer: importance, tractability, hypothesis_impact, novelty, data_readiness).
  • novelty(landscape_cell) = 1 - saturation from the landscape analysis covering this gap's domain.
  • tractability(gap) carried on the gap row.
  • inverse_stock(cell) = 1 / (1 + existing_inventions_in_cell). This spreads effort across the gap frontier instead of piling into whichever gap has the highest raw value.

When a cell has zero inventions, inverse_stock takes its maximum value of 1, so the scheduler preferentially explores new cells before deepening any single one.
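The priority formula can be sketched in a few lines. This is an illustration under assumptions: the `GapCandidate` container and the `pick_next_gap` helper are hypothetical names, and the numbers in the example are made up to show how inverse_stock shifts the choice toward an empty cell.

```python
from dataclasses import dataclass


@dataclass
class GapCandidate:
    gap_id: str
    expected_value: float
    saturation: float  # from the landscape analysis, in [0, 1]
    tractability: float
    existing_inventions_in_cell: int


def inverse_stock(existing_inventions_in_cell: int) -> float:
    return 1.0 / (1 + existing_inventions_in_cell)


def priority(gap: GapCandidate) -> float:
    novelty = 1.0 - gap.saturation
    return (
        gap.expected_value
        * novelty
        * gap.tractability
        * inverse_stock(gap.existing_inventions_in_cell)
    )


def pick_next_gap(gaps: list[GapCandidate]) -> GapCandidate:
    """Return the open gap with the highest P(gap)."""
    return max(gaps, key=priority)


# Illustrative values: the crowded gap has higher raw value but three
# inventions already in its cell, so the fresh gap wins.
crowded = GapCandidate("crowded", 0.9, 0.2, 0.8, 3)  # P = 0.9 * 0.8 * 0.8 * 0.25
fresh = GapCandidate("fresh", 0.6, 0.1, 0.7, 0)      # P = 0.6 * 0.9 * 0.7 * 1.0
chosen = pick_next_gap([crowded, fresh])
```

Because the four factors are multiplied, a zero in any one of them (for example, a fully saturated cell) removes the gap from contention regardless of its raw expected value.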

2. Generation — the four-round debate

Phase B of the parent spec, specialized for inventions.

Round 1 — Independent proposals. Each Proposer agent generates a candidate invention in isolation (no shared context) using the gap + landscape + world-model bundle. Emits 3-5 proposals per Proposer.

Round 2 — Cross-critique. Each Critic agent reads all Proposer outputs (blind to which agent produced which) and scores them on:

  • gap_coverage_projection (0-1): how much of the gap would close if this invention worked
  • novelty_vs_landscape (0-1): is this in virgin territory for the landscape?
  • testability (0-1): could an experiment falsify it within 6 months on a standard lab budget?
  • unstated_assumptions (list)

Critics MUST name ≥1 unstated assumption per proposal — otherwise the critique is considered low-quality (enforced by 7500889e_eb5_debate_quality_scori_spec.md).
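A Round 2 critique record and the "at least one unstated assumption" rule could be checked roughly as follows. The `Critique` class and `critique_problems` function are hypothetical; the real enforcement lives in the debate-quality scoring spec referenced above.

```python
from dataclasses import dataclass, field


@dataclass
class Critique:
    proposal_id: str
    gap_coverage_projection: float
    novelty_vs_landscape: float
    testability: float
    unstated_assumptions: list[str] = field(default_factory=list)


def critique_problems(critique: Critique) -> list[str]:
    """Return a list of reasons the critique would count as low-quality."""
    problems = []
    for name in ("gap_coverage_projection", "novelty_vs_landscape", "testability"):
        value = getattr(critique, name)
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} out of range: {value}")
    # Blank strings do not count as a named assumption.
    if not any(a.strip() for a in critique.unstated_assumptions):
        problems.append("no unstated assumption named")
    return problems


# Illustrative critiques.
good = Critique("p1", 0.6, 0.8, 0.5, ["assumes the assay scales to in vivo"])
bad = Critique("p2", 0.6, 1.2, 0.5, [])
```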

Round 3 — Synthesis. A Synthesizer agent merges the top-3 proposals (by composite critic score) into one refined invention. It must explicitly address every unstated assumption Critics named. Emits one refined candidate.
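The top-3 selection that feeds the Synthesizer might be sketched as below. The spec does not define how the composite critic score is formed, so this example assumes an unweighted mean of the three numeric sub-scores, averaged across critics; `top_proposals` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def top_proposals(scores: list[tuple[str, float, float, float]], k: int = 3) -> list[str]:
    """Rank proposals by mean critic score and return the best k ids.

    Each entry is (proposal_id, gap_coverage_projection,
    novelty_vs_landscape, testability), one per critic per proposal.
    """
    per_proposal: dict[str, list[float]] = defaultdict(list)
    for proposal_id, coverage, novelty, testability in scores:
        # Assumption: composite = unweighted mean of the three sub-scores.
        per_proposal[proposal_id].append(mean([coverage, novelty, testability]))
    ranked = sorted(per_proposal, key=lambda pid: mean(per_proposal[pid]), reverse=True)
    return ranked[:k]


# Illustrative scores: p1 was reviewed by two critics, the rest by one.
scores = [
    ("p1", 0.9, 0.9, 0.9),
    ("p1", 0.7, 0.7, 0.7),
    ("p2", 0.5, 0.5, 0.5),
    ("p3", 0.6, 0.6, 0.6),
    ("p4", 0.2, 0.2, 0.2),
]
selected = top_proposals(scores)
```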

Round 4 — Adversarial pre-check. A Red-Teamer agent attacks the synthesized candidate with known failure modes for the domain (pulled from Atlas's failure_mode_library view over historical refuted inventions). The red-team output becomes part of the artifact provenance — it's what the Senate's Adversarial Science quest will use as priors when running the full challenge in Phase C.

3. Admission criteria

An invention is admitted when ALL of:

  • arena_elo ≥ cell_floor + 50 (parent spec §2 signal 5)
  • adversarial_score ≥ 0.6 (survived Senate red-team; parent §2 signal 4)
  • market_bid ≥ cell_median after at least 10 participant bids (parent §2 signal 3)
  • gap_signal ≥ 0.3 (the invention plausibly closes ≥30% of the named gap when successful)
  • utility_plan exists — the invention comes with a proposed experiment (in quest_experiments terms) that would measure its utility. Without a utility plan the invention isn't admitted; it stays in debated state as a "pending-utility" item and the cell remains open.
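The five admission conditions above can be expressed as a single check that also reports which criteria failed. This is a sketch under assumptions: the `Signals` container and function names are hypothetical, and `cell_floor` is assumed here to be on the same scale as `arena_elo`.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    arena_elo: float
    cell_floor: float  # assumed to be an Elo-scale floor for the cell
    adversarial_score: float
    market_bid: float
    cell_median: float
    bid_count: int
    gap_signal: float
    has_utility_plan: bool


def failed_criteria(s: Signals) -> list[str]:
    """Return the names of admission criteria that are not met."""
    checks = {
        "arena": s.arena_elo >= s.cell_floor + 50,
        "adversarial": s.adversarial_score >= 0.6,
        # The market criterion only counts once at least 10 bids exist.
        "market": s.bid_count >= 10 and s.market_bid >= s.cell_median,
        "gap_signal": s.gap_signal >= 0.3,
        "utility_plan": s.has_utility_plan,
    }
    return [name for name, ok in checks.items() if not ok]


def is_admitted(s: Signals) -> bool:
    return not failed_criteria(s)


# Illustrative signals for a candidate that meets every criterion.
passing = Signals(1580, 1500, 0.7, 0.55, 0.5, 12, 0.4, True)
```

Returning the failed criteria, rather than a bare boolean, gives the iteration step a ready-made reason string for the critique bundle.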

Admitted inventions have their composite value V recomputed weekly; if V drifts below the cell floor for 3 consecutive weekly meta-arena runs, the invention is demoted back to debated state and eventually retired.
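The demotion rule reduces to a check over the most recent weekly values of V. A minimal sketch, assuming the weekly values are kept oldest to newest; `should_demote` is a hypothetical name.

```python
def should_demote(weekly_V: list[float], cell_floor: float, window: int = 3) -> bool:
    """True if V was below the cell floor in each of the last `window` weekly runs."""
    if len(weekly_V) < window:
        return False  # not enough history to judge
    return all(v < cell_floor for v in weekly_V[-window:])
```

For example, with a floor of 0.5, the history `[0.7, 0.4, 0.4, 0.4]` triggers demotion, while `[0.4, 0.6, 0.4, 0.4]` does not because the below-floor streak is only two weeks long.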

4. Iteration and retirement

If the four-round debate produces a candidate that fails admission:

  • If max_iterations not exhausted: the critique bundle (what failed and why) is fed to Phase A as additional context, and the task re-runs with fresh Proposers (same gap, same landscape, enriched critique).
  • If max_iterations exhausted without admission: candidate is retired. The reason ("adversarial refuted by mechanism X", "market bids too low", "arena Elo never exceeded floor") is written back to the gap row as a failed_approach edge. This prevents re-trying the same family of inventions and nudges future Proposers away from the dead path.
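The retry-then-retire flow above can be sketched as a simple loop. Here `run_debate` is a hypothetical stand-in for one full four-round debate plus admission check; it receives the critiques accumulated so far and returns whether the candidate was admitted along with a failure reason.

```python
from typing import Callable, Optional


def run_invention_task(
    run_debate: Callable[[list[str]], tuple[bool, str]],
    max_iterations: int = 3,
) -> dict:
    """Re-run the debate with enriched critique until admission or exhaustion."""
    critiques: list[str] = []
    for iteration in range(1, max_iterations + 1):
        admitted, critique = run_debate(list(critiques))
        if admitted:
            return {"state": "admitted", "iterations": iteration, "failed_approach": None}
        critiques.append(critique)  # fed to the next iteration as extra context
    # Exhausted: the last failure reason would be written back to the gap row.
    reason: Optional[str] = critiques[-1] if critiques else None
    return {"state": "retired", "iterations": max_iterations, "failed_approach": reason}
```

In this sketch only the final failure reason is recorded as the failed_approach; a real implementation might store the whole critique history instead.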

5. Showcase inventions

Per parent spec §7, at least two inventions are pinned as showcase artifacts at all times. Selection rule:

  • Composite value V above the class floor + stable for ≥3 weekly meta-arena runs.
  • Provenance chain fully drillable.
  • Utility demonstrated via an admitted quest_experiments experiment whose results supported the invention's core claim.
  • At least one derived paper artifact (from the Paper class) wraps the invention with narrative.
  • Fallback: if fewer than 2 inventions in a class-cell meet all four criteria above, the UI surfaces the highest-scoring available inventions with a "showcase-eligible-pending-utility" badge.
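The selection rule and its fallback could look roughly like this. The `ShowcaseCandidate` fields and the `pick_showcase` helper are hypothetical, and "highest-scoring" is assumed to mean highest composite value V.

```python
from dataclasses import dataclass


@dataclass
class ShowcaseCandidate:
    invention_id: str
    composite_value_V: float
    stable_weeks: int  # consecutive weekly meta-arena runs with stable V
    provenance_drillable: bool
    utility_demonstrated: bool
    derived_paper_count: int


def is_showcase_eligible(c: ShowcaseCandidate, class_floor: float) -> bool:
    return (
        c.composite_value_V > class_floor
        and c.stable_weeks >= 3
        and c.provenance_drillable
        and c.utility_demonstrated
        and c.derived_paper_count >= 1
    )


def pick_showcase(
    candidates: list[ShowcaseCandidate], class_floor: float, minimum: int = 2
) -> list[tuple[str, str]]:
    """Return (invention_id, badge) pairs, topping up with fallback picks."""
    by_value = sorted(candidates, key=lambda c: c.composite_value_V, reverse=True)
    eligible = [c for c in by_value if is_showcase_eligible(c, class_floor)]
    picks = [(c.invention_id, "showcase") for c in eligible]
    if len(picks) < minimum:
        eligible_ids = {c.invention_id for c in eligible}
        rest = [c for c in by_value if c.invention_id not in eligible_ids]
        needed = minimum - len(picks)
        picks += [(c.invention_id, "showcase-eligible-pending-utility") for c in rest[:needed]]
    return picks


# Illustrative candidates: only "a" meets all four criteria.
pool = [
    ShowcaseCandidate("a", 0.9, 4, True, True, 1),
    ShowcaseCandidate("b", 0.8, 4, True, False, 1),
    ShowcaseCandidate("c", 0.5, 1, True, False, 0),
]
pinned = pick_showcase(pool, class_floor=0.6)
```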

6. Interactions with other quests

  • quest_gaps: consumes prioritized gaps; writes back failed_approach and closed_by_invention edges on the gap row.
  • quest_experiments: receives utility_plan handoffs (every admitted invention becomes one or more experiment proposals in the experiment quest's queue).
  • quest_market_participants: the market participants from exch-qm-02-PART_market_participant_agents_spec.md bid on every debated candidate; their accuracy determines their believability-weight.
  • Adversarial Science: runs the full red-team challenge in Phase C of the parent loop (post-synthesis, pre-admission). The Round-4 pre-check is lightweight; Adversarial Science is deeper.
  • Evolutionary Arenas: runs pairwise arena matches among debated inventions in the same cell. Elo computed via existing arena machinery.

7. Capacity and concurrency

  • Default: 6 concurrent multi-iter invention tasks, tunable via accounts.json concurrency.
  • A single invention task consumes ~4-6 agent-hours across all four rounds (an estimate; tune after the first week of data).
  • Inventions that fail the Round 4 pre-check early-exit the task (don't burn Phase C budget on a candidate the red-team demolished internally).

8. Metrics surfaced on /showcase/economy

  • Invention admission rate (admitted / total proposals)
  • Average iterations-to-admission
  • Distribution of composite value across admitted inventions (histogram)
  • Cell coverage map: which (domain, gap_id) cells have admitted inventions vs. which are open
  • Retired-invention cause histogram (failed-adversarial / failed-market / failed-arena / utility-never-demonstrated)
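Three of the metrics above are simple aggregates and might be computed as in the sketch below. The function names and the input shapes (plain counts and lists) are assumptions; the real dashboard would read these from artifact rows.

```python
from collections import Counter
from typing import Optional


def admission_rate(admitted: int, total_proposals: int) -> float:
    """admitted / total proposals, or 0.0 when nothing has been proposed yet."""
    return admitted / total_proposals if total_proposals else 0.0


def mean_iterations_to_admission(iteration_counts: list[int]) -> Optional[float]:
    """Average provenance.iteration_count over admitted inventions."""
    if not iteration_counts:
        return None
    return sum(iteration_counts) / len(iteration_counts)


def retirement_histogram(causes: list[str]) -> Counter:
    """Count retired inventions per cause label."""
    return Counter(causes)


# Illustrative inputs.
histogram = retirement_histogram(
    ["failed-market", "failed-arena", "failed-market", "utility-never-demonstrated"]
)
```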

9. Open questions

  • Should a single invention be allowed to close multiple gaps simultaneously? (Proposed: yes, with target_gap_ids as a list, but the composite value uses the average of per-gap gap_signal, so there is no double-counting.)
  • How do we detect when a domain's gap queue is exhausted vs. when the landscape is genuinely done? (Both look like low novelty(cell); differentiate via cross-domain analogies from Atlas.)
  • What happens to retired inventions that later become viable as the landscape shifts? (Unretire via a governance vote in Senate; out-of-scope for this spec.)
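For the first open question, the proposed averaging of per-gap gap_signal could be sketched as follows. This only illustrates the proposal, which is not yet decided; `combined_gap_signal` and the gap ids are hypothetical.

```python
from statistics import mean


def combined_gap_signal(per_gap_signal: dict[str, float]) -> float:
    """Average gap_signal over every gap in target_gap_ids (no summing)."""
    if not per_gap_signal:
        raise ValueError("an invention must target at least one gap")
    return mean(per_gap_signal.values())


# Illustrative: an invention that half-closes one gap and quarter-closes another.
signal = combined_gap_signal({"gap-a": 0.5, "gap-b": 0.25})
```

Averaging means an invention that targets many gaps weakly does not outscore one that closes a single gap well, which is the point of the "no double-counting" clause.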

File: quest_inventions_spec.md
Modified: 2026-04-28 03:24
Size: 9.2 KB