> Goal. Design experiments with maximal expected information gain per dollar on the inventions/hypotheses coming out of the other downstream quests. An experiment here is a protocol with expected results, not an execution. Execution happens externally (in-silico via the existing experiment extraction pipeline, in-vitro via partners); this quest produces the protocol and the prediction, validates feasibility, and records results when they return.
>
> Distinct from the existing quest_experiment_extraction_spec.md, which extracts structured records from papers. This quest generates new experiment proposals.
Parent: [scidex_economy_design_spec.md](scidex_economy_design_spec.md).
Extraction counterpart: [quest_experiment_extraction_spec.md](quest_experiment_extraction_spec.md).
Experiment-results loop: [5f27f904_d33_experiment_results_spec.md](5f27f904_d33_experiment_results_spec.md).
---
Inputs: admitted inventions from quest_inventions, each carrying a utility_plan pointer, and quest_gaps with tractability and data_readiness scores.

Output: an artifact with artifact_class = "experiment", carrying:

- protocol (structured — method, sample size, controls, endpoints, duration, cost estimate)
- predicted_outcome (probabilistic — P(confirm), P(falsify), P(inconclusive))
- information_gain_bits (expected reduction in hypothesis uncertainty given the prior and the predicted outcome)
- cost_estimate_usd
- iig_per_dollar = information_gain_bits / cost_estimate_usd
- target_invention_id / target_hypothesis_id — the invention or hypothesis it's designed to probe
- feasibility_score (0-1) from the Senate adversarial pre-check

When the experiment executes (externally or in-silico) and results come back, 5f27f904_d33_experiment_results_spec.md closes the loop — comparing predicted to actual and updating Bayesian scores on the target invention/hypothesis.
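A minimal sketch of this artifact as a record, assuming Python dataclasses; the field names come from the list above, while the class name, types, and defaults are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentArtifact:
    artifact_class: str = "experiment"
    # Structured protocol: method, sample size, controls, endpoints, duration.
    protocol: dict = field(default_factory=dict)
    # Probabilities over {confirm, falsify, inconclusive}, summing to 1.
    predicted_outcome: dict = field(default_factory=dict)
    information_gain_bits: float = 0.0
    cost_estimate_usd: float = 1.0
    target_invention_id: str | None = None   # exactly one target is set
    target_hypothesis_id: str | None = None
    feasibility_score: float = 0.0           # 0-1, from the Senate pre-check

    @property
    def iig_per_dollar(self) -> float:
        # iig_per_dollar = information_gain_bits / cost_estimate_usd
        return self.information_gain_bits / self.cost_estimate_usd
```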
---
task_type = multi_iter, same framework as quest_inventions, but with:
- artifact_class = "experiment"
- required_roles = ["proposer", "methodologist", "statistician", "critic"]
- max_iterations = 3
- debate_rounds = 3 (one less than inventions — experiment protocols are more constrained, so they converge faster)
- target_cell = (invention_id OR hypothesis_id) — not a gap cell; experiments are scoped to what they probe
- acceptance_criteria:
  - iig_per_dollar ≥ class_floor
  - feasibility_score ≥ 0.5
  - market_bid ≥ median_for_class
  - no_redundant_prior_art (not already covered by a paper the landscape analysis flagged)
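A hedged sketch of how this configuration might be declared; the values mirror the settings above, but the dict shape and the sentinel strings for the dynamic thresholds are assumptions.

```python
# Illustrative declaration only; "class_floor" and "median_for_class" stand
# in for values computed at admission time, not literal config constants.
EXPERIMENT_TASK_CONFIG = {
    "task_type": "multi_iter",
    "artifact_class": "experiment",
    "required_roles": ["proposer", "methodologist", "statistician", "critic"],
    "max_iterations": 3,
    "debate_rounds": 3,                      # one less than quest_inventions
    "target_cell": "invention_id | hypothesis_id",
    "acceptance_criteria": {
        "iig_per_dollar": ">= class_floor",  # rolling floor, see admission below
        "feasibility_score": ">= 0.5",
        "market_bid": ">= median_for_class",
        "no_redundant_prior_art": True,
    },
}
```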
Priority formula:

P(experiment_slot) = hypothesis_prior_variance(h) × invention_value(i) × landscape_novelty(cell)
                   × 1/(existing_experiments_for_target + 1)

The hypothesis_prior_variance term is key — experiments on hypotheses where the field is evenly split (prior P(true) near 0.5, so variance near the Bernoulli maximum of 0.5² = 0.25) are maximally informative. Hypotheses where Atlas already has strong consensus (low variance) produce low-information experiments and should be de-prioritized.
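A compact sketch of this formula, assuming the hypothesis prior is a Bernoulli P(true) so its variance is p(1 − p); the function and argument names are illustrative.

```python
def experiment_slot_priority(p_true: float,
                             invention_value: float,
                             landscape_novelty: float,
                             existing_experiments_for_target: int) -> float:
    """P(experiment_slot) per the priority formula above (names illustrative)."""
    # Bernoulli prior variance p(1 - p): maximal (0.25) when the field is
    # evenly split at p = 0.5, near zero under strong consensus.
    hypothesis_prior_variance = p_true * (1.0 - p_true)
    return (hypothesis_prior_variance
            * invention_value
            * landscape_novelty
            / (existing_experiments_for_target + 1))
```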
Round 1 — Proposal. Proposer agents each draft a protocol. They must populate: method, sample size (with power-calculation reasoning), control arms, primary endpoint, secondary endpoints, expected duration, and a rough cost. Multiple proposals per agent are encouraged.
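As an example of the power-calculation reasoning a proposer might attach, here is a sketch assuming a two-arm comparison and the statsmodels power API; the effect size, alpha, and power targets are illustrative choices, not spec values.

```python
# Solving for per-arm sample size in a two-arm, two-sided t-test design.
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(
    effect_size=0.5,           # assumed standardized effect (Cohen's d)
    alpha=0.05,                # significance level
    power=0.8,                 # target power
    alternative="two-sided",
)
print(f"required n per arm: {n_per_arm:.0f}")  # ~64
```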
Round 2 — Methodology + statistics review. Methodologist and Statistician agents review every proposal. Methodologist flags confounders, missing controls, wrong model systems. Statistician flags underpowered designs, wrong tests, multiple-comparison issues. Each protocol gets a methodology_score and statistics_score (both 0-1). Protocols below 0.5 on either are eliminated.
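A minimal sketch of the Round 2 cut, assuming scored protocols are passed around as dicts carrying the two review scores named above.

```python
def surviving_protocols(protocols: list[dict]) -> list[dict]:
    """Keep protocols scoring >= 0.5 on both review dimensions."""
    return [p for p in protocols
            if p["methodology_score"] >= 0.5 and p["statistics_score"] >= 0.5]
```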
Round 3 — Critique + synthesis. Critic agent takes surviving protocols and synthesizes one final protocol that incorporates the best elements and addresses all methodology/stats flags. The Critic also produces the predicted_outcome distribution (explicitly, as probabilities summing to 1 over {confirm, falsify, inconclusive}).
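One hedged reading of information_gain_bits (an assumption for illustration, not a definition this spec gives) is the expected reduction in hypothesis entropy implied by the prior and an outcome-likelihood table; the likelihood numbers below are made up.

```python
import math

def entropy_bits(ps: list[float]) -> float:
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def expected_information_gain_bits(p_h: float, likelihoods: dict) -> float:
    """Expected drop in hypothesis entropy, I(H; outcome).
    likelihoods[outcome] = (P(outcome | H), P(outcome | not H))."""
    expected_posterior = 0.0
    for p_o_h, p_o_nh in likelihoods.values():
        p_o = p_h * p_o_h + (1 - p_h) * p_o_nh      # P(outcome)
        if p_o == 0:
            continue
        post = p_h * p_o_h / p_o                    # P(H | outcome) by Bayes
        expected_posterior += p_o * entropy_bits([post, 1 - post])
    return entropy_bits([p_h, 1 - p_h]) - expected_posterior

# Evenly split prior, fairly sharp likelihoods: ~0.58 bits of expected gain.
lik = {"confirm": (0.9, 0.1), "falsify": (0.05, 0.85), "inconclusive": (0.05, 0.05)}
print(expected_information_gain_bits(0.5, lik))
```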
The Senate adversarial quest then runs a standardized "would this experiment actually answer the question?" challenge; the outcome is the feasibility_score.
Admission requires ALL of:

- iig_per_dollar ≥ class_floor (the quest maintains a rolling floor across admitted experiments)
- feasibility_score ≥ 0.5
- market_bid ≥ median — market participants specifically score the predicted_outcome calibration (a key reason to bid against an experiment is that you think the prediction is mis-calibrated)
- no_redundant_prior_art — checked against the landscape cell's paper index
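Taken together, the gate might look like the following sketch; the helper arguments (class_floor, median_bid_for_class, prior_art_hit) are assumed names, not spec-defined APIs.

```python
def admit(exp, market_bid: float, class_floor: float,
          median_bid_for_class: float, prior_art_hit: bool) -> bool:
    """True only when all four admission checks above pass."""
    return (exp.iig_per_dollar >= class_floor
            and exp.feasibility_score >= 0.5
            and market_bid >= median_bid_for_class
            and not prior_art_hit)
```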
When the experiment runs (externally or in-silico), results arrive via the experiment_results path in 5f27f904_d33_experiment_results_spec.md. This quest then:

- compares actual to predicted_outcome — records calibration (was P(confirm) well-calibrated?)
- updates the target_invention_id / target_hypothesis_id Bayesian scores
- emits an experiment_completed event that the market participants read to settle their bids

Every admitted invention from quest_inventions carries a utility_plan — a pointer to a proposed experiment. That pointer becomes a task in this quest. Invention → experiment proposal → experiment admission → execution → results → invention composite-value update with real utility data.
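A rough sketch of that closure step, assuming the artifact above plus illustrative events/scores plumbing; none of these method names are defined by this spec.

```python
def close_experiment(exp, actual_outcome: str, events, scores) -> None:
    # Calibration record: how much probability was assigned to what happened.
    scores.record_calibration(exp, exp.predicted_outcome[actual_outcome])
    # Bayesian update on whichever target the experiment probed.
    target = exp.target_invention_id or exp.target_hypothesis_id
    scores.bayes_update(target, actual_outcome)
    # Settlement event read by market participants to settle their bids.
    events.emit("experiment_completed", experiment=exp, outcome=actual_outcome)
```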
The utility signal in the parent spec §2 is derived directly from the results of experiments spawned via this mechanism.
Related quests and systems:

- quest_inventions — the principal source of experiment proposals via utility_plan.
- quest_hypotheses (implicit — the hypothesis class is not a dedicated quest yet) — candidate hypotheses surface via Agora debates; the top-N by prior variance enter this quest's queue.
- quest_gaps — experiments targeting gaps with low data_readiness scores get deprioritized (hard to run).
- quest_experiment_extraction — complementary; it fills the world model from existing literature, while this quest proposes new experiments to add to the world model.
- Adversarial Science — feasibility pre-check in Round 3 above.
- Exchange + market participants — bid on predicted outcomes; settle on actuals.

Health metrics:

- iig_per_dollar of admitted experiments (rising over time = design quality improving)
- predicted_outcome vs actual_outcome calibration curve (see the sketch at the end of this section)

Open questions:

- What if an admitted experiment never executes? (No results ever reach experiment_results. Still valuable as a proposal artifact even if unrun.)
- How does this avoid duplicating quest_experiment_extraction_spec.md? (Extraction fills Atlas from papers; generation proposes new work. They feed each other but don't block each other.)
- Can agents inflate the information-gain estimate? (Admission gates on iig_per_dollar, not on information_gain_bits alone — the dollar denominator discourages padding.)
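For the calibration-curve metric, a minimal reliability-curve sketch over P(confirm) predictions, assuming numpy arrays of predicted probabilities and binary confirm outcomes; the equal-width binning is an arbitrary choice.

```python
import numpy as np

def calibration_curve(p_confirm: np.ndarray, confirmed: np.ndarray, bins: int = 10):
    """Per-bin (mean predicted P(confirm), observed confirm rate)."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(p_confirm, edges) - 1, 0, bins - 1)
    pred, obs = [], []
    for b in range(bins):
        mask = idx == b
        if mask.any():
            pred.append(p_confirm[mask].mean())
            obs.append(confirmed[mask].mean())
    return np.array(pred), np.array(obs)
```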