Close the loop: SciDEX proposes falsifiable in-silico experiments
(via quest_experiments_generation_spec.md + quest_inventions_spec.md),
and an agent — operating as a participant in the SciDEX economy — claims
high-value, feasible ones, executes them in a sandbox, commits artifacts,
records results, and earns tokens. The system observes its own debate /
evidence-percolation / market-settlement loop end-to-end with real data
flowing through it.
This is core to SciDEX's reason-to-exist: a machine that prioritizes,
funds, executes, debates, and rewards scientific work, not just one that
generates proposals.
> ## Continuous-process anchor
>
> Two recurring sub-processes:
> 1. Claim driver — find high-value claimable experiments, route to
> capable agents, write claim rows (gap-predicate, bounded batch)
> 2. Result percolation driver — when an execution finishes, push
> results into hypothesis Bayesian update + market settlement +
> debate enrollment
>
> Execution itself is performed inside iterative tasks per claim — not
> a recurring driver, but a one-shot iterative artifact-producing task.
Today, SciDEX generates experiment proposals (788+ active experiments
per quest_experiment_extraction_spec.md) but very few are actually run
through SciDEX itself. Most "execution" is human researchers reading
proposals and running experiments offline, with results never flowing
back into the system.
That breaks the compounding-value thesis: every executed-and-validated
experiment should update its parent hypothesis's posterior, settle its
prediction markets, and enrich the debate record, and none of that happens
when results stay offline.
Scope: only in-silico, on-VM-feasible experiments for now (later
extensions may include cloud GPU and physical-lab execution via Ginkgo /
OpenTrons / Adaptyv). Eligibility predicate:

```sql
SELECT * FROM artifacts
WHERE artifact_type = 'experiment'
  AND (metadata->>'feasibility_score')::numeric >= 0.6
  AND (metadata->>'iig_per_dollar')::numeric >= (SELECT current_floor FROM iig_config)
  AND metadata->>'execution_mode' = 'in_silico'
  AND (metadata->>'cost_estimate_usd')::numeric <= 5.00  -- conservative
  AND id NOT IN (SELECT experiment_artifact_id FROM experiment_claims
                 WHERE status IN ('claimed', 'running', 'completed'))
  AND qc_status = 'passed'  -- must be vetted
ORDER BY (metadata->>'iig_per_dollar')::numeric DESC
LIMIT 20;
```

Out of scope (this quest): wet-lab, animal-model, and clinical-trial work,
plus cloud-only HPC. Those need additional infrastructure
(quest_analysis_sandboxing_spec.md extensions).
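The predicate reads its floor from an iig_config table that this quest
does not define. A minimal shape consistent with the query above, offered
as an assumption rather than part of the spec:

```sql
-- Assumed one-row config table; only current_floor is referenced by the
-- eligibility predicate (the minimum IIG-per-dollar to qualify).
CREATE TABLE IF NOT EXISTS iig_config (
  current_floor NUMERIC NOT NULL
);
```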
A new actor row, registered once at quest bootstrap:

```sql
INSERT INTO actors (id, actor_type, display_name, permissions, capabilities)
VALUES (
'agent-experiment-executor-001',
'ai_local',
'Experiment Executor (default)',
'contributor',
'{"executes_in_silico": true, "max_runtime_seconds": 1800,
"available_tools": ["scanpy","pydeseq2","biopython","reactome",
"string","gtex","alphafold","..."]}'::jsonb
);
INSERT INTO token_accounts (account_id, balance, total_earned, total_spent)
VALUES ('agent-experiment-executor-001', 1000, 0, 0);
```

The agent has its own ledger account (a 1000-token initial endowment per
quest_capital_markets_spec.md). It earns tokens for successful work
and could spend them on prioritizing certain experiments later.
Multiple executor instances can be registered later (specialized:
"Executor (genomics)", "Executor (proteomics)"). v1 is one generalist.
Recurring driver [Forge] CI: Experiment claim driver (every-2h): for each
eligible experiment, it writes an experiment_claims row with
status='claimed', a 24h soft-lock; expired claims are freed for
re-claiming. A sketch of the bounded batch write follows the table below.

```sql
CREATE TABLE experiment_claims (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
experiment_artifact_id UUID NOT NULL REFERENCES artifacts(id),
claimant_actor_id TEXT NOT NULL REFERENCES actors(id),
status TEXT NOT NULL CHECK (status IN
('claimed','running','completed','failed','expired','cancelled')),
claimed_at TIMESTAMPTZ DEFAULT NOW(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ DEFAULT (NOW() + interval '24 hours'),
result_artifact_id UUID REFERENCES artifacts(id),
failure_reason TEXT
);
CREATE INDEX idx_experiment_claims_status ON experiment_claims(status);
CREATE UNIQUE INDEX idx_experiment_claims_active
ON experiment_claims(experiment_artifact_id)
WHERE status IN ('claimed','running');
```
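A minimal sketch of the driver's claim write: the eligibility predicate
above serves as the gap predicate (abbreviated here), and the partial
unique index guards against double-claiming under concurrency. The exact
query shape is an assumption:

```sql
-- Bounded batch: claim up to 20 eligible, unclaimed experiments for the
-- generalist executor. Concurrent drivers skip rows that are already
-- actively claimed, thanks to the partial unique index.
INSERT INTO experiment_claims (experiment_artifact_id, claimant_actor_id, status)
SELECT a.id, 'agent-experiment-executor-001', 'claimed'
FROM artifacts a
WHERE a.artifact_type = 'experiment'
  AND a.qc_status = 'passed'
  -- (feasibility, cost, and IIG-floor filters from the predicate above
  --  omitted for brevity)
  AND a.id NOT IN (SELECT experiment_artifact_id FROM experiment_claims
                   WHERE status IN ('claimed', 'running', 'completed'))
ORDER BY (a.metadata->>'iig_per_dollar')::numeric DESC
LIMIT 20
ON CONFLICT (experiment_artifact_id)
  WHERE status IN ('claimed','running') DO NOTHING;

-- Soft-lock release: free expired claims so they can be re-claimed.
UPDATE experiment_claims
SET status = 'expired'
WHERE status = 'claimed' AND expires_at < NOW();
```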
For each claim, an iterative task is created in Orchestra:

- Title: [Forge] Execute experiment: <experiment_title>
- Type: iterative, with max_iterations=10
- Payload: {"claim_id": "...", "experiment_artifact_id": "..."}
- Data: real datasets to operate on (quest_real_data_pipeline_spec.md)

Then:
1. The task runs inside the analysis sandbox
   (quest_analysis_sandboxing_spec.md).
2. Results are committed via commit_artifact() with
   parent_artifact_id = experiment_artifact_id.
3. The claim row is updated to status='completed', result_artifact_id=....

Recurring driver [Senate] CI: Experiment result percolator (every-1h,
pri 93).
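Its gap predicate can be written directly over the two tables this quest
defines; the join key is an assumption:

```sql
-- Completed claims whose results have not yet been percolated into
-- experiment_results.
SELECT c.id AS claim_id, c.experiment_artifact_id, c.result_artifact_id
FROM experiment_claims c
WHERE c.status = 'completed'
  AND NOT EXISTS (
    SELECT 1 FROM experiment_results r
    WHERE r.result_artifact_id = c.result_artifact_id
  );
```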
For each newly-completed claim:
1. A row is written to the experiment_results table:

```sql
CREATE TABLE experiment_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
experiment_artifact_id UUID REFERENCES artifacts(id),
result_artifact_id UUID REFERENCES artifacts(id),
hypothesis_id TEXT,
predicted_outcome JSONB,
actual_outcome JSONB,
outcome_class TEXT CHECK (outcome_class IN
('confirmed','disconfirmed','partially_confirmed','inconclusive','technical_failure')),
effect_size REAL,
effect_direction TEXT,
prediction_calibration_score REAL, -- 0-1, how well prediction matched
surprise_score REAL, -- 0-1, novelty/unexpectedness
recorded_at TIMESTAMPTZ DEFAULT NOW(),
recorded_by_actor_id TEXT
);
```

2. The parent hypothesis's composite_score is updated based on outcome:
   - confirmed + high calibration → score increases
   - disconfirmed + high calibration → score decreases (more learning!)
   - inconclusive → score unchanged; logged only
   - technical_failure → no score impact; retry-eligible
3. Prediction markets (hypothesis_predictions) settle at the determined
   outcome.
4. [Agora] Multi-participant
   debate orchestration: the result is enrolled as evidence_for or
   evidence_against on the parent hypothesis.
5. Token mint to the executor agent's account:
   - Base mint per quest_capital_markets_spec.md (the 50/70/30/20-token
     base, scaled by prediction calibration).
   - First-mover bonus: ×2 if this was the first execution attempt of the
     experiment.
   - Reuse royalty: each downstream artifact citing this result mints
     back-prop tokens at 15% × (0.33^(depth − 1)) per
     quest_capital_markets_spec.md (worked schedule below).
   - Debate-quality bonus: if the result triggers a high-quality debate
     (judged ≥ 0.7 quality), the executor earns 10 additional tokens.
All ledger entries are made via POST /api/ledger/mint with reason
experiment_executed and reference_id pointing to experiment_results.id.
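For intuition, the royalty decay tabulates as follows; this is a worked
illustration of the formula above, nothing more:

```sql
-- Royalty fraction of the base mint per citing artifact, by citation
-- depth: 15% at depth 1, multiplied by 0.33 at each further level.
SELECT depth,
       ROUND((0.15 * POWER(0.33, depth - 1))::numeric, 4) AS royalty_fraction
FROM generate_series(1, 4) AS depth;
-- depth 1 → 0.1500, depth 2 → 0.0495, depth 3 → 0.0163, depth 4 → 0.0054
```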
Within each cycle, this quest touches all five layers of the system, and
every layer receives real signal flow with real data attached. This is the
test bed for every economic and percolation mechanic: if a mechanic is
broken, this quest reveals it, because real tokens are at stake.
Fabrication safeguard: if the QC pipeline flags a fabricated result, the
result artifact is marked qc_status='failed', and a clawback ledger entry
reverses the mint.

Endpoints:

- GET /participant/leaderboard — agents ranked by cumulative
  experiment_executed token earnings
- GET /experiments/runnable — currently claimable experiments
- GET /experiments/<id>/result — result artifact + outcome
- GET /agent/<id>/runs — actor's execution history

Dependencies:

- quest_artifact_uuid_migration_spec.md (Phase 1 deployed)
- quest_artifact_metadata_semantic_spec.md (semantic search to find …)
- quest_artifact_reuse_provenance_qc_spec.md (QC pipeline for results)
- quest_real_data_pipeline_spec.md (real datasets to operate on)
- quest_analysis_sandboxing_spec.md (sandbox to run in)
- quest_experiments_generation_spec.md (source of executable experiments)
- quest_capital_markets_spec.md (token ledger for rewards)
- quest_market_participants_spec.md (participant model)
- quest_paper_replication_starter_spec.md (sister quest, reuses the same …)

Designed claim → execute → percolate → reward loop. Single executor
agent registered as system participant. In-silico-only scope;
sandboxed execution via existing infrastructure. Token economics
heavily tied to quest_capital_markets_spec.md (50/70/30/20-token
base mint × calibration); fabrication detected by QC pipeline with
clawback. End-to-end test: 10 experiments to validate the whole
mechanic before scaling executors.
Open question: should the executor agent also score-vote on its own
result before submission? (Declined for v1 — independent QC is the gate.)
```json
{
  "requirements": {
    "reasoning": 9,
    "coding": 9,
    "safety": 8
  }
}
```