Quest: Experiment Extraction & Evidence Atoms
Layer: Atlas
Priority: P93
Status: active
Vision
Papers are the bedrock of scientific evidence, but SciDEX currently treats them as opaque
blobs — a title, abstract, PMID, and maybe some JSON evidence claims. We have 520 papers
and 188 experiment artifacts, but the experiment artifacts have no structured metadata.
This quest transforms paper-derived knowledge from unstructured citations into **rich,
structured experiment records** — each one a first-class artifact with full lineage:
- What was done: experimental design, model system, methods, controls
- What was found: measurements, p-values, effect sizes, confidence intervals, sample sizes
- What it means: conclusions drawn, limitations acknowledged, context within the field
- Where it came from: paper source (PMID, section, figure/table references)
- How it was extracted: which agent, what methodology, extraction confidence
These structured experiments become the ground-truth anchors for SciDEX's entire evidence system. When a hypothesis claims "TREM2 variants increase AD risk by 2-4x", the evidence chain traces through an experiment artifact that contains the actual odds ratio, confidence interval, sample size, and study design.
Why This Matters
- Evidence grounding: Claims without structured experiment backing are unverifiable
- Replication tracking: Multiple experiments testing the same hypothesis can be compared
- Meta-analysis: Structured results enable systematic aggregation across studies
- Debate quality: Skeptic agents can challenge specific methodological details
- KG enrichment: Extracted entities, relations, and measurements grow the Atlas KG
Neuro Focus (Preventing Sprawl)
Extraction schemas are scoped to neuroscience-relevant experiment types:
- Genetic association (GWAS, candidate gene studies, Mendelian genetics)
- Protein interaction (co-IP, mass spec, yeast two-hybrid, proximity labeling)
- Gene expression (RNA-seq, qPCR, microarray, single-cell)
- Animal model (transgenic mice, behavioral assays, histology)
- Cell biology (cell culture, organoids, iPSC-derived neurons)
- Clinical (biomarker studies, imaging, cognitive assessments, drug trials)
- Neuropathology (histology, immunostaining, electron microscopy)
Additional types can be added through Schema Governance (q-schema-governance).
Experiment Artifact Schema
```
# Type-specific metadata for artifact_type='experiment'
{
  # Source provenance
  "paper_pmid": "12345678",
  "paper_section": "Results, Figure 3A",
  "paper_doi": "10.1038/s41586-023-...",

  # Experimental design
  "experiment_type": "genetic_association",  # From controlled vocabulary
  "model_system": "human cohort",            # mouse, rat, human, cell_line, organoid, etc.
  "species": "Homo sapiens",
  "tissue": "prefrontal cortex",
  "sample_size": 1500,
  "control_description": "Age-matched healthy controls (n=800)",
  "methods_summary": "Genome-wide association study of AD risk variants...",

  # Results (structured)
  "results": {
    "primary_finding": "TREM2 R47H variant associated with increased AD risk",
    "measurements": [
      {
        "metric": "odds_ratio",
        "value": 2.92,
        "ci_lower": 2.09,
        "ci_upper": 4.09,
        "p_value": 3.4e-12,
        "comparison": "R47H carriers vs non-carriers"
      }
    ],
    "effect_direction": "risk_increasing",
    "replication_status": "replicated"  # replicated, not_replicated, awaiting, conflicting
  },

  # Context
  "disease_context": "Alzheimer's disease",
  "entities_mentioned": ["TREM2", "R47H", "microglia", "neuroinflammation"],
  "conclusions": "TREM2 R47H is a significant risk factor for late-onset AD...",
  "limitations": "European ancestry cohort only; effect size may vary...",

  # Extraction provenance
  "extracted_by": "agent-atlas-extractor",
  "extraction_method": "llm_structured_extraction",
  "extraction_confidence": 0.85,
  "extraction_timestamp": "2026-04-03T12:00:00Z",
  "human_verified": false
}
```
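As an illustration of how the field-completeness criterion could be checked during extraction or backfill, here is a minimal Python sketch. `REQUIRED_FIELDS` is a hypothetical subset of the schema above, not the governed field list:

```python
# Hypothetical required-field list; the real set comes from Schema Governance.
REQUIRED_FIELDS = [
    "paper_pmid", "experiment_type", "model_system", "species",
    "sample_size", "methods_summary", "results", "disease_context",
    "extracted_by", "extraction_confidence",
]

def field_completeness(metadata: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for f in REQUIRED_FIELDS
        if metadata.get(f) not in (None, "", [], {})
    )
    return filled / len(REQUIRED_FIELDS)
```

A backfill pass (atl-ex-07-BKFL) could use a score like this to flag artifacts below the 0.8 completeness bar for re-extraction.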
Open Tasks
☐ atl-ex-01-SCHM: Define experiment extraction schemas per experiment type (P93)
☐ atl-ex-02-PIPE: Build LLM extraction pipeline from paper abstracts/full text (P92)
☐ atl-ex-03-LINK: Auto-link extracted experiments to KG entities (P90)
☐ atl-ex-04-QUAL: Extraction quality scoring and confidence calibration (P88)
☐ atl-ex-05-REPL: Replication tracking — match experiments testing same hypothesis (P86)
☐ atl-ex-06-META: Meta-analysis support — aggregate results across experiments (P84)
☐ atl-ex-07-BKFL: Backfill 188 existing experiment artifacts with structured metadata (P91)
☐ atl-ex-08-API: API endpoints for experiment browsing, search, and filtering (P87)
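For atl-ex-06-META, one plausible aggregation method is fixed-effect inverse-variance pooling of the structured odds-ratio measurements. A sketch, assuming each measurement dict carries `value`, `ci_lower`, and `ci_upper` (95% CI) as in the schema above; the production pipeline may use a different model (e.g. random effects):

```python
import math

def pool_odds_ratios(measurements: list[dict]) -> dict:
    """Fixed-effect inverse-variance pooling of odds ratios on the log scale."""
    weights, log_ors = [], []
    for m in measurements:
        log_or = math.log(m["value"])
        # Standard error recovered from the 95% CI width on the log scale.
        se = (math.log(m["ci_upper"]) - math.log(m["ci_lower"])) / (2 * 1.96)
        weights.append(1 / se ** 2)
        log_ors.append(log_or)
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return {
        "odds_ratio": math.exp(pooled),
        "ci_lower": math.exp(pooled - 1.96 * se_pooled),
        "ci_upper": math.exp(pooled + 1.96 * se_pooled),
    }
```

With a single study the pooled estimate reproduces that study; with several, precise studies dominate the weighted mean.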
Dependency Chain
```
atl-ex-01-SCHM (Schema definition)
        ↓
atl-ex-02-PIPE (Extraction pipeline) ──→ atl-ex-07-BKFL (Backfill existing)
        ↓
atl-ex-03-LINK (KG entity linking)
        ↓
atl-ex-04-QUAL (Quality scoring) ──→ atl-ex-05-REPL (Replication tracking)
        ↓                                     ↓
atl-ex-08-API (API endpoints)        atl-ex-06-META (Meta-analysis)
```
Integration Points
- Evidence Chains (b5298ea7): Extracted experiments become ground-truth evidence entries
- Knowledge Units (08c73de3): Experiment results become atomic, composable evidence blocks
- Artifact Debates (q-artifact-debates): Experiments are debatable — methodology can be challenged
- Schema Governance (q-schema-governance): Experiment schemas evolve through governance
- Epistemic Rigor (q-epistemic-rigor): Experiments anchor the falsifiability chain
Hypothesis Ranking Feedback
Extracted experiments create a bidirectional scoring relationship with hypotheses:
Hypothesis → Experiment: Hypotheses with explicit falsifiable predictions attract experiment design tasks. The system should proactively generate experiment proposals for high-scoring hypotheses that lack associated experiments.

Experiment → Hypothesis: An experiment's quality scores (feasibility, impact, information gain) feed back into the linked hypothesis's composite score:
- Hypotheses with feasible, high-impact associated experiments rank higher
- Hypotheses with no testable experiments are penalized in relative ranking
- When experiment results confirm or falsify predictions, Bayesian updates adjust hypothesis confidence
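The Bayesian update mentioned above can be sketched as a likelihood-ratio update on the hypothesis's confidence. This is illustrative only; the actual scoring pipeline may use a richer model:

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update hypothesis confidence given an experiment outcome.

    likelihood_ratio = P(result | hypothesis true) / P(result | hypothesis false);
    > 1 confirms, < 1 disconfirms. Converts to odds, updates, converts back.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)
```

For example, a confirming result three times more likely under the hypothesis than under its negation moves a 0.5 prior to 0.75.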
Experiment Quality Dimensions (for ranking feedback):
- Feasibility (0-1): Can this experiment actually be executed with available resources?
- Impact (0-1): How much would the result change our world model?
- Information gain (0-1): How much uncertainty does this experiment resolve?
- Novelty (0-1): Does this test something not yet tested?
These dimensions should be computed during experiment extraction (atl-ex-04-QUAL) and stored in experiment metadata for consumption by the hypothesis scoring pipeline.
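One way atl-ex-04-QUAL could fold the four dimensions into a single score for the hypothesis scoring pipeline is a weighted average. The weights below are hypothetical placeholders, not tuned values:

```python
# Hypothetical weights over the four 0-1 quality dimensions (sum to 1.0).
WEIGHTS = {
    "feasibility": 0.3,
    "impact": 0.3,
    "information_gain": 0.25,
    "novelty": 0.15,
}

def experiment_quality(dims: dict) -> float:
    """Weighted average of quality dimensions; missing dimensions count as 0."""
    return sum(WEIGHTS[k] * dims.get(k, 0.0) for k in WEIGHTS)
```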
Success Criteria
☐ >500 papers have at least one structured experiment extracted
☐ Experiment artifacts have >80% field completeness (non-null structured metadata)
☐ Extracted experiments link to KG entities with >90% precision
☐ Extraction confidence correlates with human verification (calibration)
☐ Replication tracking identifies conflicting results for >10 hypothesis pairs
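The calibration criterion could be checked with a binned comparison of `extraction_confidence` against human verification rates, in the spirit of expected calibration error. A minimal sketch; the real calibration pipeline may weight bins by size or use a proper scoring rule:

```python
def calibration_gap(records: list[tuple[float, bool]], bins: int = 5) -> float:
    """Mean absolute gap between mean extraction confidence and the
    human-verification rate within each confidence bin (smaller is better).

    records: (extraction_confidence, human_verified) pairs.
    """
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(bins)]
    for conf, verified in records:
        idx = min(int(conf * bins), bins - 1)  # clamp conf == 1.0 into last bin
        buckets[idx].append((conf, verified))
    gaps = []
    for b in buckets:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            verified_rate = sum(1 for _, v in b if v) / len(b)
            gaps.append(abs(mean_conf - verified_rate))
    return sum(gaps) / len(gaps)
```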
Work Log
_No entries yet._