Quest: Experiment Extraction & Evidence Atoms
Layer: Atlas
Priority: P93
Status: active
Vision
Papers are the bedrock of scientific evidence, but SciDEX currently treats them as opaque
blobs — a title, abstract, PMID, and maybe some JSON evidence claims. We have 520 papers
and 188 experiment artifacts, but the experiment artifacts have no structured metadata.
This quest transforms paper-derived knowledge from unstructured citations into **rich,
structured experiment records** — each one a first-class artifact with full lineage:
- What was done: experimental design, model system, methods, controls
- What was found: measurements, p-values, effect sizes, confidence intervals, sample sizes
- What it means: conclusions drawn, limitations acknowledged, context within the field
- Where it came from: paper source (PMID, section, figure/table references)
- How it was extracted: which agent, what methodology, extraction confidence
These structured experiments become the ground-truth anchors for SciDEX's entire evidence system. When a hypothesis claims "TREM2 variants increase AD risk by 2-4x", the evidence chain traces through an experiment artifact that contains the actual odds ratio, confidence interval, sample size, and study design.
Why This Matters
- Evidence grounding: Claims without structured experiment backing are unverifiable
- Replication tracking: Multiple experiments testing the same hypothesis can be compared
- Meta-analysis: Structured results enable systematic aggregation across studies
- Debate quality: Skeptic agents can challenge specific methodological details
- KG enrichment: Extracted entities, relations, and measurements grow the Atlas KG
Neuro Focus (Preventing Sprawl)
Extraction schemas are scoped to neuroscience-relevant experiment types:
- Genetic association (GWAS, candidate gene studies, Mendelian genetics)
- Protein interaction (co-IP, mass spec, yeast two-hybrid, proximity labeling)
- Gene expression (RNA-seq, qPCR, microarray, single-cell)
- Animal model (transgenic mice, behavioral assays, histology)
- Cell biology (cell culture, organoids, iPSC-derived neurons)
- Clinical (biomarker studies, imaging, cognitive assessments, drug trials)
- Neuropathology (histology, immunostaining, electron microscopy)
Additional types can be added through Schema Governance (q-schema-governance).
Experiment Artifact Schema
```
# Type-specific metadata for artifact_type='experiment'
{
  # Source provenance
  "paper_pmid": "12345678",
  "paper_section": "Results, Figure 3A",
  "paper_doi": "10.1038/s41586-023-...",

  # Experimental design
  "experiment_type": "genetic_association",  # From controlled vocabulary
  "model_system": "human cohort",            # mouse, rat, human, cell_line, organoid, etc.
  "species": "Homo sapiens",
  "tissue": "prefrontal cortex",
  "sample_size": 1500,
  "control_description": "Age-matched healthy controls (n=800)",
  "methods_summary": "Genome-wide association study of AD risk variants...",

  # Results (structured)
  "results": {
    "primary_finding": "TREM2 R47H variant associated with increased AD risk",
    "measurements": [
      {
        "metric": "odds_ratio",
        "value": 2.92,
        "ci_lower": 2.09,
        "ci_upper": 4.09,
        "p_value": 3.4e-12,
        "comparison": "R47H carriers vs non-carriers"
      }
    ],
    "effect_direction": "risk_increasing",
    "replication_status": "replicated"  # replicated, not_replicated, awaiting, conflicting
  },

  # Context
  "disease_context": "Alzheimer's disease",
  "entities_mentioned": ["TREM2", "R47H", "microglia", "neuroinflammation"],
  "conclusions": "TREM2 R47H is a significant risk factor for late-onset AD...",
  "limitations": "European ancestry cohort only; effect size may vary...",

  # Extraction provenance
  "extracted_by": "agent-atlas-extractor",
  "extraction_method": "llm_structured_extraction",
  "extraction_confidence": 0.85,
  "extraction_timestamp": "2026-04-03T12:00:00Z",
  "human_verified": false
}
```
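As an illustration of how the field-completeness criterion could be checked during extraction or backfill, here is a minimal Python sketch. `REQUIRED_FIELDS` is a hypothetical subset of the schema above, not the governed field list:

```python
# Hypothetical required-field list; the real set comes from Schema Governance.
REQUIRED_FIELDS = [
    "paper_pmid", "experiment_type", "model_system", "species",
    "sample_size", "methods_summary", "results", "disease_context",
    "extracted_by", "extraction_confidence",
]

def field_completeness(metadata: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for f in REQUIRED_FIELDS
        if metadata.get(f) not in (None, "", [], {})
    )
    return filled / len(REQUIRED_FIELDS)
```

A backfill pass (atl-ex-07-BKFL) could use a score like this to flag artifacts below the 0.8 completeness bar for re-extraction.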
Open Tasks
☐ atl-ex-01-SCHM: Define experiment extraction schemas per experiment type (P93)
☐ atl-ex-02-PIPE: Build LLM extraction pipeline from paper abstracts/full text (P92)
☐ atl-ex-03-LINK: Auto-link extracted experiments to KG entities (P90)
☐ atl-ex-04-QUAL: Extraction quality scoring and confidence calibration (P88)
☐ atl-ex-05-REPL: Replication tracking — match experiments testing same hypothesis (P86)
☐ atl-ex-06-META: Meta-analysis support — aggregate results across experiments (P84)
☐ atl-ex-07-BKFL: Backfill 188 existing experiment artifacts with structured metadata (P91)
☐ atl-ex-08-API: API endpoints for experiment browsing, search, and filtering (P87)
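For atl-ex-06-META, one plausible aggregation method is fixed-effect inverse-variance pooling of the structured odds-ratio measurements. A sketch, assuming each measurement dict carries `value`, `ci_lower`, and `ci_upper` (95% CI) as in the schema above; the production pipeline may use a different model (e.g. random effects):

```python
import math

def pool_odds_ratios(measurements: list[dict]) -> dict:
    """Fixed-effect inverse-variance pooling of odds ratios on the log scale."""
    weights, log_ors = [], []
    for m in measurements:
        log_or = math.log(m["value"])
        # Standard error recovered from the 95% CI width on the log scale.
        se = (math.log(m["ci_upper"]) - math.log(m["ci_lower"])) / (2 * 1.96)
        weights.append(1 / se ** 2)
        log_ors.append(log_or)
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return {
        "odds_ratio": math.exp(pooled),
        "ci_lower": math.exp(pooled - 1.96 * se_pooled),
        "ci_upper": math.exp(pooled + 1.96 * se_pooled),
    }
```

With a single study the pooled estimate reproduces that study; with several, precise studies dominate the weighted mean.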
Dependency Chain
```
atl-ex-01-SCHM (Schema definition)
        ↓
atl-ex-02-PIPE (Extraction pipeline) ──→ atl-ex-07-BKFL (Backfill existing)
        ↓
atl-ex-03-LINK (KG entity linking)
        ↓
atl-ex-04-QUAL (Quality scoring) ──→ atl-ex-05-REPL (Replication tracking)
        ↓                                     ↓
atl-ex-08-API (API endpoints)        atl-ex-06-META (Meta-analysis)
```
Integration Points
- Evidence Chains (b5298ea7): Extracted experiments become ground-truth evidence entries
- Knowledge Units (08c73de3): Experiment results become atomic, composable evidence blocks
- Artifact Debates (q-artifact-debates): Experiments are debatable — methodology can be challenged
- Schema Governance (q-schema-governance): Experiment schemas evolve through governance
- Epistemic Rigor (q-epistemic-rigor): Experiments anchor the falsifiability chain
Hypothesis Ranking Feedback
Extracted experiments create a bidirectional scoring relationship with hypotheses:
Hypothesis → Experiment: Hypotheses with explicit falsifiable predictions attract experiment design tasks. The system should proactively generate experiment proposals for high-scoring hypotheses that lack associated experiments.

Experiment → Hypothesis: An experiment's quality scores (feasibility, impact, information gain) feed back into the linked hypothesis's composite score:
- Hypotheses with feasible, high-impact associated experiments rank higher
- Hypotheses with no testable experiments are penalized in relative ranking
- When experiment results confirm or falsify predictions, Bayesian updates adjust hypothesis confidence
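The Bayesian update mentioned above can be sketched as a likelihood-ratio update on the hypothesis's confidence. This is illustrative only; the actual scoring pipeline may use a richer model:

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update hypothesis confidence given an experiment outcome.

    likelihood_ratio = P(result | hypothesis true) / P(result | hypothesis false);
    > 1 confirms, < 1 disconfirms. Converts to odds, updates, converts back.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)
```

For example, a confirming result three times more likely under the hypothesis than under its negation moves a 0.5 prior to 0.75.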
Experiment Quality Dimensions (for ranking feedback):
- Feasibility (0-1): Can this experiment actually be executed with available resources?
- Impact (0-1): How much would the result change our world model?
- Information gain (0-1): How much uncertainty does this experiment resolve?
- Novelty (0-1): Does this test something not yet tested?
These dimensions should be computed during experiment extraction (atl-ex-04-QUAL) and stored in experiment metadata for consumption by the hypothesis scoring pipeline.
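One way atl-ex-04-QUAL could fold the four dimensions into a single score for the hypothesis scoring pipeline is a weighted average. The weights below are hypothetical placeholders, not tuned values:

```python
# Hypothetical weights over the four 0-1 quality dimensions (sum to 1.0).
WEIGHTS = {
    "feasibility": 0.3,
    "impact": 0.3,
    "information_gain": 0.25,
    "novelty": 0.15,
}

def experiment_quality(dims: dict) -> float:
    """Weighted average of quality dimensions; missing dimensions count as 0."""
    return sum(WEIGHTS[k] * dims.get(k, 0.0) for k in WEIGHTS)
```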
Success Criteria
☐ >500 papers have at least one structured experiment extracted
☐ Experiment artifacts have >80% field completeness (non-null structured metadata)
☐ Extracted experiments link to KG entities with >90% precision
☐ Extraction confidence correlates with human verification (calibration)
☐ Replication tracking identifies conflicting results for >10 hypothesis pairs
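The calibration criterion could be checked with a binned comparison of `extraction_confidence` against human verification rates, in the spirit of expected calibration error. A minimal sketch; the real calibration pipeline may weight bins by size or use a proper scoring rule:

```python
def calibration_gap(records: list[tuple[float, bool]], bins: int = 5) -> float:
    """Mean absolute gap between mean extraction confidence and the
    human-verification rate within each confidence bin (smaller is better).

    records: (extraction_confidence, human_verified) pairs.
    """
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(bins)]
    for conf, verified in records:
        idx = min(int(conf * bins), bins - 1)  # clamp conf == 1.0 into last bin
        buckets[idx].append((conf, verified))
    gaps = []
    for b in buckets:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            verified_rate = sum(1 for _, v in b if v) / len(b)
            gaps.append(abs(mean_conf - verified_rate))
    return sum(gaps) / len(gaps)
```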
Work Log
_No entries yet._