Quest: Experiment Extraction & Evidence Atoms


Layer: Atlas · Priority: P93 · Status: active

Vision

Papers are the bedrock of scientific evidence, but SciDEX currently treats them as opaque
blobs — a title, abstract, PMID, and maybe some JSON evidence claims. We have 520 papers
and 188 experiment artifacts, but the experiment artifacts have no structured metadata.

This quest transforms paper-derived knowledge from unstructured citations into **rich,
structured experiment records** — each one a first-class artifact with full lineage:

  • What was done: experimental design, model system, methods, controls
  • What was found: measurements, p-values, effect sizes, confidence intervals, sample sizes
  • What it means: conclusions drawn, limitations acknowledged, context within the field
  • Where it came from: paper source (PMID, section, figure/table references)
  • How it was extracted: which agent, what methodology, extraction confidence

These structured experiments become the ground truth anchors for SciDEX's entire
evidence system. When a hypothesis claims "TREM2 variants increase AD risk by 2-4x",
the evidence chain traces through an experiment artifact that contains the actual
odds ratio, confidence interval, sample size, and study design.

Why This Matters

  • Evidence grounding: Claims without structured experiment backing are unverifiable
  • Replication tracking: Multiple experiments testing the same hypothesis can be compared
  • Meta-analysis: Structured results enable systematic aggregation across studies
  • Debate quality: Skeptic agents can challenge specific methodological details
  • KG enrichment: Extracted entities, relations, and measurements grow the Atlas

Neuro Focus (Preventing Sprawl)

Extraction schemas are scoped to neuroscience-relevant experiment types:

  • Genetic association (GWAS, candidate gene studies, Mendelian genetics)
  • Protein interaction (co-IP, mass spec, yeast two-hybrid, proximity labeling)
  • Gene expression (RNA-seq, qPCR, microarray, single-cell)
  • Animal model (transgenic mice, behavioral assays, histology)
  • Cell biology (cell culture, organoids, iPSC-derived neurons)
  • Clinical (biomarker studies, imaging, cognitive assessments, drug trials)
  • Neuropathology (histology, immunostaining, electron microscopy)

Additional types can be added through Schema Governance (q-schema-governance).
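The controlled vocabulary above can be enforced at extraction time so that off-schema experiment types are rejected before they reach the Atlas. A minimal sketch in Python; the `ExperimentType` class name is illustrative, not an existing SciDEX API:

```python
from enum import Enum

class ExperimentType(str, Enum):
    """Controlled vocabulary for the experiment_type field (names are a sketch)."""
    GENETIC_ASSOCIATION = "genetic_association"
    PROTEIN_INTERACTION = "protein_interaction"
    GENE_EXPRESSION = "gene_expression"
    ANIMAL_MODEL = "animal_model"
    CELL_BIOLOGY = "cell_biology"
    CLINICAL = "clinical"
    NEUROPATHOLOGY = "neuropathology"

def validate_experiment_type(value: str) -> ExperimentType:
    """Raise ValueError for any type outside the governed vocabulary."""
    return ExperimentType(value)
```

Because the enum subclasses `str`, stored metadata stays plain JSON strings; adding a type through Schema Governance is a one-line enum change.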

Experiment Artifact Schema

    # Type-specific metadata for artifact_type='experiment'
    {
        # Source provenance
        "paper_pmid": "12345678",
        "paper_section": "Results, Figure 3A",
        "paper_doi": "10.1038/s41586-023-...",
        
        # Experimental design
        "experiment_type": "genetic_association",  # From controlled vocabulary
        "model_system": "human cohort",            # mouse, rat, human, cell_line, organoid, etc.
        "species": "Homo sapiens",
        "tissue": "prefrontal cortex",
        "sample_size": 1500,
        "control_description": "Age-matched healthy controls (n=800)",
        "methods_summary": "Genome-wide association study of AD risk variants...",
        
        # Results (structured)
        "results": {
            "primary_finding": "TREM2 R47H variant associated with increased AD risk",
            "measurements": [
                {
                    "metric": "odds_ratio",
                    "value": 2.92,
                    "ci_lower": 2.09,
                    "ci_upper": 4.09,
                    "p_value": 3.4e-12,
                    "comparison": "R47H carriers vs non-carriers"
                }
            ],
            "effect_direction": "risk_increasing",
            "replication_status": "replicated"  # replicated, not_replicated, awaiting, conflicting
        },
        
        # Context
        "disease_context": "Alzheimer's disease",
        "entities_mentioned": ["TREM2", "R47H", "microglia", "neuroinflammation"],
        "conclusions": "TREM2 R47H is a significant risk factor for late-onset AD...",
        "limitations": "European ancestry cohort only; effect size may vary...",
        
        # Extraction provenance
        "extracted_by": "agent-atlas-extractor",
        "extraction_method": "llm_structured_extraction",
        "extraction_confidence": 0.85,
        "extraction_timestamp": "2026-04-03T12:00:00Z",
        "human_verified": false
    }
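The success criteria below require >80% field completeness on experiment metadata. A minimal sketch of that check, using field names from the schema above; the `field_completeness` helper and the choice of required fields are illustrative assumptions, not a fixed design:

```python
# Required top-level fields, drawn from the schema above (selection is a sketch).
REQUIRED_FIELDS = [
    "paper_pmid", "experiment_type", "model_system", "species",
    "sample_size", "methods_summary", "results", "disease_context",
    "conclusions", "extracted_by", "extraction_confidence",
]

def field_completeness(metadata: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for f in REQUIRED_FIELDS
        if metadata.get(f) not in (None, "", [], {})
    )
    return filled / len(REQUIRED_FIELDS)

# A sparse backfill candidate scores low and gets queued for re-extraction.
record = {"paper_pmid": "12345678", "experiment_type": "genetic_association"}
score = field_completeness(record)
```

The same function can drive the atl-ex-07-BKFL backfill: sort the 188 existing experiment artifacts by completeness and re-extract the worst first.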

Open Tasks

☐ atl-ex-01-SCHM: Define experiment extraction schemas per experiment type (P93)
☐ atl-ex-02-PIPE: Build LLM extraction pipeline from paper abstracts/full text (P92)
☐ atl-ex-03-LINK: Auto-link extracted experiments to KG entities (P90)
☐ atl-ex-04-QUAL: Extraction quality scoring and confidence calibration (P88)
☐ atl-ex-05-REPL: Replication tracking — match experiments testing same hypothesis (P86)
☐ atl-ex-06-META: Meta-analysis support — aggregate results across experiments (P84)
☐ atl-ex-07-BKFL: Backfill 188 existing experiment artifacts with structured metadata (P91)
☐ atl-ex-08-API: API endpoints for experiment browsing, search, and filtering (P87)

Dependency Chain

atl-ex-01-SCHM (Schema definition)
    ↓
atl-ex-02-PIPE (Extraction pipeline) ──→ atl-ex-07-BKFL (Backfill existing)
    ↓
atl-ex-03-LINK (KG entity linking)
    ↓
atl-ex-04-QUAL (Quality scoring) ──→ atl-ex-05-REPL (Replication tracking)
    ↓                                        ↓
atl-ex-08-API (API endpoints)        atl-ex-06-META (Meta-analysis)

Integration Points

  • Evidence Chains (b5298ea7): Extracted experiments become ground-truth evidence entries
  • Knowledge Units (08c73de3): Experiment results become atomic, composable evidence blocks
  • Artifact Debates (q-artifact-debates): Experiments are debatable — methodology can be challenged
  • Schema Governance (q-schema-governance): Experiment schemas evolve through governance
  • Epistemic Rigor (q-epistemic-rigor): Experiments anchor the falsifiability chain

Hypothesis Ranking Feedback

Extracted experiments create a bidirectional scoring relationship with hypotheses:

  • Hypothesis → Experiment: Hypotheses with explicit falsifiable predictions attract experiment design tasks. The system should proactively generate experiment proposals for high-scoring hypotheses that lack associated experiments.
  • Experiment → Hypothesis: An experiment's quality scores (feasibility, impact, information gain) feed back into the linked hypothesis's composite score:
    - Hypotheses with feasible, high-impact associated experiments rank higher
    - Hypotheses with no testable experiments are penalized in relative ranking
    - When experiment results confirm or falsify predictions, Bayesian updates adjust hypothesis confidence
  • Experiment quality dimensions (for ranking feedback):
    - Feasibility (0-1): Can this experiment actually be executed with available resources?
    - Impact (0-1): How much would the result change our world model?
    - Information gain (0-1): How much uncertainty does this experiment resolve?
    - Novelty (0-1): Does this test something not yet tested?

These dimensions should be computed during experiment extraction (atl-ex-04-QUAL) and stored in experiment metadata for consumption by the hypothesis scoring pipeline.
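The feedback loop above can be sketched in two small functions: an equal-weight composite over the four quality dimensions, and an odds-form Bayesian update of hypothesis confidence when an experiment's result arrives. Both the equal weights and the likelihood-ratio interface are assumptions for illustration, not a settled scoring design:

```python
def experiment_quality(feasibility: float, impact: float,
                       info_gain: float, novelty: float) -> float:
    """Composite quality score in [0, 1]; equal weights assumed here."""
    return (feasibility + impact + info_gain + novelty) / 4

def update_hypothesis_confidence(prior: float, likelihood_ratio: float) -> float:
    """Bayesian update in odds form: posterior odds = prior odds * LR.

    likelihood_ratio > 1 for a confirming result, < 1 for a falsifying one.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# A feasible, high-impact experiment confirming a 50/50 hypothesis:
quality = experiment_quality(0.9, 0.7, 0.8, 0.6)
posterior = update_hypothesis_confidence(prior=0.5, likelihood_ratio=3.0)
```

The composite feeds the hypothesis-scoring pipeline directly; the odds-form update makes confirmations and falsifications symmetric (LR and 1/LR move confidence by the same amount in odds space).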

Success Criteria

☐ >500 papers have at least one structured experiment extracted
☐ Experiment artifacts have >80% field completeness (non-null structured metadata)
☐ Extracted experiments link to KG entities with >90% precision
☐ Extraction confidence correlates with human verification (calibration)
☐ Replication tracking identifies conflicting results for >10 hypothesis pairs
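The calibration criterion can be checked with simple reliability binning: group records by `extraction_confidence` and compare each bin's mean confidence with its human-verification rate. A sketch with illustrative data; the helper name and bin count are assumptions:

```python
def calibration_buckets(records, n_bins=5):
    """Bin (confidence, human_verified) pairs; return (mean_conf, verified_rate) per non-empty bin.

    A well-calibrated extractor has mean_conf ≈ verified_rate in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, verified in records:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, verified))
    out = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            rate = sum(1 for _, v in b if v) / len(b)
            out.append((mean_conf, rate))
    return out

# Illustrative records: (extraction_confidence, human_verified)
sample = [(0.95, True), (0.9, True), (0.85, True), (0.55, False), (0.5, True)]
report = calibration_buckets(sample)
```

Large gaps between the two numbers in a bin signal mis-calibration and feed back into atl-ex-04-QUAL's confidence recalibration.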

Work Log

_No entries yet._

File: quest_experiment_extraction_spec.md
Modified: 2026-04-24 07:15
Size: 7.9 KB