[Atlas] Define experiment extraction schemas per experiment type (done; analysis:5)

Define JSON schemas for 7 neuroscience experiment types with structured fields for methods, results, statistics, and provenance.

## REOPENED TASK — CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

- Lost to an orphan branch / failed push
- Only a spec-file edit (no code changes)
- Already addressed by other agents in the meantime
- Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (20)

Squash merge: orchestra/task/atl-ex-0-api-endpoints-for-experiment-browsing-se (7 commits), 2026-04-26
Squash merge: orchestra/task/atl-ex-0-api-endpoints-for-experiment-browsing-se (7 commits), 2026-04-26
[Atlas] Fix integration test assertions for experiment route smoke tests [task:atl-ex-08-API], 2026-04-26
[Atlas] Fix replication route 500 + api.py route ordering + tests [task:atl-ex-08-API], 2026-04-26
[Atlas] Fix experiment API route precedence [task:atl-ex-08-API], 2026-04-25
[Atlas] Update spec work log for experiment API endpoints [task:atl-ex-08-API], 2026-04-25
[Atlas] Add route-order regression test for experiment API endpoints [task:atl-ex-08-API], 2026-04-25
[Atlas] Update spec work log for experiment API endpoints [task:atl-ex-08-API], 2026-04-25
[Atlas] API endpoints for experiment browsing, search, and filtering [task:atl-ex-08-API], 2026-04-25
Squash merge: atlas/atl-ex-04-QUAL-push (2 commits), 2026-04-26
[Atlas] Update spec work log for extraction quality scoring [task:atl-ex-04-QUAL], 2026-04-25
[Atlas] Extraction quality scoring and confidence calibration [task:atl-ex-04-QUAL], 2026-04-25
Squash merge: orchestra/task/atl-ex-0-meta-analysis-support-aggregate-results (2 commits), 2026-04-25
Squash merge: orchestra/task/atl-ex-0-meta-analysis-support-aggregate-results (2 commits), 2026-04-25
[Atlas] Update spec work log for extraction quality scoring [task:atl-ex-04-QUAL], 2026-04-25
[Verify] Meta-analysis spec verified and updated — all criteria implemented [task:atl-ex-06-META], 2026-04-25
[Atlas] Extraction quality scoring and confidence calibration [task:atl-ex-04-QUAL], 2026-04-25
[Atlas] Add meta-analysis module with pooled effect sizes and heterogeneity [task:atl-ex-06-META], 2026-04-25
[Atlas] Replication tracking: clustering module + /api/experiments/replication/{entity} [task:atl-ex-05-REPL], 2026-04-25
[Atlas] Replication tracking: clustering module + /api/experiments/replication/{entity} [task:atl-ex-05-REPL], 2026-04-25
Spec File

Goal

Define structured JSON schemas for each neuroscience experiment type that SciDEX extracts
from papers. Each schema captures the minimum fields needed to represent the experiment's
design, results, and statistical evidence in a machine-readable, comparable format.

The schemas must be:

  • Specific enough to capture p-values, effect sizes, sample sizes, and methodology
  • General enough to accommodate variation within each experiment type
  • Composable so common fields (source provenance, extraction metadata) are shared
  • Validated so extraction agents produce consistent, queryable output
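To make the composability requirement concrete, the base schema and a type-specific extension could be combined with JSON Schema's `allOf`. This is a hypothetical sketch: the field names (`source`, `extraction`, `p_value`, and so on) are illustrative stand-ins, not the final schema fields.

```python
import json

# Shared fields every experiment type carries (sketch; field names are
# assumptions, not the schema actually landed on main).
BASE_EXPERIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "source": {"type": "object"},       # e.g. paper DOI, section, text span
        "extraction": {"type": "object"},   # e.g. agent, model, timestamp
        "entities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["source", "extraction"],
}

# A type-specific schema composes the base via allOf, adding only the
# fields unique to that experiment type.
GENETIC_ASSOCIATION_SCHEMA = {
    "allOf": [
        BASE_EXPERIMENT_SCHEMA,
        {
            "type": "object",
            "properties": {
                "variant": {"type": "string"},
                "p_value": {"type": "number", "exclusiveMinimum": 0, "maximum": 1},
                "effect_size": {"type": "number"},
                "sample_size": {"type": "integer", "minimum": 1},
            },
            "required": ["variant", "p_value"],
        },
    ]
}

# Each composed schema would be serialized to schemas/experiments/<type>.json
schema_json = json.dumps(GENETIC_ASSOCIATION_SCHEMA, indent=2)
```

With this layout, changing a provenance field once in the base schema propagates to all seven types, which is the point of the composability criterion.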

Acceptance Criteria

☐ Base experiment schema with shared fields (source, extraction metadata, entities)
☐ Type-specific schemas for 7 core experiment types:
  - genetic_association (GWAS, candidate gene, Mendelian)
  - protein_interaction (co-IP, mass spec, proximity labeling)
  - gene_expression (RNA-seq, qPCR, microarray, scRNA-seq)
  - animal_model (transgenic mice, behavioral, histology)
  - cell_biology (cell culture, organoids, iPSC)
  - clinical (biomarkers, imaging, cognitive, drug trials)
  - neuropathology (histology, immunostaining, EM)
☐ JSON Schema files stored in schemas/experiments/ directory
☐ Validation function validate_experiment_metadata(metadata, experiment_type) in artifact_registry.py
☐ Controlled vocabularies for: model_system, species, tissue, effect_direction, replication_status
☐ Schema registered in schema_registry (if q-schema-governance has built it)
☐ Documentation with examples for each type
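The validation entry point named in the criteria might look like the sketch below. The constant names mirror the work log (`EXPERIMENT_TYPES`, `CONTROLLED_VOCABULARIES`), but the vocabulary values and the specific checks are assumptions; the real function in artifact_registry.py may differ.

```python
# Sketch of validate_experiment_metadata(); vocabulary contents are guesses.
EXPERIMENT_TYPES = {
    "genetic_association", "protein_interaction", "gene_expression",
    "animal_model", "cell_biology", "clinical", "neuropathology",
}

CONTROLLED_VOCABULARIES = {
    "effect_direction": {"increase", "decrease", "no_change", "mixed"},
    "replication_status": {"novel", "replicated", "failed_replication", "unknown"},
}

def validate_experiment_metadata(metadata: dict, experiment_type: str) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if experiment_type not in EXPERIMENT_TYPES:
        errors.append(f"unknown experiment_type: {experiment_type}")
    # Shared base-schema fields must always be present.
    for field in ("source", "extraction"):
        if field not in metadata:
            errors.append(f"missing required field: {field}")
    # Enum fields must come from their controlled vocabulary.
    for field, vocab in CONTROLLED_VOCABULARIES.items():
        if field in metadata and metadata[field] not in vocab:
            errors.append(f"{field} not in controlled vocabulary: {metadata[field]}")
    return errors
```

Returning an error list rather than raising lets extraction agents log all problems with a record at once instead of failing on the first one; for example, `validate_experiment_metadata({"source": {}, "extraction": {}}, "clinical")` would return `[]`.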

Approach

  • Survey 20-30 existing papers in SciDEX to identify common result structures
  • Define base schema (shared fields across all types)
  • Define type-specific extensions for each of the 7 types
  • Build JSON Schema validation files
  • Write validate_experiment_metadata() function
  • Create controlled vocabulary constants
  • Write documentation with 1 example per type
Dependencies

  • None (foundation task)

Dependents

  • atl-ex-02-PIPE — Extraction pipeline uses these schemas as output targets
  • atl-ex-07-BKFL — Backfill uses schemas to structure existing experiment artifacts
  • q-schema-governance — Schemas registered in governance system once built

Work Log

  • 2026-04-13: Verified task status — schemas/ directory with 7 JSON schemas and README already on main via prior work (bd3b63bd8, 90f1ffff7). Confirmed the missing code was: the EXPERIMENT_TYPES and CONTROLLED_VOCABULARIES constants and the validate_experiment_metadata() function in artifact_registry.py. Added EXPERIMENT_TYPES (7 experiment types), CONTROLLED_VOCABULARIES (model_system, species, effect_direction, replication_status, tissue, brain_region), and a validate_experiment_metadata(metadata, experiment_type) validation function with type-specific checks. Validation tested and working. Rebased onto latest origin/main. [task:atl-ex-01-SCHM]
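For the per-type documentation examples the criteria call for, an extracted record shaped by the base-plus-extension schemas might look like the following. Every field name and value here is hypothetical and for illustration only.

```python
import json

# Hypothetical genetic_association record: base fields (source, extraction,
# entities) plus type-specific fields. Values are invented for illustration.
example = {
    "experiment_type": "genetic_association",
    "source": {"doi": "10.0000/example", "section": "Results"},
    "extraction": {"agent": "atlas", "extracted_at": "2026-04-13T00:00:00Z"},
    "entities": ["APOE"],
    "variant": "rs429358",
    "p_value": 3.2e-8,
    "effect_size": 1.4,
    "sample_size": 12000,
    "effect_direction": "increase",
    "replication_status": "replicated",
}

# Records like this would be what extraction agents emit and what the
# per-type JSON Schemas in schemas/experiments/ validate.
record_json = json.dumps(example, indent=2)
```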

Payload JSON

{
  "requirements": {
    "analysis": 5
  },
  "completion_shas": [
    "f949cb3865b3261f5644d891652a3307ec44dd49",
    "e8278bd573fffd4286a5d02e6a36a6d467d128c0"
  ],
  "completion_shas_checked_at": "2026-04-14T03:52:05.754968+00:00"
}
