[Demo] Register Allen Brain Cell Atlas as external dataset artifact
Status: done. Requirements: analysis:8, coding:7, reasoning:7

Register the SEA-AD Allen Brain Cell Atlas as an external dataset artifact with full metadata: source=allen_brain, URL, description, schema summary (cell types, gene expression columns, donor metadata). Link it to existing SEA-AD hypotheses and analyses via artifact_links. Show that browsing the dataset surfaces related KG entities and hypotheses. Exercises: external dataset references, artifact linking. Depends on: a17-21-EXTD0001.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/d16-21-D-register-allen-brain-cell-atlas-as-exter (1 commits) 2026-04-25
[Verify] SEA-AD dataset already registered, verified on main [task:d16-21-DSET0001] 2026-04-25

Spec File

[Demo] Register Allen Brain Cell Atlas as external dataset artifact

Goal

Demonstrate external dataset referencing by registering the SEA-AD Allen Brain Cell Atlas — a dataset already central to existing SciDEX hypotheses and analyses — as a tracked dataset artifact. Show that browsing the dataset surfaces related KG entities and hypotheses.

Dataset Details

  • Source: Allen Institute for Brain Science
  • Name: Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD)
  • URL: https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-10x_sea-ad
  • Description: Single-nucleus RNA-seq from middle temporal gyrus (MTG) of AD and control donors. ~1.2M nuclei across 84 cell types from 84 donors.
  • Format: H5AD (AnnData), available via Allen Brain Map API
  • License: Allen Institute Terms of Use (open for research)

Registration

dataset_id = register_dataset(
    source="allen_brain",
    external_id="SEA-AD-MTG-10x",
    url="https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-10x_sea-ad",
    title="SEA-AD: Seattle Alzheimer's Disease Brain Cell Atlas",
    description="Single-nucleus RNA-seq from middle temporal gyrus of AD and control donors. ~1.2M nuclei, 84 cell types, 84 donors. Includes gene expression, cell type annotations, donor metadata (age, sex, CERAD score, Braak stage).",
    license="Allen Institute Terms of Use",
    format="h5ad",
    row_count=1200000,
    schema_summary="Columns: gene (20,000+ genes), cell_type (84 types), donor_id, age, sex, CERAD, Braak, PMI, expression_values"
)
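
The artifact ID recorded under Acceptance Criteria, dataset-allen_brain-SEA-AD-MTG-10x, is consistent with an ID of the form dataset-{source}-{external_id}. The sketch below shows that derivation under the assumption that this is the convention in use. make_dataset_artifact_id is a hypothetical helper for illustration, not a confirmed part of register_dataset().

```python
def make_dataset_artifact_id(source: str, external_id: str) -> str:
    """Build an artifact ID of the assumed form dataset-{source}-{external_id}."""
    # Hypothetical helper: the real ID scheme is inferred from the spec, not confirmed.
    if not source or not external_id:
        raise ValueError("source and external_id must be non-empty")
    return f"dataset-{source}-{external_id}"


artifact_id = make_dataset_artifact_id("allen_brain", "SEA-AD-MTG-10x")
print(artifact_id)
```

If the convention holds, registering with external_id="SEA-AD" instead of "SEA-AD-MTG-10x" would yield the mismatched ID dataset-allen_brain-SEA-AD noted in the work log.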

Linking to Existing Content

After registration, create artifact_links to:
  • Existing SEA-AD hypotheses (search for hypotheses mentioning SEA-AD, Allen Brain, or specific genes)
  • Existing SEA-AD analyses (SDA-2026-04-02-gap-seaad-v2-* and similar)
  • KG entities: TREM2, APOE, microglia, astrocytes, amyloid-beta (genes and cell types from the dataset)
  • Link types:

    • dataset → hypothesis: "supports" (data supports hypothesis investigation)
    • dataset → analysis: "cites" (analysis uses this data)
    • dataset → KG entity: "mentions" (dataset contains measurements for this entity)
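
The link-type mapping above can be sketched as a small lookup plus a bulk insert. This is a minimal illustration using an in-memory SQLite database; the artifact_links column names (source_id, target_id, link_type), the target-kind keys, and the create_links helper are assumptions, and the real schema may differ.

```python
import sqlite3

# Link type per target kind, as listed above.
LINK_TYPE_BY_TARGET = {
    "hypothesis": "supports",
    "analysis": "cites",
    "kg_entity": "mentions",
}


def create_links(conn, source_id, targets):
    """Insert one typed link per (target_kind, target_id) pair.

    Returns the number of links submitted; duplicates are ignored by the DB.
    """
    rows = [
        (source_id, target_id, LINK_TYPE_BY_TARGET[kind])
        for kind, target_id in targets
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO artifact_links (source_id, target_id, link_type) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
conn.execute(
    "CREATE TABLE artifact_links ("
    "source_id TEXT, target_id TEXT, link_type TEXT, "
    "PRIMARY KEY (source_id, target_id, link_type))"
)
submitted = create_links(
    conn,
    "dataset-allen_brain-SEA-AD-MTG-10x",
    [
        ("hypothesis", "h-61196ade"),
        ("analysis", "analysis_sea_ad_001"),
        ("kg_entity", "wiki-TREM2"),
    ],
)
```

The composite primary key with INSERT OR IGNORE makes the call safe to re-run, which matters when a registration script is executed more than once.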

    Demo Walkthrough

  • Call register_dataset() with SEA-AD metadata
  • Show the dataset appears in GET /api/datasets
  • Create artifact_links to 3+ existing hypotheses
  • Create artifact_links to 2+ existing analyses
  • Create artifact_links to 5+ KG entities
  • Show dataset detail page with all linked content
  • Show that browsing a linked hypothesis now shows "Related Datasets: SEA-AD"
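
The last step, surfacing "Related Datasets" on a hypothesis page, amounts to a reverse lookup over the links that point at that hypothesis. A minimal sketch follows, assuming links are available as dictionaries with source_id and target_id keys and that dataset artifact IDs start with "dataset-". related_datasets is a hypothetical function name.

```python
def related_datasets(links, hypothesis_id):
    """Return sorted IDs of dataset artifacts that link to the given hypothesis."""
    return sorted(
        {
            link["source_id"]
            for link in links
            if link["target_id"] == hypothesis_id
            and link["source_id"].startswith("dataset-")
        }
    )


# Illustrative link records; the field names are assumptions.
links = [
    {
        "source_id": "dataset-allen_brain-SEA-AD-MTG-10x",
        "target_id": "h-61196ade",
        "link_type": "supports",
    },
    {
        "source_id": "dataset-allen_brain-SEA-AD-MTG-10x",
        "target_id": "wiki-TREM2",
        "link_type": "mentions",
    },
]
print(related_datasets(links, "h-61196ade"))
```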

    Acceptance Criteria

    ☑ SEA-AD registered as dataset artifact with full metadata
    ☑ Artifact ID: dataset-allen_brain-SEA-AD-MTG-10x
    ☑ Linked to at least 3 existing hypotheses
    ☑ Linked to at least 2 existing analyses
    ☑ Linked to at least 5 KG entities
    ☑ Dataset detail page shows all links
    ☑ Hypothesis pages show "Related Datasets" section
    ☑ No actual data downloaded — reference only
    ☑ Work log updated with timestamped entry

    Dependencies

    • a17-21-EXTD0001 (external dataset registration)

    Work Log

    2026-04-16 11:00 PT — Slot 0

    • Task reopened by audit (NO_COMMITS on prior branch)
    • Investigated existing state: an artifact dataset-allen_brain-SEA-AD already existed, but under the wrong ID and with only generic "related" links
    • Added GET /api/datasets endpoint to api.py (line ~19628) — delegates to GET /api/artifacts?artifact_type=dataset
    • Created scripts/register_seaad_external_dataset.py (not archived) to register dataset with correct artifact ID and create proper typed links
    • Ran registration script:
    - Registered dataset-allen_brain-SEA-AD-MTG-10x with full metadata (source=allen_brain, format=h5ad, row_count=1.2M, license=Allen Institute Terms of Use)
    - Created 5 "supports" links to hypotheses (TREM2, APOE, microglia-related)
    - Created 4 "cites" links to SEA-AD analyses
    - Created 5 "mentions" links to KG entity wiki pages (TREM2, APOE, microglia, astrocytes, amyloid-beta)
    • Verified all 14 artifact_links present in DB
    • Updated spec acceptance criteria and work log
    • Result: Done — SEA-AD dataset registered with correct ID and all required links
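
The work log describes GET /api/datasets as delegating to GET /api/artifacts?artifact_type=dataset. The sketch below shows that delegation pattern with plain functions standing in for the two endpoints. It makes no assumption about the web framework used in api.py; list_artifacts, list_datasets, and the ARTIFACTS sample are hypothetical stand-ins.

```python
# Sample records for illustration; the real artifact store is not shown in the spec.
ARTIFACTS = [
    {"id": "dataset-allen_brain-SEA-AD-MTG-10x", "artifact_type": "dataset"},
    {"id": "h-61196ade", "artifact_type": "hypothesis"},
]


def list_artifacts(artifact_type=None):
    """Stand-in for GET /api/artifacts with an optional artifact_type filter."""
    return [
        a
        for a in ARTIFACTS
        if artifact_type is None or a["artifact_type"] == artifact_type
    ]


def list_datasets():
    """Stand-in for GET /api/datasets: reuses the artifact listing with a fixed filter."""
    return list_artifacts(artifact_type="dataset")


datasets = list_datasets()
```

Delegating keeps a single filtering path, so both endpoints return the same records for datasets.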

    Already Resolved — 2026-04-26T00:30:00Z

    Verified on main at commit 1955f024a that the SEA-AD dataset registration is complete and intact:

    • Artifact ID dataset-allen_brain-SEA-AD-MTG-10x present in DB with full metadata
    • GET /api/artifacts/dataset-allen_brain-SEA-AD-MTG-10x returns 14 outgoing links:
    - 5 hypotheses linked via "supports" (h-61196ade, h-seaad-v4-26ba859b, h-51e7234f, h-11795af0, h-43f72e21)
    - 4 analyses linked via "cites" (analysis_sea_ad_001, SDA-2026-04-04-..., SDA-2026-04-03-gap-debate..., SDA-2026-04-03-gap-seaad...)
    - 5 KG entities linked via "mentions" (wiki-TREM2, wiki-APOE, wiki-microglia, wiki-cell-types-astrocytes, wiki-cell-types-amyloid-accumulating-neurons)
    • GET /api/datasets correctly surfaces the dataset
    • GET /api/artifacts?artifact_type=dataset shows 4 datasets total
    • All acceptance criteria satisfied (see checkboxes above)

    Closing as already resolved.
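
The link counts reported in the verification above can be checked against the acceptance-criteria minimums with a short tally. This is a sketch under the assumption that each link record exposes a link_type field; check_link_counts is a hypothetical helper and the records below only reproduce the reported counts, not the actual API response.

```python
from collections import Counter

# Minimums from the acceptance criteria: 3 hypotheses, 2 analyses, 5 KG entities.
MINIMUMS = {"supports": 3, "cites": 2, "mentions": 5}


def check_link_counts(links):
    """Return (counts, unmet), where unmet lists link types below their minimum."""
    counts = Counter(link["link_type"] for link in links)
    unmet = [t for t, minimum in MINIMUMS.items() if counts[t] < minimum]
    return counts, unmet


# Counts reported in the verification: 5 supports, 4 cites, 5 mentions.
links = (
    [{"link_type": "supports"}] * 5
    + [{"link_type": "cites"}] * 4
    + [{"link_type": "mentions"}] * 5
)
counts, unmet = check_link_counts(links)
```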

    Payload JSON
    {
      "requirements": {
        "coding": 7,
        "reasoning": 7,
        "analysis": 8
      },
      "completion_shas": [
        "ef4c0312d"
      ],
      "completion_shas_checked_at": ""
    }
