[Demo] Register Allen Brain Cell Atlas as external dataset artifact
Goal
Demonstrate external dataset referencing by registering the SEA-AD Allen Brain Cell Atlas — a dataset already central to existing SciDEX hypotheses and analyses — as a tracked dataset artifact. Show that browsing the dataset surfaces related KG entities and hypotheses.
Dataset Details
- Source: Allen Institute for Brain Science
- Name: Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD)
- URL: https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-10x_sea-ad
- Description: Single-nucleus RNA-seq from middle temporal gyrus (MTG) of AD and control donors. ~1.2M nuclei across 84 cell types from 84 donors.
- Format: H5AD (AnnData), available via Allen Brain Map API
- License: Allen Institute Terms of Use (open for research)
Registration
dataset_id = register_dataset(
source="allen_brain",
external_id="SEA-AD-MTG-10x",
url="https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-10x_sea-ad",
title="SEA-AD: Seattle Alzheimer's Disease Brain Cell Atlas",
description="Single-nucleus RNA-seq from middle temporal gyrus of AD and control donors. ~1.2M nuclei, 84 cell types, 84 donors. Includes gene expression, cell type annotations, donor metadata (age, sex, CERAD score, Braak stage).",
license="Allen Institute Terms of Use",
format="h5ad",
row_count=1200000,
schema_summary="Columns: gene (20,000+ genes), cell_type (84 types), donor_id, age, sex, CERAD, Braak, PMI, expression_values"
)
Linking to Existing Content
After registration, create artifact_links to:
Existing SEA-AD hypotheses (search for hypotheses mentioning SEA-AD, Allen Brain, or specific genes)
Existing SEA-AD analyses (SDA-2026-04-02-gap-seaad-v2-* and similar)
KG entities: TREM2, APOE, microglia, astrocytes, amyloid-beta (genes and cell types from the dataset)Link types:
- dataset → hypothesis: "supports" (data supports hypothesis investigation)
- dataset → analysis: "cites" (analysis uses this data)
- dataset → KG entity: "mentions" (dataset contains measurements for this entity)
Demo Walkthrough
Call register_dataset() with SEA-AD metadata
Show the dataset appears in GET /api/datasets
Create artifact_links to 3+ existing hypotheses
Create artifact_links to 2+ existing analyses
Create artifact_links to 5+ KG entities
Show dataset detail page with all linked content
Show that browsing a linked hypothesis now shows "Related Datasets: SEA-AD"Acceptance Criteria
☑ SEA-AD registered as dataset artifact with full metadata
☑ Artifact ID: dataset-allen_brain-SEA-AD-MTG-10x
☑ Linked to at least 3 existing hypotheses
☑ Linked to at least 2 existing analyses
☑ Linked to at least 5 KG entities
☑ Dataset detail page shows all links
☑ Hypothesis pages show "Related Datasets" section
☑ No actual data downloaded — reference only
☑ Work log updated with timestamped entry
Dependencies
- a17-21-EXTD0001 (external dataset registration)
Work Log
2026-04-16 11:00 PT — Slot 0
- Task reopened by audit (NO_COMMITS on prior branch)
- Investigated existing state:
dataset-allen_brain-SEA-AD existed but with wrong ID and only "related" links
- Added
GET /api/datasets endpoint to api.py (line ~19628) — delegates to GET /api/artifacts?artifact_type=dataset
- Created
scripts/register_seaad_external_dataset.py (not archived) to register dataset with correct artifact ID and create proper typed links
- Ran registration script:
- Registered
dataset-allen_brain-SEA-AD-MTG-10x with full metadata (source=allen_brain, format=h5ad, row_count=1.2M, license=Allen Institute Terms of Use)
- Created 5 "supports" links to hypotheses (TREM2, APOE, microglia-related)
- Created 4 "cites" links to SEA-AD analyses
- Created 5 "mentions" links to KG entity wiki pages (TREM2, APOE, microglia, astrocytes, amyloid-beta)
- Verified all 14 artifact_links present in DB
- Updated spec acceptance criteria and work log
- Result: Done — SEA-AD dataset registered with correct ID and all required links
Already Resolved — 2026-04-26T00:30:00Z
Verified on main at commit 1955f024a that the SEA-AD dataset registration is complete and intact:
- Artifact ID
dataset-allen_brain-SEA-AD-MTG-10x present in DB with full metadata
GET /api/artifacts/dataset-allen_brain-SEA-AD-MTG-10x returns 14 outgoing links:
- 5 hypotheses linked via "supports" (h-61196ade, h-seaad-v4-26ba859b, h-51e7234f, h-11795af0, h-43f72e21)
- 4 analyses linked via "cites" (analysis_sea_ad_001, SDA-2026-04-04-..., SDA-2026-04-03-gap-debate..., SDA-2026-04-03-gap-seaad...)
- 5 KG entities linked via "mentions" (wiki-TREM2, wiki-APOE, wiki-microglia, wiki-cell-types-astrocytes, wiki-cell-types-amyloid-accumulating-neurons)
GET /api/datasets correctly surfaces the dataset
GET /api/artifacts?artifact_type=dataset shows 4 datasets total
- All acceptance criteria satisfied (see checkboxes above)
Closing as already resolved.