[Forge] Add ABC Atlas (Allen Brain Cell Atlas) and MERFISH spatial datasets
Quest: Real Data Pipeline
Priority: P3
Status: done
Goal
Expand the Allen data pipeline beyond SEA-AD to include the full ABC Atlas and MERFISH spatial transcriptomics datasets. These provide spatial context for cell-type vulnerability analyses.
Acceptance Criteria
☐ ABC Atlas cell-type taxonomy data cached locally
☐ MERFISH spatial transcriptomics data available for at least 1 brain region
☐ New Forge tool: allen_spatial_expression for querying spatial data
☐ Tool registered in skills table and callable by analyses
☐ At least one analysis demonstrates spatial data usage
Approach
Survey portal.brain-map.org for ABC Atlas and MERFISH dataset availability
Download cell-type taxonomy (JSON/CSV) and MERFISH expression matrices
Create allen_spatial_expression tool in tools.py
Register in skills table
Test with a gap analysis that specifically asks about spatial expression patterns
Dependencies
_Identify during implementation._
Dependents
_Identify during implementation._
Work Log
2026-04-20 — Slot 66 (minimax:66)
- Surveyed portal.brain-map.org and knowledge.brain-map.org for ABC Atlas and MERFISH data availability
- Discovered ABC Atlas collection ID (1ca90a2d) on CellxGene with ~1.4M human MTG cells + MERFISH (~300K cells)
- Found existing tools: allen_cell_types (SEA-AD), allen_brain_expression, allen_aging_atlas_expression
- Added allen_spatial_expression tool in scidex/forge/tools.py using CellxGene WMG v2 API + brain-map.org RMA API
- Added tool to TOOL_NAME_MAPPING as "ABC Atlas Spatial Expression" (allen_spatial_expression)
- Registered instrumented version via instrument_tool("tool_abc_atlas_spatial_expression")
- Registered in forge_tools.py register_all_tools() with skill_type "spatial_expression"
- Tested: TREM2 returns metadata with top_regions (middle temporal gyrus spiny L3 etc.), collection info, portal URLs
- Note: cell_type_expression comes back empty from WMG v2 (may need wmg/v2 vs wmg/v2 path fix); the data is still useful via the brain-map.org RMA API and CellxGene collection metadata