[Forge] Single-cell trajectory pipeline - counts to scVI to RNA velocity to cell-fate map artifact open

Composes scanpy + scvi-tools + scvelo into trajectory inference with interactive cell-fate maps.

Git Commits (1)

[Forge] Single-cell trajectory pipeline — counts → scVI → RNA velocity → cell-fate map artifact [task:9b0ed33d-6b9d-42f4-a6d9-0fbadc49dcb8]2026-04-27
Spec File

Effort: extensive

Goal

Compose scanpy, scvi-tools, scvelo, and cellxgene-census into
an end-to-end single-cell trajectory inference workflow: ingest a
raw counts matrix (h5ad/h5/mtx) or pull a Census query, run QC and
batch-corrected embedding with scVI, infer RNA velocity with scVelo,
compute a cell-fate transition matrix and latent-time ordering, and
persist the resulting cell-fate map as an interactive artifact a
debate can cite when arguing trajectory claims.
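To make "cell-fate transition matrix" concrete: such a matrix can be read as a Markov chain over cells, where terminal states are the states the chain never leaves and a cell's fate is the probability of ending in each of them. The sketch below is a toy illustration in plain Python, not the scVelo or CellRank algorithm; the four-state chain and both function names are invented for this example.

```python
def terminal_states(P, tol=1e-9):
    """Indices of absorbing states: rows whose self-transition probability is 1."""
    return [i for i, row in enumerate(P) if abs(row[i] - 1.0) < tol]


def fate_probabilities(P, n_steps=200):
    """Propagate each start state n_steps forward and report where the mass ends up.

    Returns {start_state: {terminal_state: probability}}.
    """
    n = len(P)
    terminals = terminal_states(P)
    fates = {}
    for start in range(n):
        # Start with all probability mass on one state.
        dist = [0.0] * n
        dist[start] = 1.0
        for _ in range(n_steps):
            # One Markov step: new_dist[j] = sum_i dist[i] * P[i][j]
            dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        fates[start] = {t: round(dist[t], 6) for t in terminals}
    return fates


# Hypothetical row-stochastic matrix: state 0 is a progenitor, state 1 an
# intermediate, states 2 and 3 are terminal (absorbing).
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

fates = fate_probabilities(P)
for start, fate in fates.items():
    print(start, fate)
```

In this toy chain the progenitor and the intermediate state share the same fate split, because all mass leaving state 0 must pass through state 1. A real run would operate on a sparse cell-by-cell matrix and would not iterate densely like this.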

Why this matters

q-rdp-cellxgene-target-expression exposes per-gene Census expression,
but trajectory inference is qualitatively different: it infers causal
cell-state transitions from spliced/unspliced kinetics, not just
snapshot abundance. A neurodegeneration hypothesis such as "PV+
interneurons trans-differentiate to a hyper-excitable state during AD
progression" is debatable today only via literature. With a trajectory pipeline,
the Skeptic can demand actual scVelo evidence and the Theorist can
provide it. This is the highest-value missing capability for cell-state
hypotheses across every vertical.

Acceptance Criteria

☐ New module scidex/forge/singlecell_trajectory.py (≤1000 LoC):
- ingest(source) — accepts h5ad/h5/mtx local file OR a Census
query dict ({organism, cell_type, tissue, disease}); returns
AnnData.
- qc_and_filter(adata) - scanpy standard QC: mt%, n_counts,
n_genes, doublet detection via scrublet; emits filter
report as JSON.
- embed_with_scvi(adata, batch_key) — fits scVI on raw counts
(CPU or GPU, auto-detected); writes batch-corrected latent.
- compute_velocity(adata) — calls scvelo.tl.velocity and
scvelo.tl.velocity_graph; computes latent time and dynamic-
gene scores.
- cell_fate_map(adata) — derives transition probabilities,
identifies terminal-state and root-state cells via CellRank
(if installed; else scVelo-only), outputs the fate map as
JSON + an interactive plot.
- pipeline(source) — composes; outputs go to
data/scidex-artifacts/sc_trajectory/<run_id>/ with the
AnnData, embedding, velocity field, and HTML report.
☐ Migration sc_trajectory_run(run_id PRIMARY KEY,
source_kind TEXT CHECK IN ('census','file'), source_spec_json,
n_cells, n_genes, n_clusters, n_terminal_states, mean_velocity_confidence,
pipeline_version, hardware_profile, started_at, finished_at,
artifact_id).
☐ tools.py registers singlecell_trajectory_pipeline with
@log_tool_call; CPU job hard-cap at 50 K cells; GPU at 500 K.
☐ /artifacts/<id> renders the UMAP with velocity arrows + a
latent-time colormap; sortable terminal-state table.
☐ Theorist + Skeptic prompts gain a velocity_evidence block when
the hypothesis claims a cell-state transition and a recent
pipeline run covers the relevant cell type; injection mirrors
q-rdp-cellxgene-target-expression.
☐ Acceptance: python -m scidex.forge.singlecell_trajectory
--census '{"tissue":"brain","disease":"Alzheimer disease"}'
--max-cells 5000
completes in under 60 min on CPU, produces an artifact
whose n_terminal_states is plausible (≥1, ≤20), and yields a report
HTML that opens cleanly.
☐ Tests: tests/test_sc_trajectory.py — synthetic 200-cell AnnData
with simulated trajectory; pipeline recovers ≥1 terminal state
matching the simulated fate.
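Several of the criteria above are mechanical enough to sketch: the run table, the per-device cell caps, and the plausibility bound on n_terminal_states. The block below is a minimal sketch that assumes SQLite as the store; the column types, the helper names (check_cell_cap, terminal_states_plausible, record_run), and the demo row are assumptions, while the column names, the source_kind values, the caps, and the 1 to 20 bound come from the criteria.

```python
import json
import sqlite3

# Column names follow the migration criterion; the SQLite types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sc_trajectory_run (
    run_id TEXT PRIMARY KEY,
    source_kind TEXT CHECK (source_kind IN ('census', 'file')),
    source_spec_json TEXT,
    n_cells INTEGER,
    n_genes INTEGER,
    n_clusters INTEGER,
    n_terminal_states INTEGER,
    mean_velocity_confidence REAL,
    pipeline_version TEXT,
    hardware_profile TEXT,
    started_at TEXT,
    finished_at TEXT,
    artifact_id TEXT
)
"""

# Hard caps from the tools.py criterion: 50 K cells on CPU, 500 K on GPU.
CELL_CAPS = {"cpu": 50_000, "gpu": 500_000}


def check_cell_cap(n_cells, device):
    """Raise ValueError if the job exceeds the cap for this device."""
    cap = CELL_CAPS[device]
    if n_cells > cap:
        raise ValueError(f"{n_cells} cells exceeds the {device} cap of {cap}")


def terminal_states_plausible(n_terminal_states):
    """Plausibility bound from the acceptance criterion: at least 1, at most 20."""
    return 1 <= n_terminal_states <= 20


def record_run(conn, run):
    """Insert one run row; `run` maps column names to values."""
    cols = ", ".join(run)
    placeholders = ", ".join(f":{name}" for name in run)
    conn.execute(
        f"INSERT INTO sc_trajectory_run ({cols}) VALUES ({placeholders})", run
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

check_cell_cap(5000, "cpu")  # within the CPU cap, so no error
record_run(conn, {
    "run_id": "demo-run",  # hypothetical identifier
    "source_kind": "census",
    "source_spec_json": json.dumps(
        {"tissue": "brain", "disease": "Alzheimer disease"}
    ),
    "n_cells": 5000,
    "n_terminal_states": 3,  # made-up value for the demo
})
row = conn.execute(
    "SELECT n_terminal_states FROM sc_trajectory_run WHERE run_id = 'demo-run'"
).fetchone()
print(terminal_states_plausible(row[0]))
```

Putting the source_kind check in the schema means a bad value fails at insert time rather than surfacing later in the artifact viewer.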

Approach

  • Census ingestion uses the existing cellxgene-census skill;
    passes obs_query to limit cells.
  • scVI training: 50 epochs default, early-stopping on validation
    reconstruction loss; latent dim = 30.
  • scVelo dynamical mode preferred when memory allows; stochastic
    fallback with explicit log entry.
  • Hardware-profile capture mirrors MD pipeline pattern.
  • Output HTML is a self-contained Plotly + AnnData JSON dump so the
    artifact viewer can render it without the model present.
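The scVI and scVelo defaults listed above might translate into library calls roughly as follows. This is a sketch under stated assumptions, not the module's implementation: it assumes adata.X holds raw counts and that spliced/unspliced layers are present, it treats a MemoryError as the signal that dynamical mode does not fit in memory, and it assumes the early_stopping_monitor keyword is forwarded by train() to the scvi-tools trainer. The preprocessing thresholds are illustrative only. Imports are lazy so the file loads without the heavy dependencies installed.

```python
import logging

logger = logging.getLogger("sc_trajectory_sketch")

# Defaults taken from the Approach list.
SCVI_MAX_EPOCHS = 50
SCVI_N_LATENT = 30


def embed_with_scvi(adata, batch_key):
    """Fit scVI on raw counts and store the latent space in adata.obsm."""
    import scvi

    # Keep a copy of the raw counts so later normalization does not affect scVI.
    adata.layers["counts"] = adata.X.copy()
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata, n_latent=SCVI_N_LATENT)
    model.train(
        max_epochs=SCVI_MAX_EPOCHS,
        early_stopping=True,
        # Assumed to be passed through to the trainer.
        early_stopping_monitor="reconstruction_loss_validation",
    )
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return adata


def compute_velocity(adata):
    """Run scVelo, preferring dynamical mode; return (mode, mean confidence)."""
    import scanpy as sc
    import scvelo as scv

    # Illustrative thresholds, not taken from the spec.
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
    # Build the neighbor graph on the scVI latent space, then reuse it.
    sc.pp.neighbors(adata, use_rep="X_scVI")
    scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

    try:
        scv.tl.recover_dynamics(adata)
        scv.tl.velocity(adata, mode="dynamical")
        mode = "dynamical"
    except MemoryError:
        # Explicit log entry for the fallback, as the Approach requires.
        logger.warning("dynamical mode ran out of memory; using stochastic mode")
        scv.tl.velocity(adata, mode="stochastic")
        mode = "stochastic"

    scv.tl.velocity_graph(adata)
    if mode == "dynamical":
        # Latent time depends on the fitted dynamics, so skip it after a fallback.
        scv.tl.latent_time(adata)
    scv.tl.velocity_confidence(adata)
    mean_confidence = float(adata.obs["velocity_confidence"].mean())
    return mode, mean_confidence
```

Returning the mode alongside the confidence lets the caller record which path ran, which matters because latent time is only computed on the dynamical path in this sketch.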

Dependencies

    • scanpy, scvi-tools, scvelo, anndata, cellxgene-census
    skills.
    • q-rdp-cellxgene-target-expression — reuses the Census handle.
    • data/scidex-artifacts/ submodule.

Work Log
