[Demo] End-to-end provenance demo: trace reasoning chain from paper to model
Goal
The capstone demonstration of the artifact versioning system: show a complete, auditable reasoning chain from a published paper through every stage of the SciDEX pipeline to a trained model and its visualizations. Every link in the chain has pinned artifact versions, and the entire chain can be verified for reproducibility.
The Provenance Chain
PubMed Paper (PMID:XXXXX)
↓ [extracts]
KG Edges (gene-disease associations, rate constants)
↓ [generates]
Hypothesis (microglial activation in AD)
↓ [references]
External Dataset (SEA-AD Allen Brain Cell Atlas)
↓ [derives]
Tabular Dataset (differential expression results)
↓ [inputs to]
Analysis (gap analysis + model building)
↓ [produces]
Biophysical Model (microglial ODE, v1)
↓ [generates]
Figures (time-course, sensitivity, fit quality)
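As a minimal sketch, the linear chain drawn above can be held as an ordered list of (stage, relation) pairs and walked from the figures back to the paper, which is the direction an auditor would trace it. The `CHAIN` constant and `trace_back()` helper are hypothetical illustrations, not part of the SciDEX codebase.

```python
# Hypothetical in-memory model of the provenance chain drawn above.
# Each stage records the relation that links it to the next stage downstream.
CHAIN = [
    ("PubMed Paper", "extracts"),
    ("KG Edges", "generates"),
    ("Hypothesis", "references"),
    ("External Dataset", "derives"),
    ("Tabular Dataset", "inputs to"),
    ("Analysis", "produces"),
    ("Biophysical Model", "generates"),
    ("Figures", None),  # terminal stage: nothing downstream
]


def trace_back(chain):
    """Return audit lines walking from the final stage back to the source paper."""
    lines = []
    # Pair each stage with the one after it to recover every (upstream, relation, downstream) hop.
    for (upstream, relation), (downstream, _) in zip(chain[:-1], chain[1:]):
        lines.append(f"{downstream} <- [{relation}] <- {upstream}")
    return list(reversed(lines))


for line in trace_back(CHAIN):
    print(line)
```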
Artifact IDs in the Chain
Each node is a registered, versioned artifact:
Paper: paper-{pmid} (v1) — existing PubMed paper about TREM2/microglia
KG Edges: kg_edge-{batch_id} (v1) — edges extracted from that paper
Hypothesis: hypothesis-h-seaad-v4-26ba859b (v1) — existing hypothesis
Dataset: dataset-allen_brain-SEA-AD-MTG-10x (v1) — from d16-21
Tabular Data: tabular_dataset-{id} (v1) — from d16-22
Analysis: analysis-SDA-{id} (v1) — the model-building analysis
Model: model-biophys-microglia-v1 (v1) — from d16-23
Figures: figure-timecourse-{id} (v1), figure-sensitivity-{id} (v1) — from d16-23
Demo Walkthrough
Step 1: Assemble the Chain
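Identify or create artifact_links connecting all 8 chain entries (the figures entry contributes two artifacts, so 9 IDs appear below). Each tuple is (source, target, link_type): the source derives_from or cites the target.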
links = [
("kg_edge-batch-001", "paper-12345", "derives_from"),
("hypothesis-h-seaad-...", "kg_edge-batch-001", "derives_from"),
("tabular_dataset-xxx", "dataset-allen_brain-SEA-AD-MTG-10x", "derives_from"),
("analysis-SDA-xxx", "hypothesis-h-seaad-...", "cites"),
("analysis-SDA-xxx", "tabular_dataset-xxx", "cites"),
("model-biophys-xxx", "analysis-SDA-xxx", "derives_from"),
("figure-tc-xxx", "model-biophys-xxx", "derives_from"),
("figure-sens-xxx", "model-biophys-xxx", "derives_from"),
]
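The acceptance criterion that artifact_links form a connected DAG from paper to figures can be checked mechanically. The sketch below assumes each tuple reads (child, parent, link_type), so the child derives_from or cites the parent; `check_chain()` is a hypothetical helper, not an existing SciDEX function. It runs Kahn's algorithm for acyclicity, an undirected BFS for weak connectivity (the dataset branch joins the paper branch only at the analysis), and an upstream walk from each figure to the paper.

```python
from collections import defaultdict, deque


def check_chain(links, root, leaves):
    """Return True if links form a weakly connected DAG and every leaf traces upstream to root.

    Each link is a hypothetical (child_id, parent_id, link_type) tuple.
    """
    nodes, parents = set(), defaultdict(set)
    children, indegree = defaultdict(set), defaultdict(int)
    for child, parent, _ in links:
        nodes.update((child, parent))
        parents[child].add(parent)
        if child not in children[parent]:  # ignore duplicate links
            children[parent].add(child)
            indegree[child] += 1
    if not nodes:
        return False

    # Kahn's algorithm: only an acyclic graph yields a topological order covering every node.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    acyclic = visited == len(nodes)

    # Weak connectivity: traverse ignoring edge direction.
    seen, frontier = set(), [next(iter(nodes))]
    while frontier:
        n = frontier.pop()
        if n not in seen:
            seen.add(n)
            frontier.extend((parents[n] | children[n]) - seen)
    connected = seen == nodes

    def reaches_root(node):
        # Follow parent links upstream until the root is found or the walk is exhausted.
        stack, walked = [node], set()
        while stack:
            n = stack.pop()
            if n == root:
                return True
            if n not in walked:
                walked.add(n)
                stack.extend(parents[n])
        return False

    return acyclic and connected and all(reaches_root(leaf) for leaf in leaves)
```

On the list above, `check_chain(links, "paper-12345", ["figure-tc-xxx", "figure-sens-xxx"])` returns True; adding a link that points the paper back at a figure creates a cycle and makes it return False.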
Step 2: Pin All Versions
Ensure the analysis has pinned_artifacts recording the exact version and content_hash of each input.
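What a single pin records can be sketched as follows. The hashing scheme here (SHA-256 over canonical, key-sorted JSON) is an assumption chosen so that logically equal content always yields the same content_hash; the registry's actual scheme may differ, and `content_hash()` and `pin()` are hypothetical helper names.

```python
import hashlib
import json


def content_hash(content):
    """Assumed scheme: SHA-256 hex digest of canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin(artifact_id, version, content):
    """Build one hypothetical pinned_artifacts entry for an analysis input."""
    return {
        "artifact_id": artifact_id,
        "version": version,
        "content_hash": content_hash(content),
    }


# Illustrative usage with placeholder content.
print(pin("tabular_dataset-xxx", 1, {"rows": 3, "columns": ["gene", "log2fc"]}))
```

Sorting keys matters because two dicts with the same entries in different insertion order would otherwise serialize differently and appear to have changed.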
Step 3: Verify Reproducibility
from scidex.atlas.artifact_registry import verify_reproducibility
result = verify_reproducibility("analysis-SDA-xxx")
# All pinned inputs exist and every content_hash matches
assert result["reproducible"] is True
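The check described for verify_reproducibility() (every pinned artifact exists and its content_hash still matches) reduces to a lookup and a comparison. In the sketch below the registry is a hypothetical in-memory dict keyed by (artifact_id, version); `verify_pins()` is an illustrative name, and the real function presumably reads from the database instead.

```python
def verify_pins(pinned_artifacts, registry):
    """Sketch of a reproducibility check over pinned inputs.

    pinned_artifacts: list of {"artifact_id", "version", "content_hash"} dicts.
    registry: hypothetical mapping (artifact_id, version) -> current content_hash.
    """
    checks = []
    for pinned in pinned_artifacts:
        key = (pinned["artifact_id"], pinned["version"])
        exists = key in registry
        # A missing artifact can never match, so short-circuit on existence.
        hash_matches = exists and registry[key] == pinned["content_hash"]
        checks.append({"artifact_id": pinned["artifact_id"], "exists": exists, "hash_matches": hash_matches})
    return {
        # Require at least one pin: all() over an empty list would be True.
        "reproducible": bool(checks) and all(c["hash_matches"] for c in checks),
        "verified": sum(c["hash_matches"] for c in checks),
        "total": len(checks),
        "checks": checks,
    }
```

Treating an analysis with no pins as not reproducible is a deliberate choice in this sketch, so that an unpinned analysis cannot pass by default.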
Step 4: Render Provenance DAG
Call GET /api/analyses/{id}/provenance and render the full DAG:
- 8 nodes (one per artifact type, colored distinctly)
- 8+ edges showing derivation flow
- Version numbers and tags on each node
- Green checkmarks for verified artifacts
- Interactive: click any node to view artifact detail
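The work log renders the DAG with Cytoscape, whose elements are plain JSON objects with a `data` field (nodes carry an `id`; edges carry `source` and `target`). A minimal sketch of turning artifact versions and links into that shape follows; `to_cytoscape_elements()` is hypothetical, and it assumes the artifact type can be read from the ID prefix before the first hyphen.

```python
import json


def to_cytoscape_elements(versions, links, verified=frozenset()):
    """Convert artifact versions and links into a JSON string of Cytoscape.js elements.

    versions: hypothetical mapping artifact_id -> version number.
    links: (child_id, parent_id, link_type) tuples as in Step 1.
    verified: artifact IDs that passed verification.
    """
    elements = []
    for artifact_id, version in versions.items():
        artifact_type = artifact_id.split("-", 1)[0]  # e.g. "paper", "kg_edge", "tabular_dataset"
        classes = [artifact_type] + (["verified"] if artifact_id in verified else [])
        elements.append({
            "group": "nodes",
            "data": {"id": artifact_id, "label": f"{artifact_id} (v{version})", "type": artifact_type},
            "classes": " ".join(classes),
        })
    for child, parent, link_type in links:
        # Draw edges upstream -> downstream so the graph reads paper -> figures.
        elements.append({
            "group": "edges",
            "data": {"id": f"{parent}->{child}", "source": parent, "target": child, "label": link_type},
        })
    return json.dumps(elements)
```

Per-type colors can then come from a Cytoscape.js stylesheet using class selectors such as `node.paper` or `node.model`, with `node.verified` carrying the checkmark styling, and a `tap` handler on nodes can open the artifact detail view.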
Step 5: Showcase Page
Create or extend the demo page to include:
- "Provenance Chain" section with the rendered DAG
- Narrative walkthrough explaining each link
- "Verify Reproducibility" button that runs the check live
- Summary: "This model's predictions can be traced back to [paper title] and verified against [dataset name] — every intermediate step is versioned and auditable."
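The live Verify button only needs to POST to the verification endpoint and display the returned verdict. On the page that call would typically be made from JavaScript; the Python sketch below issues the same request, for example to script the demo. The path follows the POST /api/analyses/{id}/verify-reproducibility endpoint listed in the work log; `base_url`, the Accept header, and the assumption that the JSON body carries a `reproducible` field are illustrative.

```python
import json
import urllib.request


def build_verify_request(base_url, analysis_id):
    """Build the POST a 'Verify Reproducibility' button would issue (base_url is a placeholder)."""
    url = f"{base_url.rstrip('/')}/api/analyses/{analysis_id}/verify-reproducibility"
    return urllib.request.Request(url, data=b"", method="POST", headers={"Accept": "application/json"})


def run_verification(base_url, analysis_id, timeout=30):
    """Send the request and decode the JSON response (assumed to include 'reproducible')."""
    request = build_verify_request(base_url, analysis_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running server, `run_verification("http://localhost:8000", "analysis-SDA-PROV-DEMO-001")["reproducible"]` would be the value the button displays.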
Acceptance Criteria
☑ All 8 artifacts in the chain are registered and linked
☑ artifact_links form a connected DAG from paper to figures
☑ All input artifacts have pinned versions in the analysis
☑ verify_reproducibility() returns True for the chain
☑ Provenance DAG renders with all nodes and edges
☑ Nodes colored by artifact type with version labels
☑ DAG is interactive (clickable nodes)
☑ Demo page includes narrative walkthrough
☑ "Verify Reproducibility" demonstrates live verification
☑ Work log updated with timestamped entry
Dependencies
- a17-24-REPR0001 (reproducible analysis chains / verify_reproducibility)
- d16-23-BMOD0001 (biophysical model — the chain must exist first)
Work Log
2026-04-14 02:52 PT — Slot minimax:55
- Task claimed and evaluated: dependency a17-24-REPR0001 (verify_reproducibility) was not on main, only on task branch 99e571466
- Decision: implement verify_reproducibility() and provenance API as part of this task to make demo work
- Investigated existing data: analysis-SDA-PROV-DEMO-001 already exists with all 8 chain artifacts registered
- Implemented verify_reproducibility() in scidex/atlas/artifact_registry.py: checks pinned artifacts exist and content hashes match
- Implemented get_analysis_provenance(): combines provenance graph + pinned artifacts + workflow steps + reproducibility
- Added GET /api/analyses/{id}/provenance endpoint (returns full DAG + verification)
- Added POST /api/analyses/{id}/verify-reproducibility endpoint (runs verification)
- Added /demo/provenance-chain page: Cytoscape DAG viz, narrative chain, live Verify button
- Tested: verify_reproducibility('analysis-SDA-PROV-DEMO-001') → reproducible=True, 4/4 artifacts verified
- Committed and pushed: c8c5b62ca