[Artifacts] Rebuild nb_sea_ad_001 spotlight notebook from real Forge tools

Background

The spotlight notebook nb_sea_ad_001 (SEA-AD Cell-Type Vulnerability Analysis, is_spotlight=1) was seeded into the notebooks table on 2026-04-04T09:02:45 with rendered_html_path=/notebooks/nb_sea_ad_001, a placeholder path that points at no file. Two related rows also referenced non-existent artifacts:

notebooks.nb-analysis_sea_ad_001.rendered_html_path=site/notebooks/analysis_sea_ad_001.html — no such file
analyses.analysis_sea_ad_001.artifact_path=site/notebooks/sea_ad_cell_vulnerability_analysis.ipynb — no such file

As a result, https://scidex.ai/notebook/nb_sea_ad_001 rendered the "Notebook Not Rendered Yet" warning card. An earlier automation (task a06eb224, 2026-04-04) incorrectly marked this as a "false positive" because the HTTP endpoint returned 200. A 200 response that carries a warning card is not a rendered notebook.
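
A health check that separates a real rendering from the warning card could look roughly like the sketch below. The function name `is_rendered` is hypothetical; the two markers it looks for are the warning-card title quoted above and the jp-Notebook markup named in the acceptance criteria.

```python
WARNING_MARKER = "Notebook Not Rendered Yet"  # warning-card title quoted above
NOTEBOOK_MARKER = "jp-Notebook"               # markup expected in the HTML export


def is_rendered(status_code: int, body: str) -> bool:
    """Return True only when the page is a real notebook rendering.

    HTTP 200 alone is not enough: the warning card is also served with 200.
    """
    if status_code != 200:
        return False
    if WARNING_MARKER in body:
        return False
    return NOTEBOOK_MARKER in body
```

Checking the body as well as the status code is what would have prevented the earlier "false positive" classification.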

The upstream task (70239694 D16.2: SEA-AD Single-Cell Analysis) was marked done without producing the notebook; the orchestrator max_tool_rounds fix (70239694_SOLUTION.md) was applied but the notebook generation step was never completed.

Goal

Deliver a credible, executable Jupyter notebook at nb_sea_ad_001 that:

Uses real Forge tool outputs (not simulated data), with each call instrumented via @log_tool_call for provenance.

Binds each of the 5 hypotheses on analysis_sea_ad_001 to external literature evidence.

Produces a self-contained HTML rendering that the existing /notebook/{id} route can serve.

Can be regenerated with a single script call, providing a template for other spotlight notebooks.
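
The provenance requirement in the first goal can be pictured with a decorator along these lines. This is a minimal sketch, not the actual `@log_tool_call` in tools.py: the `tool_calls` column names, the in-memory SQLite connection, and the stand-in `gene_info` body are all assumptions made for illustration.

```python
import functools
import json
import sqlite3
import time

# Assumption: a SQLite table with these columns; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_calls (tool TEXT, args_json TEXT, latency_ms REAL, ok INTEGER)"
)


def log_tool_call(func):
    """Record one provenance row per call, whether it succeeds or raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        ok = 0
        try:
            result = func(*args, **kwargs)
            ok = 1
            return result
        finally:
            latency_ms = (time.perf_counter() - start) * 1000.0
            payload = json.dumps({"args": args, "kwargs": kwargs}, default=str)
            conn.execute(
                "INSERT INTO tool_calls VALUES (?, ?, ?, ?)",
                (func.__name__, payload, latency_ms, ok),
            )
            conn.commit()

    return wrapper


@log_tool_call
def gene_info(symbol: str) -> dict:
    # Stand-in body; the real tool would query a live API.
    return {"symbol": symbol}


gene_info("TREM2")
```

Writing the row in a `finally` clause means failed calls also leave a trace, which matters when provenance is used to audit how a notebook was generated.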

Acceptance Criteria

☑ site/notebooks/nb_sea_ad_001.ipynb exists, executes cleanly, and contains embedded output cells

☑ site/notebooks/nb_sea_ad_001.html exists and contains jp-Notebook markup (compatible with _darkify_notebook)

☑ Notebook has at least 10 code cells hitting live APIs (PubMed, Allen, STRING, Reactome, Enrichr, HPA, MyGene)

☑ Forge tool_calls table shows provenance rows for each tool invoked during generation

☑ DB rows fixed: notebooks.nb_sea_ad_001, notebooks.nb-analysis_sea_ad_001, analyses.analysis_sea_ad_001

☑ Each of the 5 hypotheses has a PubMed evidence table

☑ Notebook includes GO:BP enrichment bar chart + STRING PPI network figure

☑ Regeneration runnable via python3 scripts/generate_nb_sea_ad_001.py [--force]

☑ Cache written to data/forge_cache/seaad/*.json (idempotent re-runs)
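
Idempotent re-runs and the `--force` flag can be sketched as a read-through JSON cache. The helper name `cached_call` and the one-file-per-key layout are assumptions; only the directory data/forge_cache/seaad/ comes from the criteria above.

```python
import json
from pathlib import Path
from typing import Any, Callable


def cached_call(cache_dir: Path, key: str, fetch: Callable[[], Any], force: bool = False) -> Any:
    """Return the cached JSON for `key`, calling `fetch` only on a miss or when forced."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.json"
    if path.exists() and not force:
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")
    return data
```

With this shape, a second run reads every bundle from disk and makes no network calls, while `force=True` refreshes the cache in place.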

What was actually delivered

Files added

forge/seaad_analysis.py — reusable data collector wrapping 7 tool families over 11 AD-vulnerability genes + per-hypothesis literature queries. All calls route through instrumented tools.py functions.
scripts/generate_nb_sea_ad_001.py — notebook generator: collects data, assembles cells, executes in-place via nbconvert.ExecutePreprocessor, renders HTML, updates DB rows.
site/notebooks/nb_sea_ad_001.ipynb — 31 cells (16 markdown, 15 code), all executed, 2 embedded matplotlib figures.
site/notebooks/nb_sea_ad_001.html — 552 KB nbconvert HTML export with jp-Notebook markup.
data/forge_cache/seaad/*.json — 78 cached JSON bundles (gene annotations, Allen, STRING, Reactome, Enrichr, PubMed per hypothesis).
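
The cell counts reported for the .ipynb can be re-checked without Jupyter installed, since a version 4 notebook is plain JSON with a top-level `cells` list. The sketch below assumes that format; `summarize_notebook` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_notebook(path: Path) -> dict:
    """Count cells by type and code cells that carry embedded outputs."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    cells = nb.get("cells", [])
    by_type = Counter(cell.get("cell_type") for cell in cells)
    with_outputs = sum(
        1
        for cell in cells
        if cell.get("cell_type") == "code" and cell.get("outputs")
    )
    return {
        "total": len(cells),
        "markdown": by_type.get("markdown", 0),
        "code": by_type.get("code", 0),
        "code_with_outputs": with_outputs,
    }
```

For the notebook as delivered on 2026-04-05, the figures above imply total 31, markdown 16, and code 15.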

Forge provenance (from `tool_calls` table, captured 2026-04-05)

Tool	Calls	Mean latency
pubmed_search	18	555 ms
allen_brain_expression	11	117 ms
allen_cell_types	11	1452 ms
gene_info	11	1221 ms
human_protein_atlas	11	1419 ms
reactome_pathways	11	668 ms
enrichr_pathway_analysis	3	1535 ms
string_protein_interactions	1	1281 ms
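
A summary of this shape can be produced with a single GROUP BY over the provenance table. The column names `tool` and `latency_ms` and the sample rows are assumptions; they are not taken from the actual Forge schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_calls (tool TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO tool_calls VALUES (?, ?)",
    [("pubmed_search", 500.0), ("pubmed_search", 600.0), ("gene_info", 1200.0)],
)

# One row per tool: call count and rounded mean latency, busiest tool first.
summary = conn.execute(
    """
    SELECT tool, COUNT(*) AS calls, ROUND(AVG(latency_ms)) AS mean_latency_ms
    FROM tool_calls
    GROUP BY tool
    ORDER BY calls DESC, tool
    """
).fetchall()

for tool, calls, mean_ms in summary:
    print(f"{tool}\t{calls}\t{mean_ms:.0f} ms")
```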

Scientific content highlights

Top GO:BP enrichment for the 11-gene set: Microglial Cell Activation (GO:0001774) with p = 3.2 × 10⁻¹⁴, odds ratio 2219.7. Astrocyte Activation (GO:0048143) at p = 5.2 × 10⁻¹² follows. These enrichments are consistent with the chosen gene set capturing the cell types named in the analysis hypotheses.
STRING physical PPI network: 11 edges including canonical APOE–MAPT (score 0.879) and TREM2–TYROBP connections.
Reactome pathway footprint: 59 pathway annotations across 11 genes; Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell (R-HSA-198933) is the top TREM2 pathway.
Literature evidence per hypothesis: 5/5 hypotheses bound to PubMed papers from the last 10 years.

Caveats (known gaps)

This notebook uses aggregated / curated data sources, not per-cell SEA-AD snRNA-seq matrices. The following remain open:

Gap	Tracked under
Bulk SEA-AD h5ad download + local cache	`19c06875` (currently marked done but data/allen/seaad/ is empty)
Per-cell DE fed into the debate loop	`70b96f50`
ABC Atlas + MERFISH spatial queries	`f9ba4c33`
Forge data-validation layer	`4bd2f9de`

The allen_cell_types tool call returns Allen Cell Types API specimen metadata (electrophysiology/morphology), not SEA-AD snRNA-seq aggregates. This is a known limitation of the existing tools.py:allen_cell_types implementation.

Reusability

scripts/generate_nb_sea_ad_001.py is a template. To produce another spotlight notebook with this pattern, copy the script, change NOTEBOOK_ID, ANALYSIS_ID, TARGET_GENES, and the topical keywords in forge/seaad_analysis.py:_TOPIC_KEYWORDS. The cache directory naming mirrors the notebook id.
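
One way to keep the per-notebook settings in a single place when copying the script is a small config object. This is a sketch only: `SpotlightConfig` is a hypothetical name, the gene tuple is an illustrative subset, and the real script may simply use module-level constants.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SpotlightConfig:
    notebook_id: str
    analysis_id: str
    target_genes: tuple
    site_dir: Path = Path("site/notebooks")

    @property
    def ipynb_path(self) -> Path:
        return self.site_dir / f"{self.notebook_id}.ipynb"

    @property
    def html_path(self) -> Path:
        return self.site_dir / f"{self.notebook_id}.html"


SEA_AD = SpotlightConfig(
    notebook_id="nb_sea_ad_001",
    analysis_id="analysis_sea_ad_001",
    # Illustrative subset only; the real TARGET_GENES list has 11 entries.
    target_genes=("APOE", "MAPT", "TREM2", "TYROBP"),
)
```

Deriving both output paths from `notebook_id` keeps the .ipynb and .html names in step with the id that the /notebook/{id} route serves.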

Work Log

2026-04-27 05:30 UTC (iteration 5 retry)

Confirmed the previous retry artifacts (exchange_debate_seed_v2.json, falsification_power_plan_v1.json, gtex_brain_expression_v1.json) are already present on current origin/main; no duplicate restoration needed.
Added data/analysis_outputs/analysis-SEAAD-20260402/donor_model_preregistration_v1.json, a preregistration-ready donor-level modeling plan that converts the five cell-type driver priors into concrete formulas, harmonized covariates, replication dataset requirements, confidence interval rules, negative controls, and Exchange resolution gates.
Preserved the prior retry's valid benchmark_provenance_matrix_v1.json artifact while rebuilding the branch against current origin/main so unrelated stale-base drift is not carried forward.
Noted gate hygiene: the prior scripts/artifact_sweeper.sh rejection target no longer exists on current origin/main; data/scidex-papers has no unpushed local commits in this worktree, only a dirty checkout state outside this task's intended diff.

2026-04-27 05:10 UTC (iteration 5)

Verified current origin/main already contains the iteration 4 retry artifacts (exchange_debate_seed_v2.json, falsification_power_plan_v1.json, gtex_brain_expression_v1.json), so this slice adds a new benchmark-provenance layer instead of re-pushing duplicate work.
Added data/analysis_outputs/analysis-SEAAD-20260402/benchmark_provenance_matrix_v1.json, an auditable five-source x five-cell-type evidence matrix with directness weights, weighted vulnerability scores, leave-one-source-out sensitivity ranges, Exchange-use notes, and falsification hooks.
This advances acceptance criterion 2 by making the published-dataset benchmark traceable at the dataset/cell-type level, and strengthens criterion 4 by giving Theorist/Skeptic arguments source-specific evidence to cite.
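
The weighted score and its leave-one-source-out range can be computed as below. This is a sketch under stated assumptions: the scores and directness weights are made-up placeholders, not values from benchmark_provenance_matrix_v1.json, and a weighted mean is assumed as the aggregation rule.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Directness-weighted mean of per-source scores."""
    total_weight = sum(weights[s] for s in scores)
    return sum(scores[s] * weights[s] for s in scores) / total_weight


def leave_one_source_out(scores: dict, weights: dict) -> tuple:
    """Return (min, max) of the weighted score when each source is dropped in turn."""
    values = []
    for dropped in scores:
        kept = {s: v for s, v in scores.items() if s != dropped}
        values.append(weighted_score(kept, weights))
    return min(values), max(values)


# Placeholder inputs for one cell type (hypothetical numbers).
scores = {"source_a": 0.8, "source_b": 0.6, "source_c": 0.4}
weights = {"source_a": 1.0, "source_b": 0.5, "source_c": 0.5}

point = weighted_score(scores, weights)
low, high = leave_one_source_out(scores, weights)
```

A wide gap between `low` and `high` signals that one source dominates the score, which is the kind of dependence a Skeptic argument can cite.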

2026-04-27 03:05 UTC (iteration 4 retry)

Rebuilt the retry branch from current origin/main to remove unrelated critical-file and persona/script/doc drift that caused review-gate failures.
Restored the three structured artifacts required by the current SEA-AD-gene-expression-analysis notebook sections: exchange_debate_seed_v2.json, falsification_power_plan_v1.json, and gtex_brain_expression_v1.json.
Fixed the GTEx notebook section to reuse the portable find_repo_root() helper instead of hard-coding the task worktree path.
Planned validation: JSON syntax checks for all restored artifacts plus full notebook execution and HTML export.
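
A portable root finder of the kind referred to above typically walks up from a starting directory until it finds a marker. The sketch below assumes a `.git` entry as the marker; the actual find_repo_root() helper may use a different rule.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None, marker: str = ".git") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        # exists() matches both a .git directory and the .git file used by worktrees.
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found above {current}")
```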

2026-04-26 23:30 UTC (iteration 3)

Expanded the primary analysis (cells 1–18) from 2 cell types (microglia/neurons) to all 5 CNS cell types (excitatory neurons, astrocytes, microglia, oligodendrocytes, OPCs).
Rewrote cell 6 (simulation): literature-grounded expression profiles for each of 20 AD genes × 5 cell types — 5000 data points total.
Rewrote cell 8 (DE analysis): per-cell-type log2FC vs all-others (Bonferroni), showing neuron enrichment of APP/MAPT and microglia enrichment of 7 DAM/immune genes.
Rewrote cell 10 (heatmap): gene × cell-type z-score heatmap for all 20 genes × 5 cell types; raw means annotated.
Rewrote cell 12 (volcano): 5-panel per-cell-type enrichment volcano replacing the binary Microglia/Neuron plot.
Updated cell 15 (summary): table showing per-cell-type enrichment, biological interpretation, and comparison with Section 10 benchmark scores.
Also committed seaad_marker_module_realdata_v1.json, seaad_marker_specificity_v1.csv, and scripts/build_seaad_marker_module_artifact.py from prior uncommitted slot work.
Re-executed and re-rendered notebook (503 KB .ipynb, 855 KB .html).
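
The per-cell-type comparison described for cell 8 amounts to a one-vs-rest log2 fold change with a Bonferroni correction. The sketch below shows only that arithmetic; the pseudocount, the input numbers, and the function names are assumptions, and the p-values are taken as given rather than computed.

```python
import math


def log2_fold_change(target_mean: float, other_means: list, pseudocount: float = 1.0) -> float:
    """log2 of (target + pseudocount) over (mean of all other cell types + pseudocount)."""
    rest = sum(other_means) / len(other_means)
    return math.log2((target_mean + pseudocount) / (rest + pseudocount))


def bonferroni(p_values: list) -> list:
    """Multiply each p-value by the number of tests, capped at 1.0."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical mean expression of one gene in one cell type vs the other four.
lfc = log2_fold_change(7.0, [1.0, 1.0, 1.0, 1.0])
adjusted = bonferroni([0.001, 0.02, 0.5])
```

Because the input profiles in this iteration are simulated, fold changes computed this way describe the literature-grounded priors rather than measured SEA-AD expression.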

2026-04-26 23:05 UTC

Iteration task ab6d4823-7145-4970-a2ae-f4ac6d71df37: planned a real SEA-AD summary-matrix increment for the live SEA-AD-gene-expression-analysis notebook.
Scope: compute five cell-type marker-module scores directly from data/allen/seaad/medians.csv, trimmed_means.csv, and cell_metadata.csv; write a structured artifact; append an executed notebook section that contrasts real SEA-AD marker specificity with the existing benchmark/driver intervals.
Added scripts/build_seaad_marker_module_artifact.py plus seaad_marker_module_realdata_v1.json and seaad_marker_specificity_v1.csv; the artifact summarizes 91,450 QC nuclei across 52 SEA-AD cluster columns for excitatory neurons, astrocytes, microglia/PVM, oligodendrocytes, and OPCs.
Appended executed notebook section 14, comparing real SEA-AD marker specificity against the candidate-driver priors and calling out the weaker compact MTG microglia/PVM summary signal as a debate-resolution caveat.
Re-executed and re-rendered .claude/worktrees/slot-15-1775204013/site/notebooks/SEA-AD-gene-expression-analysis.ipynb / .html; the exported DE CSV now contains the full five-cell-type table instead of only the older microglia-vs-neuron slice.

2026-04-26 22:32 UTC

Iteration task ab6d4823-7145-4970-a2ae-f4ac6d71df37: deepened the live SEA-AD-gene-expression-analysis spotlight notebook, not just the earlier nb_sea_ad_001 artifact.
Added data/analysis_outputs/analysis-SEAAD-20260402/cell_type_driver_intervals_v2.json, a structured driver-prior artifact with interval estimates, testable predictions, and negative controls for excitatory-neuron, astrocyte, microglial, oligodendrocyte, and OPC hypotheses.
Updated and executed .claude/worktrees/slot-15-1775204013/site/notebooks/SEA-AD-gene-expression-analysis.ipynb: fixed stale main-checkout output paths, loaded the cross-dataset benchmark artifact, displayed five-cell-type vulnerability scores, plotted benchmark intervals, ranked candidate drivers, and embedded Exchange-ready debate/falsification sections.
Re-rendered .claude/worktrees/slot-15-1775204013/site/notebooks/SEA-AD-gene-expression-analysis.html via jupyter nbconvert --to html.

2026-04-26 22:20 UTC

Iteration task ab6d4823-7145-4970-a2ae-f4ac6d71df37: added a candidate-driver test matrix so the notebook moves from ranked cell-type vulnerability into preregistration-ready hypothesis tests.
Added data/analysis_outputs/analysis-SEAAD-20260402/driver_candidate_tests_v2.json, covering the five driver hypotheses with marker modules, shared covariates, primary endpoints, negative controls, assay-readiness priors, decisive replication thresholds, and falsifying patterns.
Appended executed notebook section 14 to site/notebooks/nb_sea_ad_001.ipynb, computing driver-test priority scores with leave-one-source-out sensitivity intervals and rendering a forest-style interval plot for Exchange triage.
Re-rendered site/notebooks/nb_sea_ad_001.html via jupyter nbconvert --execute --to html.

2026-04-26 14:45 UTC

Iteration task ab6d4823-7145-4970-a2ae-f4ac6d71df37: deepened the spotlight notebook beyond the prior microglia/neuron emphasis.
Added data/analysis_outputs/analysis-SEAAD-20260402/cell_type_benchmarks_v1.json, a structured cross-dataset benchmark covering excitatory neurons, astrocytes, microglia, oligodendrocytes, and OPCs across SEA-AD 2024, Mathys 2019, Grubman 2019, Leng 2021, and Mathys multiregion 2024.
Appended executed sections 12-13 to site/notebooks/nb_sea_ad_001.ipynb: per-cell-type vulnerability scores with leave-one-benchmark-out intervals, 5 candidate cell-type-specific drivers with interval estimates, an Exchange-ready market/debate seed, and concrete falsification experiments.
Re-rendered site/notebooks/nb_sea_ad_001.html via jupyter nbconvert --to html.

2026-04-05 08:12 UTC

Diagnosed: notebooks.nb_sea_ad_001.rendered_html_path was a placeholder string, not a valid path. No ipynb had ever been generated.
Confirmed Allen/PubMed/STRING/Reactome/Enrichr/HPA APIs are reachable and return real data from the sandbox.
Wrote forge/seaad_analysis.py collector; ran full_collection() in 62 s (first invocation, uncached).
Wrote scripts/generate_nb_sea_ad_001.py; produced ipynb (31 cells) and executed cleanly via ExecutePreprocessor.
Rendered HTML, fixed 3 DB rows.
Committed to fix/nb-sea-ad-001-real-data.
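
The three-row fix can be expressed as one transaction. The sketch below uses an in-memory SQLite database with a hypothetical `id` key column; `rendered_html_path` and `artifact_path` are the column names quoted in the Background section. The spec does not state which path each row received, so the new values here are illustrative assumptions based on the files listed under Files added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notebooks (id TEXT PRIMARY KEY, rendered_html_path TEXT);
    CREATE TABLE analyses (id TEXT PRIMARY KEY, artifact_path TEXT);
    INSERT INTO notebooks VALUES ('nb_sea_ad_001', '/notebooks/nb_sea_ad_001');
    INSERT INTO notebooks VALUES
        ('nb-analysis_sea_ad_001', 'site/notebooks/analysis_sea_ad_001.html');
    INSERT INTO analyses VALUES
        ('analysis_sea_ad_001', 'site/notebooks/sea_ad_cell_vulnerability_analysis.ipynb');
    """
)

HTML = "site/notebooks/nb_sea_ad_001.html"    # assumed target for both notebook rows
IPYNB = "site/notebooks/nb_sea_ad_001.ipynb"  # assumed target for the analysis row

with conn:  # commits on success, rolls back if either UPDATE raises
    conn.execute(
        "UPDATE notebooks SET rendered_html_path = ? WHERE id IN (?, ?)",
        (HTML, "nb_sea_ad_001", "nb-analysis_sea_ad_001"),
    )
    conn.execute(
        "UPDATE analyses SET artifact_path = ? WHERE id = ?",
        (IPYNB, "analysis_sea_ad_001"),
    )
```

Running both updates in one transaction avoids a state where the notebook rows point at real files while the analysis row still references a missing artifact.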

File: seaad_spotlight_notebook_rebuild_spec.md

Modified: 2026-04-28 02:29

Size: 13.8 KB
