Astrocyte reactivity subtypes in neurodegeneration
Notebook ID: nb-sda-2026-04-01-gap-007 · Analysis: sda-2026-04-01-gap-007 · Generated: 2026-04-10
Research question
Astrocytes are classically described as adopting A1 (neurotoxic) or A2 (neuroprotective) phenotypes, but recent single-cell data reveal far greater heterogeneity. Reactive subtypes still need to be mapped to disease stages and therapeutic targets.
Approach
This notebook is generated programmatically from real Forge tool calls and SciDEX debate data. Code cells load cached evidence bundles from data/forge_cache/seaad/*.json and query live data from scidex.db. Re-run python3 scripts/regenerate_notebooks.py --analysis sda-2026-04-01-gap-007 --force to refresh.
7 hypotheses were generated and debated. The knowledge graph has 178 edges.
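Before running the cells, it can help to see which evidence bundles are actually present in the cache. The following is a minimal sketch, assuming the directory layout described above (data/forge_cache/seaad/*.json relative to the repository root). `cache_inventory` is a hypothetical helper and not part of the SciDEX codebase.

```python
from pathlib import Path


def cache_inventory(cache_dir):
    """Map each bundle name (file stem) to its size in bytes.

    Returns an empty dict when the directory does not exist.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return {}
    return {p.stem: p.stat().st_size for p in sorted(cache_dir.glob('*.json'))}


# Path assumed from the Approach text; adjust if the notebook runs elsewhere.
inventory = cache_inventory(Path('data') / 'forge_cache' / 'seaad')
print(f'{len(inventory)} cached bundles')
for name, size in inventory.items():
    print(f'  {name}: {size} bytes')
```

A bundle that is missing from this listing is what makes `load()` return `{}` in the cells below, which in turn produces the empty rows and "No ... data" messages.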
Debate Summary
Quality score: 0.71 · Rounds: 4 · Personas: Theorist, Skeptic, Domain_Expert, Synthesizer
1. Forge tool provenance
import json, sys, sqlite3
from pathlib import Path
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

matplotlib.rcParams['figure.dpi'] = 110
matplotlib.rcParams['figure.facecolor'] = 'white'

REPO = Path('.').resolve()
sys.path.insert(0, str(REPO))
CACHE_SUB = 'seaad'
CACHE = REPO / 'data' / 'forge_cache' / CACHE_SUB

def load(name):
    """Return a cached evidence bundle, or {} if the file is missing."""
    p = CACHE / f'{name}.json'
    if p.exists():
        return json.loads(p.read_text())
    return {}

db_path = Path('/home/ubuntu/scidex/scidex.db')
try:
    db = sqlite3.connect(str(db_path))
    prov = pd.read_sql_query('''
        SELECT skill_id, status, COUNT(*) AS n_calls,
               ROUND(AVG(duration_ms),0) AS mean_ms
        FROM tool_calls
        WHERE created_at >= date('now','-30 days')
        GROUP BY skill_id, status
        ORDER BY n_calls DESC
    ''', db)
    db.close()
    prov['tool'] = prov['skill_id'].str.replace('tool_', '', regex=False)
    print(f'{len(prov)} tool-call aggregates (last 30 days):')
    # A bare expression inside try/if is not rendered by the notebook,
    # so call display() (provided by the notebook kernel) explicitly.
    display(prov[['tool','status','n_calls','mean_ms']].head(20))
except Exception as e:
    print(f'Provenance unavailable: {e}')
77 tool-call aggregates (last 30 days):
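The provenance query returns one row per (tool, status) pair, so a per-tool success rate needs a pivot. The sketch below shows one way to do that; it assumes the `tool`, `status`, and `n_calls` columns produced above and that successful calls carry the status value 'success', which the notebook does not confirm. `success_rates` and the demo rows are hypothetical.

```python
import pandas as pd


def success_rates(prov, ok_status='success'):
    """Collapse per-(tool, status) counts into one row per tool.

    ok_status is an assumed label for successful calls.
    """
    wide = prov.pivot_table(index='tool', columns='status', values='n_calls',
                            aggfunc='sum', fill_value=0)
    total = wide.sum(axis=1)
    if ok_status in wide.columns:
        ok = wide[ok_status]
    else:
        ok = pd.Series(0, index=wide.index)
    out = pd.DataFrame({'n_calls': total,
                        'success_rate': (ok / total).round(3)})
    return out.sort_values('n_calls', ascending=False)


# Made-up rows shaped like the query result; not real provenance data.
demo = pd.DataFrame({
    'tool': ['pubmed_search', 'pubmed_search', 'string_network'],
    'status': ['success', 'error', 'success'],
    'n_calls': [90, 10, 40],
})
print(success_rates(demo))
```

In the notebook itself the call would be `success_rates(prov)`, after checking which status labels actually occur with `prov['status'].unique()`.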
2. Target gene annotations
ann_rows = []
# NOTE: 'AND' is not a gene symbol. It looks like a parsing artifact of target
# strings such as 'DGAT1 and SOAT1', which would explain the empty row below.
for g in ['AND']:
    mg = load(f'mygene_{g}')
    hpa = load(f'hpa_{g}')
    if not mg and not hpa:
        ann_rows.append({'gene': g, 'name': '—', 'protein_class': '—',
                         'disease_involvement': '—'})
        continue
    ann_rows.append({
        'gene': g,
        'name': (mg.get('name') or '')[:55],
        'protein_class': ', '.join((hpa.get('protein_class') or [])[:2])[:55]
            if isinstance(hpa.get('protein_class'), list)
            else str(hpa.get('protein_class') or '—')[:55],
        'disease_involvement': ', '.join((hpa.get('disease_involvement') or [])[:2])[:55]
            if isinstance(hpa.get('disease_involvement'), list)
            else str(hpa.get('disease_involvement') or '')[:55],
    })
pd.DataFrame(ann_rows)
| | gene | name | protein_class | disease_involvement |
|---|---|---|---|---|
| 0 | AND | — | — | — |
3. GO Biological Process enrichment (Enrichr)
go_bp = load('enrichr_GO_Biological_Process')
if isinstance(go_bp, list) and go_bp:
    go_df = pd.DataFrame(go_bp[:10])[['term','p_value','odds_ratio','genes']].copy()
    go_df['p_value'] = go_df['p_value'].apply(lambda p: f'{p:.2e}')
    go_df['odds_ratio'] = go_df['odds_ratio'].round(1)
    go_df['term'] = go_df['term'].str[:60]
    go_df['n_hits'] = go_df['genes'].apply(len)
    go_df['genes'] = go_df['genes'].apply(lambda g: ', '.join(g))
    # Explicit display(): expressions nested in if/else are not auto-rendered.
    display(go_df[['term','n_hits','p_value','odds_ratio','genes']])
else:
    print('No GO:BP enrichment data')

# Visualize top GO BP enrichment
go_bp = load('enrichr_GO_Biological_Process')
if isinstance(go_bp, list) and go_bp:
    top = go_bp[:8]
    # Reverse so the most significant term is drawn at the top of the chart.
    terms = [t['term'][:45] for t in top][::-1]
    # Floor p-values at 1e-300 to avoid log10(0).
    neglogp = [-np.log10(max(t['p_value'], 1e-300)) for t in top][::-1]
    fig, ax = plt.subplots(figsize=(9, 4.5))
    ax.barh(terms, neglogp, color='#4fc3f7')
    ax.set_xlabel('-log10(p-value)')
    ax.set_title('Top GO:BP enrichment (Enrichr)')
    ax.grid(axis='x', alpha=0.3)
    plt.tight_layout(); plt.show()
else:
    print('No GO:BP data to plot')
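The table and bar chart above use the raw `p_value` field. Enrichr responses normally include an adjusted p-value as well, but whether the cached bundle keeps it is not visible here. If only raw p-values are available, a Benjamini-Hochberg adjustment can be sketched as below. `benjamini_hochberg` is a hypothetical helper, and the adjustment is only meaningful when it is applied to every term that was tested, not just the ten cached rows.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return p
    order = np.argsort(p)                      # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)
    # Walk from the largest p-value down so adjusted values never decrease
    # as the raw p-value increases.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Illustrative p-values only; not taken from the cached enrichment.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With the cached list this could be called as `benjamini_hochberg([t['p_value'] for t in go_bp])`, keeping the caveat about the number of tested terms in mind.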
4. KEGG pathway enrichment
kegg = load('enrichr_KEGG_Pathways')
if isinstance(kegg, list) and kegg:
    kegg_df = pd.DataFrame(kegg[:10])[['term','p_value','odds_ratio','genes']].copy()
    kegg_df['genes'] = kegg_df['genes'].apply(lambda g: ', '.join(g))
    kegg_df['p_value'] = kegg_df['p_value'].apply(lambda p: f'{p:.2e}')
    kegg_df['odds_ratio'] = kegg_df['odds_ratio'].round(1)
    # Explicit display(): expressions nested in if/else are not auto-rendered.
    display(kegg_df)
else:
    print('No KEGG enrichment data')
No KEGG enrichment data
5. STRING protein interaction network
ppi = load('string_network')
if isinstance(ppi, list) and ppi:
    ppi_df = pd.DataFrame(ppi).sort_values('score', ascending=False)
    display_cols = [c for c in ['protein1','protein2','score','escore','tscore'] if c in ppi_df.columns]
    print(f'{len(ppi_df)} STRING edges')
    # Explicit display(): expressions nested in if/else are not auto-rendered.
    display(ppi_df[display_cols].head(20))
else:
    print('No STRING edges returned')
11 STRING edges
# Network figure
import math

ppi = load('string_network')
if isinstance(ppi, list) and ppi:
    nodes = sorted({p for e in ppi for p in (e['protein1'], e['protein2'])})
    n = len(nodes)
    # Place the nodes evenly on a unit circle.
    pos = {n_: (math.cos(2*math.pi*i/n), math.sin(2*math.pi*i/n)) for i, n_ in enumerate(nodes)}
    fig, ax = plt.subplots(figsize=(7, 7))
    for e in ppi:
        x1, y1 = pos[e['protein1']]
        x2, y2 = pos[e['protein2']]
        # Edge opacity and width scale with the score; alpha is capped at 1.
        ax.plot([x1, x2], [y1, y2], color='#888',
                alpha=min(1.0, 0.3 + 0.5*e['score']),
                linewidth=0.5 + 2*e['score'])
    for name, (x, y) in pos.items():
        ax.scatter([x], [y], s=450, color='#ffd54f', edgecolors='#333', zorder=3)
        ax.annotate(name, (x, y), ha='center', va='center', fontsize=8, fontweight='bold', zorder=4)
    ax.set_aspect('equal'); ax.axis('off')
    ax.set_title(f'STRING PPI network ({len(ppi)} edges)')
    plt.tight_layout(); plt.show()
else:
    print('No STRING data to visualize')
6. Reactome pathway footprint
pw_rows = []
for g in ['AND']:
    pws = load(f'reactome_{g}')
    if isinstance(pws, list):
        pw_rows.append({'gene': g, 'n_pathways': len(pws),
                        'top_pathway': (pws[0]['name'] if pws else '—')[:70]})
    else:
        pw_rows.append({'gene': g, 'n_pathways': 0, 'top_pathway': '—'})
pd.DataFrame(pw_rows).sort_values('n_pathways', ascending=False)
| | gene | n_pathways | top_pathway |
|---|---|---|---|
| 0 | AND | 0 | — |
7. Allen Brain Atlas ISH regional expression
ish_rows = []
for g in ['AND']:
    ish = load(f'allen_ish_{g}')
    regions = (ish.get('regions') or []) if isinstance(ish, dict) else []
    ish_rows.append({
        'gene': g,
        'n_ish_regions': len(regions),
        'top_region': (regions[0].get('structure','') if regions else '—')[:45],
        'top_energy': round(regions[0].get('expression_energy',0), 2) if regions else None,
    })
pd.DataFrame(ish_rows)
| | gene | n_ish_regions | top_region | top_energy |
|---|---|---|---|---|
| 0 | AND | 0 | — | — |
8. Hypothesis ranking (7 hypotheses)
hyp_data = [
    ('Metabolic Switch Targeting for A1→A2 Repolarization', 0.682),
    ('Lipid Droplet Dynamics as Phenotype Switches', 0.64),
    ('Mitochondrial Transfer Pathway Enhancement', 0.639),
    ('Circadian Rhythm Entrainment of Reactive Astrocytes', 0.635),
    ('Mechanosensitive Ion Channel Reprogramming', 0.622),
    ('Purinergic Signaling Polarization Control', 0.59),
    ('Epigenetic Memory Erasure via TET2 Activation', 0.585),
]
# Reverse so the highest-scoring hypothesis is drawn at the top.
titles = [h[0] for h in hyp_data][::-1]
scores = [h[1] for h in hyp_data][::-1]
fig, ax = plt.subplots(figsize=(10, max(8, len(titles)*0.4)))
colors = ['#ef5350' if s >= 0.6 else '#ffa726' if s >= 0.5 else '#66bb6a' for s in scores]
ax.barh(range(len(titles)), scores, color=colors)
ax.set_yticks(range(len(titles))); ax.set_yticklabels(titles, fontsize=7)
ax.set_xlabel('Composite Score'); ax.set_title('Astrocyte reactivity subtypes in neurodegeneration')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout(); plt.show()
9. Score dimension heatmap (top 10)
labels = [
    'Metabolic Switch Targeting for A1→A2 Rep',
    'Lipid Droplet Dynamics as Phenotype Swit',
    'Mitochondrial Transfer Pathway Enhanceme',
    'Circadian Rhythm Entrainment of Reactive',
    'Mechanosensitive Ion Channel Reprogrammi',
    'Purinergic Signaling Polarization Contro',
    'Epigenetic Memory Erasure via TET2 Activ',
]
# One row per hypothesis (same order as labels), one column per entry in dims.
matrix = np.array([
    [0.72, 0.48, 0.58, 0.65, 0.535, 0.45, 0.42, 0.52, 0.35],
    [0.8, 0.5, 0.55, 0.4, 0.436, 0.45, 0.4, 0.65, 0.6],
    [0.85, 0.25, 0.65, 0.35, 0.436, 0.4, 0.3, 0.4, 0.5],
    [0.75, 0.4, 0.6, 0.45, 0.535, 0.6, 0.45, 0.45, 0.55],
    [0.8, 0.6, 0.65, 0.7, 0.436, 0.5, 0.55, 0.6, 0.65],
    [0.65, 0.85, 0.8, 0.75, 0.436, 0.75, 0.7, 0.9, 0.7],
    [0.8, 0.45, 0.65, 0.75, 0.532, 0.75, 0.6, 0.5, 0.35],
])
dims = ['novelty_score', 'feasibility_score', 'impact_score',
        'mechanistic_plausibility_score', 'clinical_relevance_score',
        'data_availability_score', 'reproducibility_score',
        'druggability_score', 'safety_profile_score']
if matrix.size:
    fig, ax = plt.subplots(figsize=(10, 5))
    im = ax.imshow(matrix, cmap='RdYlGn', aspect='auto', vmin=0, vmax=1)
    ax.set_xticks(range(len(dims)))
    ax.set_xticklabels([d.replace('_score','').replace('_',' ').title() for d in dims],
                       rotation=45, ha='right', fontsize=8)
    ax.set_yticks(range(len(labels))); ax.set_yticklabels(labels, fontsize=7)
    ax.set_title('Score dimensions — top hypotheses')
    plt.colorbar(im, ax=ax, shrink=0.8)
    plt.tight_layout(); plt.show()
else:
    print('No score data available')
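The notebook does not say how the composite score in section 8 is derived from the nine dimensions in the heatmap. As a rough sanity check that assumes nothing about the real weighting, the sketch below compares each hypothesis's unweighted dimension mean with its composite score. The arrays are copied from the cells above; `dim_mean`, `order_by_mean`, and `order_by_composite` are illustrative names.

```python
import numpy as np

# Composite scores (section 8) and the 7 x 9 dimension matrix (section 9),
# in the same hypothesis order.
composite = np.array([0.682, 0.64, 0.639, 0.635, 0.622, 0.59, 0.585])
matrix = np.array([
    [0.72, 0.48, 0.58, 0.65, 0.535, 0.45, 0.42, 0.52, 0.35],
    [0.8, 0.5, 0.55, 0.4, 0.436, 0.45, 0.4, 0.65, 0.6],
    [0.85, 0.25, 0.65, 0.35, 0.436, 0.4, 0.3, 0.4, 0.5],
    [0.75, 0.4, 0.6, 0.45, 0.535, 0.6, 0.45, 0.45, 0.55],
    [0.8, 0.6, 0.65, 0.7, 0.436, 0.5, 0.55, 0.6, 0.65],
    [0.65, 0.85, 0.8, 0.75, 0.436, 0.75, 0.7, 0.9, 0.7],
    [0.8, 0.45, 0.65, 0.75, 0.532, 0.75, 0.6, 0.5, 0.35],
])

dim_mean = matrix.mean(axis=1)               # unweighted mean per hypothesis
order_by_mean = np.argsort(-dim_mean)        # row indices, highest mean first
order_by_composite = np.argsort(-composite)  # row indices, highest composite first

for i in order_by_mean:
    print(f'row {i}: mean={dim_mean[i]:.3f}  composite={composite[i]:.3f}')
print('same ordering:', bool((order_by_mean == order_by_composite).all()))
```

On these numbers the two orderings differ: the row with the highest unweighted mean is the Purinergic Signaling hypothesis, which ranks sixth by composite score. The composite is therefore presumably weighted or computed from other inputs, and the heatmap should not be read as a breakdown of the bar chart.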
10. PubMed evidence per hypothesis
Hypothesis 1: Metabolic Switch Targeting for A1→A2 Repolarization
Target genes: HK2 · Composite score: 0.682
Molecular Mechanism and Rationale
The hexokinase 2 (HK2)-mediated metabolic switch represents a fundamental regulatory mechanism governing astrocyte phenotypic polarization between the neurotoxic A1 and neuroprotective A2 states. HK2, a hexokinase isoform (HK1, not HK2, is the predominant isoform in the brain), catalyzes the ATP-dependent phosphorylation of glucose to glucose-6-phosphate, the first step of glycolysis. This enzyme's unique mitochondrial localization through its interaction with the voltage-depen
hid = 'h-a1b56d74'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 2: Lipid Droplet Dynamics as Phenotype Switches
Target genes: DGAT1 and SOAT1 · Composite score: 0.64
Molecular Mechanism and Rationale
The hypothesis centers on the differential regulation of lipid droplet composition between A1 and A2 astrocyte phenotypes through the enzymatic balance of diacylglycerol O-acyltransferase 1 (DGAT1) and sterol O-acyltransferase 1 (SOAT1). DGAT1 catalyzes the final step in triglyceride synthesis by transferring acyl-CoA to diacylglycerol, while SOAT1 (also known as ACAT1) esterifies cholesterol to form cholesteryl esters. In A2 astrocytes, elevated SOAT1 acti
hid = 'h-7d4a24d3'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 3: Mitochondrial Transfer Pathway Enhancement
Target genes: MIRO1 · Composite score: 0.639
Molecular Mechanism and Rationale
The mitochondrial transfer pathway enhancement strategy targets the fundamental cellular dysfunction underlying neurodegeneration by amplifying endogenous astrocyte-mediated repair mechanisms. Central to this approach is MIRO1 (Mitochondrial Rho GTPase 1), a critical regulator of mitochondrial transport that facilitates the movement of healthy mitochondria from neuroprotective A2 astrocytes to dysfunctional A1 astrocytes. MIRO1 functions as an adaptor prote
hid = 'h-969bd8e0'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 4: Circadian Rhythm Entrainment of Reactive Astrocytes
Target genes: BMAL1 · Composite score: 0.635
Molecular Mechanism and Rationale
The circadian rhythm entrainment of reactive astrocytes represents a novel therapeutic paradigm leveraging the intrinsic temporal regulation of glial cell phenotypes through the master circadian transcription factor BMAL1 (Brain and Muscle ARNT-Like 1). BMAL1, forming a heterodimer with CLOCK (Circadian Locomotor Output Cycles Kaput), serves as the positive arm of the molecular circadian clock machinery, driving rhythmic gene expression through E-box-mediat
hid = 'h-5706bbd7'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 5: Mechanosensitive Ion Channel Reprogramming
Target genes: PIEZO1 and KCNK2 · Composite score: 0.622
Molecular Mechanism and Rationale
The mechanosensitive ion channel reprogramming hypothesis centers on the pathological role of PIEZO1 channels in astrocyte phenotype switching during neurodegeneration. PIEZO1, a large trimeric mechanically-activated ion channel, consists of over 2,500 amino acids per subunit and forms a characteristic three-blade propeller structure. In healthy brain tissue, PIEZO1 channels in astrocytes respond to physiological mechanical stimuli by allowing calcium influx
hid = 'h-db6aa4b1'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 6: Purinergic Signaling Polarization Control
Target genes: P2RY1 and P2RX7 · Composite score: 0.59
Molecular Mechanism and Rationale
The purinergic signaling pathway represents a fundamental regulatory system controlling astrocyte phenotypic polarization through the opposing actions of P2Y1 and P2X7 receptors. P2Y1 (P2RY1) is a Gq/G11-coupled metabotropic receptor that responds to ADP with high affinity (EC50 ~100 nM), triggering phospholipase C-β activation and subsequent IP3-mediated calcium release from endoplasmic reticulum stores. This generates sustained, oscillatory calcium waves
hid = 'h-0758b337'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
Hypothesis 7: Epigenetic Memory Erasure via TET2 Activation
Target genes: TET2 · Composite score: 0.585
Molecular Mechanism and Rationale
The fundamental basis of this therapeutic hypothesis centers on the epigenetic dysregulation that underlies astrocyte polarization in neurodegenerative diseases. Ten-eleven translocation methylcytosine dioxygenase 2 (TET2) serves as a critical epigenetic enzyme responsible for catalyzing the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), initiating active DNA demethylation processes. In the context of neurodegeneration, astrocytes un
hid = 'h-d2722680'
papers = load(f'pubmed_{hid}')
if isinstance(papers, list) and papers:
    lit = pd.DataFrame(papers)
    cols = [c for c in ['year','journal','title','pmid'] if c in lit.columns]
    if cols:
        lit = lit[cols].copy()  # copy so the edits below do not touch a view
        if 'title' in lit.columns:
            lit['title'] = lit['title'].str[:80]
        if 'journal' in lit.columns:
            lit['journal'] = lit['journal'].str[:30]
        if 'year' in lit.columns:
            lit = lit.sort_values('year', ascending=False)
        display_df = lit
    else:
        display_df = pd.DataFrame(papers[:5])
else:
    display_df = pd.DataFrame([{'note':'no PubMed results'}])
display_df
| | note |
|---|---|
| 0 | no PubMed results |
11. Knowledge graph edges (178 total)
edge_data = [
    {'source': 'P2RY1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.66},
    {'source': 'P2RY1', 'relation': 'interacts_with', 'target': 'P2RX7', 'strength': 0.66},
    {'source': 'P2RX7', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.66},
    {'source': 'P2RX7', 'relation': 'interacts_with', 'target': 'P2RY1', 'strength': 0.66},
    {'source': 'PIEZO1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.6},
    {'source': 'PIEZO1', 'relation': 'interacts_with', 'target': 'KCNK2', 'strength': 0.6},
    {'source': 'KCNK2', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.6},
    {'source': 'KCNK2', 'relation': 'interacts_with', 'target': 'PIEZO1', 'strength': 0.6},
    {'source': 'TET2', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.58},
    {'source': 'P2RY1', 'relation': 'participates_in', 'target': 'Purinergic signaling', 'strength': 0.54},
    {'source': 'P2RX7', 'relation': 'participates_in', 'target': 'Purinergic signaling', 'strength': 0.54},
    {'source': 'DGAT1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.53},
    {'source': 'DGAT1', 'relation': 'interacts_with', 'target': 'SOAT1', 'strength': 0.53},
    {'source': 'SOAT1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.53},
    {'source': 'SOAT1', 'relation': 'interacts_with', 'target': 'DGAT1', 'strength': 0.53},
    {'source': 'BMAL1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.52},
    {'source': 'MIRO1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.51},
    {'source': 'TET2', 'relation': 'participates_in', 'target': 'Epigenetic regulation', 'strength': 0.51},
    {'source': 'HK2', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.51},
    {'source': 'h-0758b337', 'relation': 'targets', 'target': 'P2RY1 and P2RX7', 'strength': 0.5},
    {'source': 'h-db6aa4b1', 'relation': 'targets', 'target': 'PIEZO1 and KCNK2', 'strength': 0.5},
    {'source': 'h-7d4a24d3', 'relation': 'targets', 'target': 'DGAT1 and SOAT1', 'strength': 0.5},
    {'source': 'PIEZO1', 'relation': 'participates_in', 'target': 'Astrocyte reactivity signaling', 'strength': 0.49},
    {'source': 'KCNK2', 'relation': 'participates_in', 'target': 'Astrocyte reactivity signaling', 'strength': 0.49},
    {'source': 'BMAL1', 'relation': 'participates_in', 'target': 'Circadian clock / BMAL1-CLOCK ', 'strength': 0.46},
]
if edge_data:
    # Explicit display(): expressions nested in if/else are not auto-rendered.
    display(pd.DataFrame(edge_data).head(25))
else:
    print('No KG edge data available')
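Several `interacts_with` edges in the list above appear once per direction (for example P2RY1 to P2RX7 and P2RX7 to P2RY1), so raw edge counts overstate the number of distinct interactions. The sketch below collapses such pairs and counts how often each node appears. `undirected_pairs` and `node_degree` are hypothetical helpers, treating `interacts_with` as symmetric is an assumption, and `sample` holds four rows copied from `edge_data`.

```python
from collections import Counter


def undirected_pairs(edges, relation='interacts_with'):
    """Collapse A->B and B->A rows of one relation into sorted (a, b) pairs."""
    return sorted({tuple(sorted((e['source'], e['target'])))
                   for e in edges if e['relation'] == relation})


def node_degree(edges):
    """Count how many edge endpoints each node contributes (directions ignored)."""
    degree = Counter()
    for e in edges:
        degree[e['source']] += 1
        degree[e['target']] += 1
    return degree


# Four rows copied from edge_data above.
sample = [
    {'source': 'P2RY1', 'relation': 'interacts_with', 'target': 'P2RX7', 'strength': 0.66},
    {'source': 'P2RX7', 'relation': 'interacts_with', 'target': 'P2RY1', 'strength': 0.66},
    {'source': 'PIEZO1', 'relation': 'interacts_with', 'target': 'KCNK2', 'strength': 0.6},
    {'source': 'P2RY1', 'relation': 'associated_with', 'target': 'neurodegeneration', 'strength': 0.66},
]
print(undirected_pairs(sample))
print(node_degree(sample).most_common(3))
```

Calling `undirected_pairs(edge_data)` in the notebook gives the distinct interaction pairs among the 25 edges shown.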
12. Caveats
This notebook uses real Forge tool calls cached from live APIs, with these limitations:
- Enrichment is run against curated gene-set libraries, not genome-wide screens
- STRING, Reactome, HPA, and MyGene reflect curated knowledge, not new experimental data
- PubMed results are ranked by search relevance and are not a systematic review
The cached evidence bundle is the minimum viable real-data analysis for this topic.