[Exchange] Enrich top 5 hypotheses with clinical trials data via ClinicalTrials.gov API
> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
> atlas — every principle is load-bearing. In particular:
> - LLMs for semantic judgment; rules for syntactic validation.
> - Gap-predicate driven, not calendar-driven.
> - Idempotent + version-stamped + observable.
> - No hardcoded entity lists, keyword lists, or canonical-name tables.
> - Three surfaces: FastAPI + orchestra + MCP.
> - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
> F1 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
> Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
> docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
> BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.
ID: 1290fb13-d11
Priority: 88
Type: one_shot
Status: complete
Goal
Enrich the top 5 hypotheses (by composite score) with real clinical trials data fetched from the ClinicalTrials.gov v2 API. Each hypothesis gets structured trial data (NCT ID, title, status, phase, conditions, interventions, sponsor, enrollment, dates, description, URL) stored in the clinical_trials JSON column.
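As a rough illustration of the record shape described above, the sketch below flattens one study object from a ClinicalTrials.gov v2 response into the listed fields. The function names `flatten_study` and `search_trials`, the output key names, and the choice of start and completion dates are assumptions for illustration, not the retired script's actual code. The nested field paths follow the v2 `protocolSection` layout and should be checked against the API documentation before use.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://clinicaltrials.gov/api/v2/studies"


def flatten_study(study: dict) -> dict:
    """Reduce one v2 study object to a flat per-trial record (key names assumed)."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    design = proto.get("designModule", {})
    arms = proto.get("armsInterventionsModule", {})
    sponsor = proto.get("sponsorCollaboratorsModule", {})
    nct_id = ident.get("nctId", "")
    return {
        "nctId": nct_id,
        "title": ident.get("briefTitle", ""),
        "status": status.get("overallStatus", ""),
        # A study may list several phases; join them into one string.
        "phase": "/".join(design.get("phases", [])),
        "conditions": proto.get("conditionsModule", {}).get("conditions", []),
        "interventions": [i.get("name", "") for i in arms.get("interventions", [])],
        "sponsor": sponsor.get("leadSponsor", {}).get("name", ""),
        "enrollment": design.get("enrollmentInfo", {}).get("count"),
        "startDate": status.get("startDateStruct", {}).get("date"),
        "completionDate": status.get("completionDateStruct", {}).get("date"),
        "description": proto.get("descriptionModule", {}).get("briefSummary", ""),
        "url": f"https://clinicaltrials.gov/study/{nct_id}" if nct_id else "",
    }


def search_trials(term: str, page_size: int = 20) -> list[dict]:
    """Fetch one page of studies for a free-text term (performs a network call)."""
    query = urllib.parse.urlencode({"query.term": term, "pageSize": page_size})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        payload = json.load(resp)
    return [flatten_study(s) for s in payload.get("studies", [])]
```

Missing modules fall back to empty values rather than raising, so a sparse study still yields a record with every key present.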
Acceptance Criteria
☑ Top 5 hypotheses have clinical_trials populated with real ClinicalTrials.gov data
☑ Each trial record includes nctId, title, status, phase, conditions, interventions, sponsor, enrollment, dates, URL
☑ Hypothesis pages render clinical trials sections correctly
☑ Work log updated with timestamped entry
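The field-completeness criterion above can be checked mechanically. The following is a minimal sketch, assuming the clinical_trials column holds a JSON array of flat records. The helper names `missing_fields` and `validate_clinical_trials` and every key name other than nctId are hypothetical; date keys are left out because the spec does not name them.

```python
import json

# Keys taken from the acceptance criteria; exact spellings other than
# nctId are an assumption about the stored records.
REQUIRED_FIELDS = (
    "nctId", "title", "status", "phase", "conditions",
    "interventions", "sponsor", "enrollment", "url",
)


def missing_fields(trial: dict) -> list[str]:
    """Return the required keys that are absent or empty in one trial record."""
    return [k for k in REQUIRED_FIELDS if trial.get(k) in (None, "", [])]


def validate_clinical_trials(raw_json: str) -> dict[str, list[str]]:
    """Map each incomplete trial (by NCT ID, or by index) to its missing keys."""
    problems: dict[str, list[str]] = {}
    for index, trial in enumerate(json.loads(raw_json)):
        missing = missing_fields(trial)
        if missing:
            problems[trial.get("nctId") or f"#{index}"] = missing
    return problems
```

An empty result means every record carries all required keys. An enrollment of 0 counts as populated, since only None, empty strings, and empty lists are treated as missing.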
Approach
Identify top 5 hypotheses by composite_score
Build tailored search queries for each hypothesis based on target gene, disease, and mechanism
Query ClinicalTrials.gov v2 API, deduplicate by NCT ID
Store structured JSON in the clinical_trials column
Verify hypothesis pages render the data
Work Log
2026-04-02 14:35 UTC — Slot 13
- Identified top 5 hypotheses by composite score:
1. h-9e9fee95 — Circadian Glymphatic Entrainment (score 0.697) → 4 trials
2. h-de0d4364 — Acid Sphingomyelinase Modulation (score 0.695) → 9 trials
3. h-2600483e — CYP46A1 Gene Therapy (score 0.693) → 3 trials
4. h-bdbd2120 — Gamma Entrainment Therapy (score 0.692) → 12 trials
5. h-9d29bfe5 — Membrane Cholesterol Modulators (score 0.680) → 18 trials
- Created enrich_clinical_trials.py script using ClinicalTrials.gov v2 API
- Used multiple tailored search queries per hypothesis for better coverage
- Ran enrichment with deduplication by NCT ID
- Verified all 5 hypothesis pages render "Clinical Trials (N)" sections correctly
- Result: Done — all top 5 hypotheses enriched with 3-18 clinical trials each (46 total unique trials)
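The deduplication step in this entry can be sketched as follows, assuming each query returns a list of flat trial records keyed by nctId. The function name `merge_unique` is hypothetical and is not taken from the retired script.

```python
def merge_unique(result_sets: list[list[dict]]) -> list[dict]:
    """Merge per-query result lists, keeping the first record seen per NCT ID."""
    seen: dict[str, dict] = {}
    for results in result_sets:
        for trial in results:
            nct_id = trial.get("nctId")
            # Skip records without an ID and any ID already collected.
            if nct_id and nct_id not in seen:
                seen[nct_id] = trial
    # Dicts preserve insertion order, so first-seen order is retained.
    return list(seen.values())
```

Keeping the first occurrence means that results from earlier, more specific queries take precedence over the same trial returned by a broader query.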
2026-04-16 17:55 UTC — Reopened (Slot glm-5:60)
- Task reopened due to an orphan branch: prior commit d59233e26 was on main, but the audit could not verify it
- Verified prior work: old top 5 hypotheses still have clinical_trials data (4-13 trials each)
- Rankings shifted significantly since original run — current top 5 are different hypotheses
- Current top 5 by composite_score:
1. h-11ba42d0 — APOE4 Lipidation Enhancement (0.845) → already had 16 trials
2. SDA-2026-04-16-hyp-e5bf6e0d — Metabolic Reprogramming/Senescence (0.790) → enriched with 2 trials (SIRT1, PGC1A, NAMPT)
3. h-b2aeabb1 — PEA Endocannabinoid Therapy (0.780) → enriched with 11 trials (PEA, PPARA)
4. SDA-2026-04-02-gap-tau-prop-20260402003221-H001 — LRP1 Tau Uptake (0.725) → enriched with 19 trials (tau immunotherapy, vaccines, antibodies)
5. SDA-2026-04-16-hyp-daadc5c6 — SASP Modulation (0.710) → enriched with 6 trials (senolytics, NF-kB, senescence)
- Created enrich_clinical_trials_top5.py with tailored queries per hypothesis
- LRP1 hypothesis required broader queries (tau immunotherapy, anti-tau therapy) since no direct LRP1 trials exist
- Verified all 5 hypothesis pages render "Clinical Trials (N)" sections correctly via curl
- Result: Done — all current top 5 hypotheses enriched (54 total trials across top 5)
2026-04-16 18:20 UTC — Reopened again (Slot glm-5:60, attempt 2)
- Task reopened again due to orphan branch — prior commit 279265710 from attempt 1 didn't merge to main
- Verified all top 5 hypotheses still have clinical_trials data in DB (unchanged)
- All hypothesis pages render correctly (200 OK, "Clinical Trials" sections present)
- Committed enrichment script + spec update, pushed to remote
- Result: Done — commit 279265710 pushed, awaiting merge
2026-04-21 03:51 UTC — Codex slot 44
- Re-evaluated the reopened task against current PostgreSQL state after the SQLite retirement reset.
- Verified that the current top 5 hypotheses by composite_score all have structured clinical_trials data with the required fields populated.
- Found that /hypothesis/{id} was shadowed by an older lightweight route that did not include the clinical trials tab, while the richer clinical-trials-aware route was registered later on the same path.
- Moved the legacy lightweight handler to /hypothesis-lite/{id} so canonical hypothesis pages render the Exchange clinical trials section.
- Live API process still showed pool-exhaustion 500s before restart, but the route-table fix is code-scoped and verifiable in-process.