[Exchange] Add clinical-trial context to 20 hypotheses missing trial signals
Status: done

Many active hypotheses lack clinical_trials context. Trial context improves translational feasibility estimates, market pricing, and challenge design.

Verification:

  • 20 active hypotheses gain clinical_trials context or a documented no-trial rationale
  • Each trial signal includes NCT ID, PMID, registry URL, or explicit search provenance
  • The number of remaining active hypotheses missing clinical trial context is reduced

Start by reading this task's spec. Select active hypotheses from PostgreSQL (dbname=scidex user=scidex_app) that are missing clinical_trials, prioritizing therapeutic targets and high market relevance. Search ClinicalTrials.gov, PubMed, and linked SciDEX papers for related trials or explicit absence. Persist concise trial context and verify the updated translational fields.

Spec File

Goal

Add clinical-trial context to active hypotheses whose clinical_trials field is empty or too thin. Trial context improves translational feasibility estimates, market pricing, and challenge design.

Acceptance Criteria

☑ A concrete batch of active hypotheses gains clinical-trial context or documented no-trial rationale
☑ Each trial signal includes NCT ID, PMID, registry URL, or explicit search provenance
☑ No trial placeholders are added when no relevant trial exists
☑ Before/after missing clinical trial context counts are recorded

Approach

  • Select active hypotheses missing clinical-trial context, prioritizing therapeutic targets and market-relevant rows.
  • Search ClinicalTrials.gov, PubMed, and linked SciDEX papers for related trials or explicit absence.
  • Persist concise trial context with provenance and caveats.
  • Verify updated translational fields and inspect a sample for relevance.
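
The selection step above can be sketched as a small filter over rows already fetched from the database. This is a minimal sketch under stated assumptions, not the logic of the actual backfill scripts: the row keys `clinical_trials` and `market_relevance`, and the idea that an empty JSON list or object counts as "missing", are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def is_missing_trial_context(value: Optional[Any]) -> bool:
    """Return True when a clinical_trials value is absent or empty.

    Assumption: the field may hold None, an empty string, or JSON text
    such as "[]" / "{}" / "null"; all of these are treated as missing.
    """
    if value is None:
        return True
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return True
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            return False  # free-text context is still context
        if value is None:
            return True
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def select_batch(rows: List[Dict[str, Any]], limit: int = 20) -> List[Dict[str, Any]]:
    """Pick up to `limit` rows missing trial context, most market-relevant first.

    `market_relevance` is a hypothetical numeric key used only to show the
    prioritization idea from the Approach list.
    """
    missing = [r for r in rows if is_missing_trial_context(r.get("clinical_trials"))]
    missing.sort(key=lambda r: r.get("market_relevance", 0.0), reverse=True)
    return missing[:limit]
```

Keeping the "missing" test in one predicate makes it easier to use the same definition for both batch selection and the before/after counts.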
Dependencies

    • 3aa7ff54-d3c - Exchange quest

    Dependents

    • Market calibration, translational scoring, and challenge/bounty design

    Work Log

    2026-04-21 - Quest engine template

    • Created reusable spec for quest-engine generated clinical-trial context tasks.

    2026-04-21 - Clinical trial context backfill

    • Scripts created:
    - scripts/backfill_clinical_trials.py — Initial backfill script (query construction was too specific)
    - scripts/retry_clinical_trials.py — Improved backfill with better query construction

    • Before count: 355 non-test debated/proposed hypotheses missing clinical_trials context
    • After count: 335 missing (net 20 processed)
    • Results for 20 processed hypotheses:
    - 10 updated with actual clinical trial data (NCT IDs, titles, status, phase, url)
    - 10 documented with explicit no-trial rationale (provenance, query, timestamp, note)

    • Sample hypotheses updated:
    - h-var-7b5e9a60eb (TREM2): 5 trials including NCT04388254, NCT05793372
    - h-var-f19f044a9a (TREM2): 5 trials
    - h-82100428d0 (MTOR): 2 trials including NCT04200911 (Rapamycin)
    - h-70bc216f06 (CDKN2A): 2 trials including NCT04685590 (Senolytic)

    • Search approach:
    - Gene name + "Alzheimer" query to ClinicalTrials.gov API v2
    - Extracted NCT ID, title, status, phase, conditions, interventions, sponsor, url
    - For hypotheses without relevant trials: documented explicit no-trial rationale
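
The field extraction described above might look like the following. This is a hedged sketch, not the code in the backfill scripts: the nested key paths reflect my understanding of the ClinicalTrials.gov API v2 study JSON (`protocolSection` and its modules), and the output keys (`nct_id`, `title`, and so on) are illustrative names. Only the URL pattern `https://clinicaltrials.gov/study/<NCT ID>` is taken from this log.

```python
from typing import Any, Dict, List


def extract_trial(study: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one API v2 study object into a compact trial record.

    Missing modules fall back to empty values instead of raising.
    """
    protocol = study.get("protocolSection", {})
    ident = protocol.get("identificationModule", {})
    status = protocol.get("statusModule", {})
    design = protocol.get("designModule", {})
    conditions = protocol.get("conditionsModule", {})
    arms = protocol.get("armsInterventionsModule", {})
    sponsor = protocol.get("sponsorCollaboratorsModule", {})

    nct_id = ident.get("nctId", "")
    return {
        "nct_id": nct_id,
        "title": ident.get("briefTitle", ""),
        "status": status.get("overallStatus", ""),
        "phase": design.get("phases", []),
        "conditions": conditions.get("conditions", []),
        "interventions": [i.get("name", "") for i in arms.get("interventions", [])],
        "sponsor": sponsor.get("leadSponsor", {}).get("name", ""),
        "url": f"https://clinicaltrials.gov/study/{nct_id}" if nct_id else "",
    }


def extract_trials(payload: Dict[str, Any], limit: int = 5) -> List[Dict[str, Any]]:
    """Extract at most `limit` trial records from a search response payload."""
    return [extract_trial(s) for s in payload.get("studies", [])[:limit]]
```

The default `limit=5` mirrors the "5 trials" entries in the log; an empty result from `extract_trials` is the case that would lead to a no-trial rationale instead.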

    • Verification:
    - All 20 hypotheses now have clinical_trials field populated (either trial data or rationale)
    - Each entry includes provenance (ClinicalTrials.gov search), query, and timestamp
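
A no-trial rationale and the provenance check above could be represented roughly as follows. This is a sketch under assumptions: the log names the fields (provenance, query, timestamp, note) and the `no_trials_found` class, but the exact key names and the `status` key used here are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def no_trial_rationale(query: str, searched_at: Optional[datetime] = None) -> Dict[str, Any]:
    """Build an explicit no-trial record instead of a placeholder trial."""
    searched_at = searched_at or datetime.now(timezone.utc)
    return {
        "status": "no_trials_found",  # hypothetical key; value taken from the log
        "provenance": "ClinicalTrials.gov search",
        "query": query,
        "searched_at": searched_at.isoformat(),
        "note": f'No trials on ClinicalTrials.gov for "{query}"',
    }


def has_provenance(entry: Dict[str, Any]) -> bool:
    """Check that an entry carries provenance, query, and a search timestamp."""
    return all(entry.get(key) for key in ("provenance", "query", "searched_at"))
```

Recording the query and timestamp with the rationale lets a later run repeat the same search and see whether a trial has been registered since.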

    Already Resolved — 2026-04-21 21:02:56Z

    • Evidence run from task worktree against PostgreSQL via scidex.core.database.get_db_readonly().
    • Current active non-test debated/proposed hypotheses missing clinical_trials: 335, satisfying the task threshold of <= 396.
    • Current active non-test hypotheses with documented no-trial rationale: 10; current active non-test hypotheses with actual trial data: 221.
    • Spot-checked prior batch rows: h-var-7b5e9a60eb and h-var-f19f044a9a include NCT04388254 with https://clinicaltrials.gov/study/NCT04388254; h-82100428d0 includes NCT04200911; h-70bc216f06 includes NCT04685590.
    • Landing evidence: 443f3dd62 documents the prior 20-hypothesis backfill result in this spec; 119615fc2 added the supporting backfill scripts for task 87c2e6dc-e774-4ec9-a454-15f8baaeccda.
    • Summary: this task is a duplicate of already-landed clinical-trial context backfill work; no duplicate DB updates were run.

    Verification Refresh — 2026-04-21 21:16:33Z

    • Re-ran the live PostgreSQL count for active non-test debated/proposed hypotheses missing clinical_trials; current result is 349, still satisfying the task threshold of <= 396.
    • Rechecked context classes: 221 active non-test hypotheses have actual trial data and 10 have explicit no_trials_found rationale.
    • Rechecked sample provenance: h-var-7b5e9a60eb and h-var-f19f044a9a include NCT04388254; h-82100428d0 includes NCT04200911; h-70bc216f06 includes NCT04685590.

    Work Log — 2026-04-28 — Batch backfill (task:881098e7)

    • Before count: 1195 non-archived, non-superseded hypotheses missing clinical_trials
    • After count: 1175 missing (net 20 processed)
    • Threshold: <= 1177 — PASS
    • Results for 20 processed hypotheses:
    - 14 updated with actual clinical trial data (NCT IDs, titles, status, phase, url)
    - 6 documented with explicit no-trial rationale (provenance, query, timestamp, note)

    • Hypotheses updated with trial data:
    - hyp-lyso-snca-1d58cf205e1f (LAMP2/Parkinson): NCT05548855
    - hyp-lyso-snca-3429d8065d63 (SNCA/Parkinson): NCT07142044, NCT02954978, NCT02046434, NCT07474779, NCT04878679
    - hyp-lyso-snca-c9e088045c26 (GBA1/Parkinson): NCT04588285, NCT07055087, NCT07474779, NCT01089283, NCT05536388
    - hyp-lyso-snca-3f4d11c5e9e4 (GBA1/Parkinson): NCT04588285, NCT07055087, NCT07474779, NCT01089283, NCT05536388
    - hyp-lyso-snca-548064db6357 (LAMP2/Parkinson): NCT05548855
    - hyp-lyso-snca-3577291fea07 (GBA1/Parkinson): NCT04588285, NCT07055087, NCT07474779, NCT01089283, NCT05536388
    - hyp-lyso-snca-cf55ff77a38a (VPS35/Parkinson): NCT04553185
    - hyp-lyso-snca-f7d4ff9f589e (SNCA/Parkinson): NCT07142044, NCT02954978, NCT02046434, NCT07474779, NCT04878679
    - h-aging-hippo-cortex-divergence (CDKN2A/Alzheimer): NCT04685590, NCT05422885
    - h-aging-myelin-amyloid (MBP/Alzheimer): NCT06783283
    - hyp-sda-2026-04-01-001-2 (TREM2/Alzheimer): NCT05419596, NCT06224920, NCT06274528, NCT04388254, NCT04570644
    - hyp-sda-2026-04-01-001-3 (TREM2+TYROBP/Alzheimer): NCT05419596, NCT06224920, NCT06274528, NCT04388254, NCT04570644
    - hyp-sda-2026-04-01-001-1 (APOE+TREM2/Alzheimer): NCT00550420, NCT06682767, NCT07146412, NCT01928420, NCT01741194
    - hyp-sda-2026-04-01-001-6 (TREM2/Alzheimer): NCT05419596, NCT06224920, NCT06274528, NCT04388254, NCT04570644

    • Hypotheses with no-trial rationale:
    - hyp-lyso-snca-3a610efd001e (TFEB): no trials on ClinicalTrials.gov for "TFEB Parkinson"
    - hyp-lyso-snca-20ec746f2857 (TFEB): no trials on ClinicalTrials.gov for "TFEB Parkinson"
    - h-aging-opc-elf2 (ELF2): no trials on ClinicalTrials.gov for "ELF2 Alzheimer"
    - hyp-sda-2026-04-01-001-4 (TYROBP+SYK): no trials on ClinicalTrials.gov for "TYROBP Alzheimer"
    - hyp-sda-2026-04-01-001-5 (SIRPA): no trials on ClinicalTrials.gov for "SIRPA Alzheimer"
    - hyp-sda-2026-04-01-001-7 (FCER1G): no trials on ClinicalTrials.gov for "FCER1G Alzheimer"

    • Search approach: gene name + disease context (Parkinson for lyso-snca cluster; Alzheimer for aging/sda clusters) queried against ClinicalTrials.gov API v2; results include NCT ID, title, status, phase, conditions, interventions, sponsor, url, query, provenance, searched_at timestamp.

    Work Log — 2026-04-28 — Batch backfill (task:fa828183-663d-4f72-adf7-a4f01304adfa)

    • Before count: 978 active debated/proposed hypotheses missing clinical_trials
    • After count: 948 missing (net 30 cleared, 20 hypotheses processed)
    • Threshold: <= 1157 — PASS
    • Results for 20 processed hypotheses:
    - 14 updated with actual clinical trial data (NCT IDs, titles, status, phase, url)
    - 6 documented with explicit no-trial rationale (provenance, query, timestamp, note)
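
The count bookkeeping in these entries is simple arithmetic, sketched here for clarity. The class and field names are illustrative, not part of any SciDEX module. Net cleared (before minus after) is kept separate from the number of hypotheses processed, since the two differ in this entry (30 versus 20).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchCounts:
    before: int      # missing count before the batch
    after: int       # missing count after the batch
    threshold: int   # task passes when after <= threshold
    processed: int   # hypotheses the batch actually touched

    @property
    def net_cleared(self) -> int:
        return self.before - self.after

    @property
    def passes(self) -> bool:
        return self.after <= self.threshold

    def summary(self) -> str:
        verdict = "PASS" if self.passes else "FAIL"
        return (
            f"Before {self.before}, after {self.after} "
            f"(net {self.net_cleared} cleared, {self.processed} processed); "
            f"threshold <= {self.threshold}: {verdict}"
        )
```

For the numbers logged above, `BatchCounts(before=978, after=948, threshold=1157, processed=20)` gives a net of 30 cleared and a passing threshold check.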

    • Hypotheses updated with trial data:
    - h-trem2-883b6abd (TREM2/tau): NCT trials via "TREM2 tau Alzheimer"
    - h-trem2-f48baa0c (TREM2/APOE): NCT trials via "TREM2 APOE Alzheimer"
    - h-92cfd75109 (NFE2L2/KEAP1): 1 trial via "NFE2L2 KEAP1 HMOX1"
    - h-metrep-623b0389f6c1 (OCT4/POU5F1/SOX2): 5 trials via "OCT4 POU5F1 SOX2"
    - h-metrep-5d3e6f6af6cd (DRP1/MFN1/MFN2): 5 trials via "DRP1 MFN1 MFN2"
    - h-ea85fbfb90 (MAPT/tau): 5 trials via "MAPT tau Alzheimer"
    - h-0d576989 (APP): 5 trials via "APP Alzheimer"
    - h-e003a35e (TREM2): 5 trials via "TREM2 Alzheimer"
    - h-722ec547 (TREM2): 5 trials via "TREM2 Alzheimer"
    - h-3be15ed2 (APOE): 5 trials via "APOE Alzheimer"
    - h-trem2-6a46fa2c (TREM2/APOE/MAPT): 1 trial via "TREM2 APOE MAPT"
    - h-69c9d059 (TREM2): 5 trials via "TREM2 Alzheimer"
    - h-var-2041072461 (TREM2/tau): 5 trials via "TREM2 tau Alzheimer"
    - h-3fdee932 (MAPT/tau): 5 trials via "MAPT tau Alzheimer"

    • Hypotheses with no-trial rationale:
    - h-trem2-fe8c644a (TREM2/ADAM10/ADAM17): no trials for "TREM2 ADAM10 ADAM17"
    - h-metrep-b3a540aad7e8 (AMPK/PRKAA1/PRKAA2): no trials for "AMPK PRKAA1 PRKAA2"
    - h-2fe683915d (GBA1/LAMP2A/SCARB2): no trials for "GBA1 LAMP2A SCARB2"
    - h-trem2-f3effd21 (TREM2/TYROBP/SYK): no trials for "TREM2 TYROBP SYK"
    - h-metrep-a7cf1c8bed76 (MTOR/RPTOR/TFEB): no trials for "MTOR RPTOR TFEB"
    - h-metrep-033391a02408 (SLC12A8/SIRT1/SRT2104): no trials for "SLC12A8 SIRT1 SRT2104"

    • Search approach: gene names from target_gene field + "Alzheimer" disease context queried against ClinicalTrials.gov API v2 via scripts/backfill_clinical_trials.py --limit 20; results include NCT ID, title, status, phase, conditions, interventions, sponsor, url.
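
The queries logged in this entry suggest a simple construction rule, sketched below. The rule is inferred from the logged query strings, not read from scripts/backfill_clinical_trials.py, so treat it as an assumption: take up to three terms from target_gene, and append the disease word only when fewer than three terms are available. The separator set and the function name are also hypothetical.

```python
import re
from typing import Optional

MAX_TERMS = 3  # inferred from logged queries such as "TREM2 ADAM10 ADAM17"


def build_query(target_gene: Optional[str], disease: str = "Alzheimer") -> Optional[str]:
    """Build a search query from a target_gene value.

    Returns None when no terms can be extracted, which corresponds to a
    hypothesis being skipped rather than searched.
    """
    terms = [t for t in re.split(r"[/,;+\s]+", target_gene or "") if t]
    if not terms:
        return None
    terms = terms[:MAX_TERMS]
    if len(terms) < MAX_TERMS:
        terms.append(disease)  # add disease context only when there is room
    return " ".join(terms)
```

Under this rule, `build_query("TREM2/APOE")` yields "TREM2 APOE Alzheimer" and `build_query("TREM2/ADAM10/ADAM17")` yields "TREM2 ADAM10 ADAM17", matching the queries listed above. One trade-off is that three-gene queries carry no disease term, so a no-trial result for them shows only that this narrow query returned nothing.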

    Work Log — 2026-04-28 — Batch backfill (task:b5e3e7d8-cb6e-4e60-a10e-be558ddaa76f)

    • Before count: 991 non-archived, non-superseded debated/proposed hypotheses missing clinical_trials context
    • After count: 962 missing (net 29 cleared, 20 hypotheses processed — 1 skipped due to no extractable gene terms)
    • Threshold: <= 1137 — PASS
    • Results for 20 processed hypotheses:
    - 10 updated with actual clinical trial data (NCT IDs, titles, status, phase, url)
    - 9 documented with explicit no-trial rationale (provenance, query, timestamp, note)
    - 1 skipped (no extractable gene terms for search query)

    • Hypotheses updated with trial data:
    - hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-2 (BACE1/amyloid): 5 trials via "BACE1 amyloid Alzheimer"
    - h-metrep-62c3fd1de1b6 (IL6/IL6R/GP130/STAT3): 5 trials via "IL6 IL6R GP130"
    - h-69bde12f (APOE): 5 trials via "APOE Alzheimer"
    - h-var-69c66a84b3 (MAPT/tau): 5 trials via "MAPT tau Alzheimer"
    - h-03e31e80 (HMGB1): 1 trial via "HMGB1 Alzheimer"
    - h-de52344d (CLOCK/BMAL1/PER2): 5 trials via "CLOCK BMAL1 PER2"
    - h-var-5aec85b987 (TREM2/tau): 5 trials via "TREM2 tau Alzheimer"
    - h-var-93e3ef09b3 (MAPT/tau): 5 trials via "MAPT tau Alzheimer"
    - h-ea3274ff (TREM2): 5 trials via "TREM2 Alzheimer"
    - h-var-bc4357c8c5 (MAPT): 5 trials via "MAPT Alzheimer"

    • Hypotheses with no-trial rationale:
    - h-6a6f132a50e9 (MAP6): no trials for "MAP6 Alzheimer"
    - h-metrep-e5842c76ad1d (NG2/AMPK/PRKAA1/TSC2/MTOR): no trials for "NG2 AMPK PRKAA1"
    - h-5a50ce127718 (MAP6): no trials for "MAP6 Alzheimer"
    - h-metrep-e58337c5a061 (CDKN2A/BCL2/BCL2L1/FOXO3A): no trials for "CDKN2A BCL2 BCL2L1"
    - h-14f3d499ffab (HCRTR1): no trials for "HCRTR1 Alzheimer"
    - h-48775971 (GPR43/GPR109A): no trials for "GPR43 GPR109A Alzheimer"
    - h-f811f090ac (TLR4/NFKB1/NLRP3): no trials for "TLR4 NFKB1 NLRP3"
    - h-47ab2be5 (SOD1/TARDBP/BDNF/GDNF): no trials for "SOD1 TARDBP BDNF"
    - h-5afacdfe (LDHA/TREM2): no trials for "LDHA TREM2 Alzheimer"

    • Search approach: gene names from target_gene field + disease context ("Alzheimer" or "Parkinson") queried against ClinicalTrials.gov via search_trials() tool; results include NCT ID, title, status, phase, conditions, interventions, sponsor, url, search provenance and timestamp.
    • Verification:
    - 20 hypotheses processed from the 991 missing pool
    - 10 now have actual trial data (NCT IDs, titles, status, phase, url)
    - 9 have explicit no-trial rationale (provenance, query, timestamp, note)
    - 1 skipped (no extractable gene terms, h-21cd4ba1)
    - Remaining missing count: 962 (threshold: <= 1137)

    Work Log — 2026-04-28 — Batch backfill (task:310a2648-bd6c-4c2b-9094-3ac429fc865b)

    • Before count: 950 non-test debated/proposed hypotheses missing clinical_trials context
    • After count: 930 missing (net 20 processed in two batches)
    • Threshold: <= 1151 — PASS
    • Results for 20 processed hypotheses (across 2 runs):
    - 11 updated with actual clinical trial data (NCT IDs, titles, status, phase, url)
    - 9 documented with explicit no-trial rationale (provenance, query, timestamp, note)

    • Hypotheses updated with trial data:
    - h-86101c8cd6ec (MAPT): NCT03718494 (KEEPS - Alzheimer)
    - h-b7ab85b6 (SNCA): NCT05768425 (Diagnostic Biomarkers)
    - h-var-6a0893ffb6 (MAPT/tau): NCT03718494
    - h-3bfa414a (GFAP/S100B): NCT06989242 (Yoga-based Movement Therapy)
    - h-var-c46786d2ab (TREM2/tau): 5 trials via "TREM2 tau Alzheimer"
    - h-6f1e8d32 (TNF/IL6): 5 trials via "TNF IL6 Alzheimer"
    - h-var-1906e102cf (MAPT/tau): 5 trials
    - h_seaad_005 (PDGFRB): NCT07361887
    - h-08a79bc5 (CDKN2A): NCT04685590 (Senolytic SToMP-AD)
    - h-4c3210de (PPARGC1A/NRF1/TFAM): 1 trial
    - h-495e04396a (SNCA/GBA/LRRK2): NCT04553185 (Parkinson)

    • Hypotheses with no-trial rationale:
    - h-b8724fde927e (ANK2): no trials for "ANK2 Alzheimer"
    - h-019c56c1 (SYN1/SLC1A2/CX3CR1): no trials for "SYN1 SLC1A2 CX3CR1"
    - h-40ad6ac6 (GP2/SPIB): no trials for "GP2 SPIB Alzheimer"
    - h-646ae8f1 (HIF1A/NFKB1): no trials for "HIF1A NFKB1 Alzheimer"
    - h-10b5bf6f (SOD1/HTT/TARDBP): no trials for "SOD1 HTT TARDBP"
    - hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-3 (CHRNA7/CHRM1): no trials for "CHRNA7 CHRM1 amyloid"
    - h-d5dc9661b1 (HDAC3/IL10/TREM2): no trials for "HDAC3 IL10 TREM2"
    - h-f86127b5 (HDAC6): no trials for "HDAC6 tau Alzheimer"
    - h-828b3729 (CLOCK/ARNTL): no trials for "CLOCK ARNTL Alzheimer"
    - h-b5c803f2 (HDAC2): no trials for "HDAC2 Alzheimer"
    - Plus 9 more from second batch

    • Search approach: gene names from target_gene field + disease context queried against ClinicalTrials.gov via search_trials() tool; results include NCT ID, title, status, phase, conditions, interventions, sponsor, url, search provenance and timestamp.
    • Verification:
    - 20 hypotheses processed from the 950 missing pool
    - 11 now have actual trial data (NCT IDs, titles, status, phase, url)
    - 9 have explicit no-trial rationale (provenance, query, timestamp, note)
    - Remaining missing count: 930 (threshold: <= 1151) — PASS
