[Senate] Disease-priority score - burden x tractability x SciDEX coverage done

← Senate
GBD DALYs + tractability + coverage compose into an attention-dividend that steers fleet attention to high-burden underserved diseases.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (3)

[Senate] Disease-priority score: burden × tractability × SciDEX coverage [task:558f7c40-0b86-4d8d-b22d-ffea5780d145] (#782)2026-04-27
Squash merge: orchestra/task/dff08e77-holistic-task-prioritization-and-self-go (2 commits) (#774)2026-04-27
[Senate] Prioritization run 71: 8 priority adjustments — safety/infra up, UI social down [task:dff08e77-5e49-4da5-a9ef-fbfe7de3dc17]2026-04-27
Spec File

Effort: thorough

Goal

Build a transparent disease-priority score that ranks every disease
in disease_ontology_catalog by where SciDEX should invest agent
attention next. The score combines (a) global disease burden (DALYs from
Global Burden of Disease 2021 study), (b) tractability (number of
clinically-validated targets, stage of pipeline), and (c) current
SciDEX coverage (gaps + hypotheses + open-questions per disease) to
emit a per-disease "attention dividend" — high burden + tractable +
underserved by SciDEX = top of the list.

Why this matters

Without an explicit prioritization, fleet attention skews toward
whatever has the most existing hypotheses (path dependence) or the
most papers (publication-volume bias). A burden-weighted score lets
the Senate steer compute toward diseases the world actually needs work
on (e.g. tuberculosis is high burden in low-income regions, low SciDEX
attention) and prevents the platform from drifting into a
neuroscience-only echo chamber.

Acceptance Criteria

☐ Migration disease_priority_score(mondo_id PRIMARY KEY,
dalys_2021, n_validated_targets, n_pipeline_drugs, scidex_coverage_score,
attention_dividend, score_version, computed_at)
.
☐ New module scidex/senate/disease_priority.py (≤500 LoC):
- load_gbd_burden() — pulls DALYs from IHME GBD 2021 results
bundle (https://vizhub.healthdata.org/gbd-results/); cached
under data/gbd/2021/.
- compute_tractability(mondo_id) — counts clinical-trial
interventions from ClinicalTrials.gov (existing
chembl-drug-targets + search-trials helpers) and weighted
by max phase reached.
- compute_coverage(mondo_id) — counts SciDEX hypotheses, gaps,
and open-questions; coverage_score = log1p(n_hyps) +
log1p(n_gaps) + 2*log1p(n_open_questions)
.
- attention_dividend = (dalys_z + tractability_z) -
coverage_z
— diseases with high need + tractable but
under-covered float to the top.
☐ Recurring timer scidex-disease-priority.timer weekly Sunday
06:00 UTC recomputes scores; bumps score_version.
/senate/disease-priorities page renders the ranked table with
sortable columns + a stacked-bar per row (DALY / tractability /
coverage / dividend) using the colorblind-safe matplotlib palette.
☐ Quest engine consumer: scidex/agora/gap_pipeline.py consults
the priority score when picking which gap to enrich next — a
gap whose disease is in the top decile of attention_dividend
gets +0.3 to its enrichment-priority weight (transparently
logged).
☐ Tests: sanity assertions that high-burden + low-coverage
diseases (e.g. tuberculosis) score above well-covered Western
diseases when their SciDEX coverage is comparable.
☐ Reproducibility: score_version plus the GBD bundle SHA
committed via commit_artifact so the historical score can
be reproduced.

Approach

  • GBD 2021 DALY data is published as CSV/Excel — pin a specific
  • download date and treat as immutable input.
  • Tractability calculation combines drug-stage data already cached
  • from chembl-drug-targets and search-trials.
  • Coverage queries are simple GROUP BY on hypotheses, gaps,
  • open_questions joined to entity_disease_canonical from
    q-vert-disease-ontology-catalog.
  • Z-scores computed across the catalog (so adding new diseases
  • doesn't destabilize ranking — recompute on score_version bumps).
  • Wire into gap-pipeline scoring with a tunable knob in
  • configs/disease_priority.yaml.

    Dependencies

    • q-vert-disease-ontology-catalog — catalog + canonical resolver.
    • Existing chembl-drug-targets, search-trials skills.
    • GBD 2021 public data release.

    Work Log

    2026-04-27 — Implementation complete

    Files created/modified:

    • migrations/add_disease_priority_score.py — creates disease_priority_score table with mondo_id PK, dalys_2021, n_validated_targets, n_pipeline_drugs, scidex_coverage_score, attention_dividend, dalys_z, tractability_z, coverage_z, score_version, computed_at; indexes on attention_dividend DESC and (score_version, computed_at)
    • scidex/senate/disease_priority.py (360 LoC) — full module: load_gbd_burden(), compute_tractability(), compute_coverage(), compute_all_scores(), get_top_decile_mondo_ids(), get_scores_ranked(), run()
    • configs/disease_priority.yaml — tunable weights for burden/tractability/coverage, live_tractability flag, gap_top_decile_bonus
    • api.py/senate/disease-priorities route with colorblind-safe stacked-bar table
    • scidex/agora/gap_pipeline.pycompute_enrichment_priority() applies +0.3 bonus to gaps in top-decile attention-dividend diseases; get_disease_priority_top_decile() fetches MONDO IDs; review_gaps() wired to use bonus
    • deploy/scidex-disease-priority.service + .timer — weekly Sunday 06:00 UTC
    • tests/test_disease_priority.py — 12 tests, all passing
    Decisions:
    • Built-in GBD 2021 DALY data (27 key diseases) as fallback; loads from data/gbd/2021/gbd_dalys.csv when present
    • live_tractability: false default to avoid bulk HTTP requests; opt-in per config
    • Tractability proxy: n_validated_targets * 2 + n_pipeline_drugs; z-scored across catalog
    • Coverage uses knowledge_gaps.mondo_id + domain ILIKE pattern + hypotheses.disease ILIKE pattern
    • Gracefully handles absent open_questions and disease_ontology_catalog tables
    • Verified against live DB: 250 diseases scored in ~10s; "cancer (all types)" tops dividend at +22.15 with 250M DALYs

    Sibling Tasks in Quest (Senate) ↗