SciDEX — Task: [Senate] Disease-priority score

GBD DALYs, tractability, and SciDEX coverage combine into an attention dividend that steers fleet attention toward high-burden, underserved diseases.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

[Senate] Disease-priority score: burden × tractability × SciDEX coverage [task:558f7c40-0b86-4d8d-b22d-ffea5780d145] (#782), 2026-04-27

Squash merge: orchestra/task/dff08e77-holistic-task-prioritization-and-self-go (2 commits) (#774), 2026-04-27

Spec File

Effort: thorough

Goal

Build a transparent disease-priority score that ranks every disease
in disease_ontology_catalog by where SciDEX should invest agent
attention next. The score combines (a) global disease burden (DALYs from
the Global Burden of Disease (GBD) 2021 study), (b) tractability (number
of clinically validated targets and the pipeline stage reached), and
(c) current SciDEX coverage (gaps, hypotheses, and open questions per
disease) into a per-disease "attention dividend": diseases that are
high-burden, tractable, and underserved by SciDEX go to the top of the list.
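As a rough illustration of how the three signals could combine, the sketch below z-scores each axis across the catalog and applies attention_dividend = (dalys_z + tractability_z) - coverage_z. The row keys (dalys, tractability, coverage), the equal weighting, the use of the population standard deviation, and the example numbers are all assumptions for illustration, not values from the SciDEX module or from GBD.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standard score of each value against the whole catalog (population std dev)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every disease has the same value on this axis, so it carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def attention_dividends(rows):
    """Rank rows (dicts with hypothetical keys dalys/tractability/coverage) by dividend."""
    dalys_z = zscores([r["dalys"] for r in rows])
    tractability_z = zscores([r["tractability"] for r in rows])
    coverage_z = zscores([r["coverage"] for r in rows])
    scored = [
        {**r, "attention_dividend": (d + t) - c}
        for r, d, t, c in zip(rows, dalys_z, tractability_z, coverage_z)
    ]
    # Highest dividend first: high burden + tractable + under-covered.
    return sorted(scored, key=lambda r: r["attention_dividend"], reverse=True)


# Toy catalog; the numbers are illustrative, not GBD or SciDEX data.
example = [
    {"mondo_id": "A", "dalys": 100.0, "tractability": 10.0, "coverage": 1.0},
    {"mondo_id": "B", "dalys": 10.0, "tractability": 10.0, "coverage": 5.0},
    {"mondo_id": "C", "dalys": 50.0, "tractability": 2.0, "coverage": 3.0},
]
print([r["mondo_id"] for r in attention_dividends(example)])  # ['A', 'C', 'B']
```

With these toy rows, disease A (highest burden, lowest coverage) ranks first. Because each z-score column sums to zero over the catalog, the dividends also sum to zero, so the score only ranks diseases relative to one another.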

Why this matters

Without an explicit prioritization, fleet attention skews toward
whatever has the most existing hypotheses (path dependence) or the
most papers (publication-volume bias). A burden-weighted score lets
the Senate steer compute toward diseases the world actually needs work
on (e.g. tuberculosis, which carries a high burden in low-income regions
but receives little SciDEX attention) and keeps the platform from drifting
into a neuroscience-only echo chamber.

Acceptance Criteria

☐ Migration disease_priority_score(mondo_id PRIMARY KEY,
dalys_2021, n_validated_targets, n_pipeline_drugs, scidex_coverage_score,
attention_dividend, score_version, computed_at)

☐ New module scidex/senate/disease_priority.py (≤500 LoC):

- load_gbd_burden(): pulls DALYs from the IHME GBD 2021 results
bundle (https://vizhub.healthdata.org/gbd-results/), cached
under data/gbd/2021/.
- compute_tractability(mondo_id): counts clinical-trial
interventions from ClinicalTrials.gov (via the existing
chembl-drug-targets and search-trials helpers), weighted
by the maximum phase reached.
- compute_coverage(mondo_id): counts SciDEX hypotheses, gaps,
and open questions; coverage_score =
log1p(n_hyps) + log1p(n_gaps) + 2*log1p(n_open_questions).
- attention_dividend = (dalys_z + tractability_z) - coverage_z,
so diseases with high need and high tractability but low coverage
float to the top.

☐ Recurring timer scidex-disease-priority.timer runs weekly on Sunday at
06:00 UTC, recomputes the scores, and bumps score_version.

☐ /senate/disease-priorities page renders the ranked table with
sortable columns and a stacked bar per row (DALY / tractability /
coverage / dividend) using the colorblind-safe matplotlib palette.

☐ Quest engine consumer: scidex/agora/gap_pipeline.py consults
the priority score when picking which gap to enrich next. A gap whose
disease is in the top decile of attention_dividend gets +0.3 added to its
enrichment-priority weight, and the adjustment is logged transparently.

☐ Tests: sanity assertions that high-burden, low-coverage
diseases (e.g. tuberculosis) score above well-covered Western
diseases when their SciDEX coverage is comparable.

☐ Reproducibility: score_version and the GBD bundle SHA are
committed via commit_artifact so that any historical score can
be reproduced.
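A minimal sketch of the top-decile bonus from the quest-engine criterion, assuming the dividends arrive as a mapping from MONDO ID to attention_dividend. The helper names, the logger name, and rounding the decile size up (with a floor of one disease) are hypothetical choices, not the code in scidex/agora/gap_pipeline.py; the 0.3 default mirrors the +0.3 in the spec.

```python
import logging
import math

log = logging.getLogger("gap_pipeline")  # hypothetical logger name

TOP_DECILE_BONUS = 0.3  # the +0.3 from the spec; assumed tunable via config


def top_decile_ids(dividends):
    """Return the MONDO IDs in the top 10% by attention_dividend (at least one)."""
    ranked = sorted(dividends, key=dividends.get, reverse=True)
    k = max(1, math.ceil(len(ranked) / 10))
    return set(ranked[:k])


def enrichment_priority(gap_id, base_weight, mondo_id, top_decile, bonus=TOP_DECILE_BONUS):
    """Add the bonus when the gap's disease is in the top decile, and log why."""
    if mondo_id in top_decile:
        log.info(
            "gap %s: +%.2f disease-priority bonus (%s in top decile of attention_dividend)",
            gap_id, bonus, mondo_id,
        )
        return base_weight + bonus
    return base_weight
```

Computing the top-decile set once per pipeline pass and passing it in keeps each per-gap call a cheap set lookup.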

Approach

GBD 2021 DALY data is published as CSV/Excel; pin a specific download date and treat the file as immutable input.
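One way to treat the pinned download as immutable is to hash it and refuse to proceed if the hash changes. The sketch below streams the file through SHA-256 in 1 MiB chunks; the function names and the choice of SHA-256 (the spec only says "SHA") are assumptions. The resulting hex digest is the kind of value that could be recorded alongside score_version.

```python
import hashlib
import tempfile
from pathlib import Path


def bundle_sha256(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large exports need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_bundle(path, expected_sha):
    """Fail loudly if the cached bundle no longer matches the pinned hash."""
    actual = bundle_sha256(path)
    if actual != expected_sha:
        raise ValueError(f"GBD bundle {path} changed: expected {expected_sha}, got {actual}")
    return actual


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a file under data/gbd/2021/.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "gbd_dalys.csv"
        sample.write_bytes(b"mondo_id,dalys\n")
        pinned = bundle_sha256(sample)
        print(verify_pinned_bundle(sample, pinned) == pinned)  # True
```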

Tractability calculation combines drug-stage data already cached from chembl-drug-targets and search-trials.
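The spec describes tractability as a count weighted by the maximum phase reached, while the work log records a simpler proxy (n_validated_targets * 2 + n_pipeline_drugs). The sketch below shows both. The linear phase weighting (phase 4, which means approved in ChEMBL's max_phase convention, counts as 1.0) and the function names are hypothetical, not the module's actual weighting.

```python
MAX_PHASE = 4  # approved drugs in ChEMBL's max_phase convention


def phase_weight(max_phase):
    """Hypothetical linear weight in [0, 1]: phase 4 = 1.0, no trials = 0.0."""
    clamped = min(max(float(max_phase), 0.0), MAX_PHASE)
    return clamped / MAX_PHASE


def phase_weighted_tractability(max_phase_by_drug):
    """Sum of phase weights over drugs mapped to one disease (drug_id -> max phase)."""
    return sum(phase_weight(p) for p in max_phase_by_drug.values())


def tractability_proxy(n_validated_targets, n_pipeline_drugs):
    """Proxy from the work log: each validated target counts double."""
    return n_validated_targets * 2 + n_pipeline_drugs
```

Either value would then be z-scored across the catalog, so only its relative spread matters, not its absolute scale.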

Coverage queries are simple GROUP BY counts over hypotheses, gaps, and open_questions, joined to entity_disease_canonical from q-vert-disease-ontology-catalog.
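A minimal sketch of the coverage query against an in-memory SQLite database. The toy schema (a label column on entity_disease_canonical and a disease_label column on each content table) is hypothetical; the live tables and the ILIKE matching noted in the work log will differ. The LEFT JOIN keeps diseases with no rows so they score 0, and the scoring applies the compute_coverage formula from the acceptance criteria.

```python
import math
import sqlite3

# Only these table names are ever interpolated into the SQL below.
CONTENT_TABLES = ("hypotheses", "knowledge_gaps", "open_questions")

COUNT_SQL = """
SELECT c.mondo_id, COUNT(t.id)
FROM entity_disease_canonical AS c
LEFT JOIN {table} AS t ON t.disease_label = c.label
GROUP BY c.mondo_id
"""


def count_by_disease(conn, table):
    """Rows per MONDO ID; the LEFT JOIN keeps diseases with zero rows."""
    if table not in CONTENT_TABLES:
        raise ValueError(f"unexpected table: {table}")
    return dict(conn.execute(COUNT_SQL.format(table=table)).fetchall())


def coverage_scores(conn):
    hyps = count_by_disease(conn, "hypotheses")
    gaps = count_by_disease(conn, "knowledge_gaps")
    oqs = count_by_disease(conn, "open_questions")
    # coverage_score = log1p(n_hyps) + log1p(n_gaps) + 2*log1p(n_open_questions)
    return {
        m: math.log1p(hyps[m]) + math.log1p(gaps[m]) + 2 * math.log1p(oqs[m])
        for m in hyps
    }


def make_demo_db():
    """Toy schema and rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity_disease_canonical (label TEXT, mondo_id TEXT);
        CREATE TABLE hypotheses (id INTEGER PRIMARY KEY, disease_label TEXT);
        CREATE TABLE knowledge_gaps (id INTEGER PRIMARY KEY, disease_label TEXT);
        CREATE TABLE open_questions (id INTEGER PRIMARY KEY, disease_label TEXT);
        INSERT INTO entity_disease_canonical VALUES
            ('disease a', 'MONDO:A'), ('disease a synonym', 'MONDO:A'), ('disease b', 'MONDO:B');
        INSERT INTO hypotheses (disease_label) VALUES ('disease a synonym');
        INSERT INTO open_questions (disease_label) VALUES ('disease a');
        """
    )
    return conn


print(coverage_scores(make_demo_db()))  # MONDO:A = 3 * ln 2, MONDO:B = 0.0
```

Grouping on mondo_id rather than on the label means synonyms that resolve to the same MONDO ID are counted together.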

Z-scores are computed across the whole catalog and recomputed on each score_version bump, so that adding new diseases does not destabilize rankings within a version.

Wire into gap-pipeline scoring with a tunable knob in configs/disease_priority.yaml.

Dependencies

q-vert-disease-ontology-catalog: catalog and canonical resolver.
Existing chembl-drug-targets and search-trials skills.
GBD 2021 public data release.

Work Log

2026-04-27 — Implementation complete

Files created/modified:

migrations/add_disease_priority_score.py — creates disease_priority_score table with mondo_id PK, dalys_2021, n_validated_targets, n_pipeline_drugs, scidex_coverage_score, attention_dividend, dalys_z, tractability_z, coverage_z, score_version, computed_at; indexes on attention_dividend DESC and (score_version, computed_at)
scidex/senate/disease_priority.py (360 LoC) — full module: load_gbd_burden(), compute_tractability(), compute_coverage(), compute_all_scores(), get_top_decile_mondo_ids(), get_scores_ranked(), run()
configs/disease_priority.yaml — tunable weights for burden/tractability/coverage, live_tractability flag, gap_top_decile_bonus
api.py — /senate/disease-priorities route with colorblind-safe stacked-bar table
scidex/agora/gap_pipeline.py — compute_enrichment_priority() applies +0.3 bonus to gaps in top-decile attention-dividend diseases; get_disease_priority_top_decile() fetches MONDO IDs; review_gaps() wired to use bonus
deploy/scidex-disease-priority.service + .timer — weekly Sunday 06:00 UTC
tests/test_disease_priority.py — 12 tests, all passing
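For reference, the table described in the migration entry above could be expressed as follows. This sketch uses SQLite so that it runs self-contained; the column types, the NOT NULL constraints, and the index names are assumptions, and the production migration (whose ILIKE usage suggests PostgreSQL) may differ.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS disease_priority_score (
    mondo_id              TEXT PRIMARY KEY,
    dalys_2021            REAL,
    n_validated_targets   INTEGER,
    n_pipeline_drugs      INTEGER,
    scidex_coverage_score REAL,
    attention_dividend    REAL,
    dalys_z               REAL,
    tractability_z        REAL,
    coverage_z            REAL,
    score_version         INTEGER NOT NULL,
    computed_at           TEXT NOT NULL
);
-- Serves the ranked /senate/disease-priorities view.
CREATE INDEX IF NOT EXISTS idx_dps_dividend
    ON disease_priority_score (attention_dividend DESC);
-- Serves lookups of a given score_version.
CREATE INDEX IF NOT EXISTS idx_dps_version
    ON disease_priority_score (score_version, computed_at);
"""


def migrate(conn):
    """Idempotent: IF NOT EXISTS makes re-running the migration a no-op."""
    conn.executescript(DDL)
    conn.commit()
```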

Decisions:

Built-in GBD 2021 DALY data (27 key diseases) serves as a fallback; data/gbd/2021/gbd_dalys.csv is loaded instead when present
live_tractability defaults to false to avoid bulk HTTP requests; it can be enabled per config
Tractability proxy: n_validated_targets * 2 + n_pipeline_drugs, z-scored across the catalog
Coverage uses knowledge_gaps.mondo_id, a domain ILIKE pattern, and a hypotheses.disease ILIKE pattern
Gracefully handles absent open_questions and disease_ontology_catalog tables
Verified against live DB: 250 diseases scored in ~10s; "cancer (all types)" tops dividend at +22.15 with 250M DALYs
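The CSV-first, built-in-fallback loading decision can be sketched as follows. It assumes a pre-processed CSV with mondo_id and dalys columns; the function and column names are hypothetical, and the raw IHME export would likely need a cause-to-MONDO mapping step first. The fallback table is passed in as a parameter rather than hardcoded.

```python
import csv
from pathlib import Path

DEFAULT_CSV = Path("data/gbd/2021/gbd_dalys.csv")


def parse_dalys_csv(lines):
    """Parse rows with (hypothetical) mondo_id and dalys columns into a dict."""
    return {row["mondo_id"]: float(row["dalys"]) for row in csv.DictReader(lines)}


def load_burden_with_fallback(csv_path=DEFAULT_CSV, fallback=None):
    """Prefer the cached CSV when present; otherwise return a copy of the fallback table."""
    path = Path(csv_path)
    if path.exists():
        with path.open(newline="") as f:
            return parse_dalys_csv(f)
    return dict(fallback or {})
```

Returning a copy of the fallback keeps callers from accidentally mutating the built-in table between scoring runs.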