Effort: thorough
Today the disease side of the SciDEX KG is anchored on neurodegeneration —
hypotheses.disease, wiki_entities, and the disease-landing dashboard
all assume an ND vocabulary. Build a **multi-vertical disease ontology
catalog** that imports MONDO, DOID, and EFO IDs for the top entities in
five new verticals (oncology, cardiovascular, infectious, metabolic,
immunology), maps them to canonical entity rows, and exposes a single
canonical_disease(slug) resolver that every cross-cutting feature can reuse.
Without this, every wave-4 vertical task reinvents disease ID
plumbing.
Five wave-4 specs (per-vertical landing pages, persona injection,
gap importers, cross-disease analogy engine, priority scoring) all need
the same answer to "is colorectal-cancer the same node as
MONDO:0005575?" Today there is no canonical resolver — disease
columns hold free text ("AD", "Alzheimer's disease", "alzheimers-disease",
"Alzheimer disease (G30)"). A central catalog with MONDO IDs collapses
the ambiguity once and lets every other vertical task ride on top.
disease_ontology_catalog(mondo_id PRIMARY KEY, vertical and on lower(label).
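A minimal sketch of what the catalog table could look like. Only mondo_id, vertical, label, and synonyms_json are named in this spec; the column types, the index names, the reading of the truncated line above as "indexes on vertical and on lower(label)", and the use of SQLite (chosen here only to keep the sketch runnable) are assumptions.

```python
import json
import sqlite3

# Hypothetical DDL: column types and index names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS disease_ontology_catalog (
    mondo_id      TEXT PRIMARY KEY,
    label         TEXT NOT NULL,
    vertical      TEXT NOT NULL,
    synonyms_json TEXT NOT NULL DEFAULT '[]'
);
CREATE INDEX IF NOT EXISTS idx_catalog_vertical
    ON disease_ontology_catalog (vertical);
CREATE INDEX IF NOT EXISTS idx_catalog_label_lower
    ON disease_ontology_catalog (lower(label));
"""


def create_catalog(conn: sqlite3.Connection) -> None:
    """Create the catalog table and its two lookup indexes."""
    conn.executescript(SCHEMA)


def upsert_disease(conn, mondo_id, label, vertical, synonyms):
    """Insert a catalog row, replacing any existing row with the same mondo_id."""
    conn.execute(
        "INSERT OR REPLACE INTO disease_ontology_catalog "
        "(mondo_id, label, vertical, synonyms_json) VALUES (?, ?, ?, ?)",
        (mondo_id, label, vertical, json.dumps(synonyms)),
    )


def find_by_label(conn, label):
    """Case-insensitive exact label lookup; can use the lower(label) index."""
    row = conn.execute(
        "SELECT mondo_id FROM disease_ontology_catalog "
        "WHERE lower(label) = lower(?)",
        (label,),
    ).fetchone()
    return row[0] if row else None
```

The expression index on lower(label) lets the exact-label path stay an index lookup, so fuzzy matching only has to run when the exact path misses.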
scidex/atlas/disease_ontology.py (≤500 LoC) with:
- import_from_mondo(): pulls MONDO OWL via OLS REST (https://www.ebi.ac.uk/ols4/api/ontologies/mondo) and walks the five vertical roots (MONDO:0004992 cancer, MONDO:0005267, MONDO:0005550 infectious, MONDO:0005066, MONDO:0005046 immune system).
- canonical_disease(query: str) -> CanonicalDisease | None: fuzzy match (rapidfuzz against synonyms_json) returning the catalog row.
- vertical_for(mondo_id) and subtree(mondo_id) helpers.
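The resolver contract can be sketched as follows. This is not the planned implementation: it swaps in the standard library's difflib.SequenceMatcher for rapidfuzz so the example runs without extra dependencies, and scales its ratio to 0-100 to mimic a score cutoff of 85. The CanonicalDisease fields, the normalization step, the vertical names, and the two sample rows are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class CanonicalDisease:
    # Hypothetical shape; the spec only says the catalog row is returned.
    mondo_id: str
    label: str
    vertical: str
    score: float


# Toy catalog rows. Synonyms reuse the free-text variants quoted in this spec.
CATALOG = [
    {"mondo_id": "MONDO:0005575", "label": "colorectal cancer",
     "vertical": "oncology", "synonyms_json": json.dumps([])},
    {"mondo_id": "MONDO:0004975", "label": "Alzheimer disease",
     "vertical": "neurodegeneration",
     "synonyms_json": json.dumps(["AD", "Alzheimer's disease"])},
]


def _normalize(text: str) -> str:
    """Lowercase, turn slug hyphens into spaces, collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def canonical_disease(query: str, catalog=CATALOG,
                      cutoff: float = 85.0) -> Optional[CanonicalDisease]:
    """Return the best-scoring catalog row at or above cutoff, else None."""
    q = _normalize(query)
    if not q:
        return None
    best = None
    for row in catalog:
        names = [row["label"], *json.loads(row["synonyms_json"])]
        for name in names:
            # Stand-in scorer: similarity ratio scaled to the 0-100 range.
            score = 100.0 * SequenceMatcher(None, q, _normalize(name)).ratio()
            if score >= cutoff and (best is None or score > best.score):
                best = CanonicalDisease(row["mondo_id"], row["label"],
                                        row["vertical"], score)
    return best
```

Returning None for anything under the cutoff, rather than raising, matches the test expectation that an unknown string yields None and not an exception.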
scripts/backfill_canonical_disease.py walks hypotheses.disease, wiki_entities WHERE entity_type='disease', and wiki_pages WHERE category='disease', calls canonical_disease(), and writes a new entity_disease_canonical(entity_id, mondo_id, confidence,
resolved_at) join table. Reports unresolved-rate per vertical.
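A rough sketch of the backfill loop for the hypotheses source only. The hypotheses.id column, the composite primary key, the ISO-8601 timestamp format, and the resolver signature (a callable returning a (mondo_id, confidence) pair or None) are assumptions. The sketch reports a single overall unresolved rate, because this spec does not say how an unresolved free-text row is attributed to a vertical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

# Hypothetical resolver signature: free text -> (mondo_id, confidence) or None.
Resolver = Callable[[str], Optional[Tuple[str, float]]]


def backfill_hypotheses(conn: sqlite3.Connection, resolve: Resolver) -> dict:
    """Resolve hypotheses.disease values and record hits in the join table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity_disease_canonical ("
        " entity_id TEXT NOT NULL,"
        " mondo_id TEXT NOT NULL,"
        " confidence REAL NOT NULL,"
        " resolved_at TEXT NOT NULL,"
        " PRIMARY KEY (entity_id, mondo_id))"
    )
    rows = conn.execute(
        "SELECT id, disease FROM hypotheses WHERE disease IS NOT NULL"
    ).fetchall()
    now = datetime.now(timezone.utc).isoformat()
    resolved = 0
    for entity_id, text in rows:
        hit = resolve(text)
        if hit is None:
            continue  # left unresolved; counted in the rate below
        mondo_id, confidence = hit
        conn.execute(
            "INSERT OR REPLACE INTO entity_disease_canonical VALUES (?, ?, ?, ?)",
            (str(entity_id), mondo_id, confidence, now),
        )
        resolved += 1
    conn.commit()
    total = len(rows)
    rate = (total - resolved) / total if total else 0.0
    return {"total": total, "resolved": resolved, "unresolved_rate": rate}
```

Because rows are written with INSERT OR REPLACE, rerunning the backfill after a catalog refresh updates confidence and resolved_at instead of duplicating links.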
scidex-disease-ontology-refresh.timer weekly on Mondays 04:00
/api/disease/{mondo_id} JSON endpoint returns the catalog row.
/atlas/diseases page lists the 250+ catalog rows grouped by n_papers and n_hypotheses.
/disease-landing/<slug> route (api.py per q-synth-disease-landing) is updated to look the slug up via the new resolver.
tests/test_disease_ontology.py: round-trip MONDO:0004975; "colorectal cancer" → MONDO:0005575; MONDO:0005148; unknown string returns None, not an exception.
/ontologies/mondo/terms?obo_id=MONDO:...&size=200
data/disease_ontology/mondo/v<version>/.
oboSynonym annotation; ICD/OMIM/MeSH databaseCrossReferences.
rapidfuzz.process.extractOne with score cutoff 85.
/api/disease/{mondo_id} payload is fast.
scidex/atlas/canonical_entity_links.py so wave-4 personas and
q-synth-disease-landing: landing route looks up via the new resolver.
wiki_entities, canonical_entity_links infrastructure.
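For the OLS term lookup and the versioned cache directory mentioned earlier, a small sketch of how the request URL and cache path might be built. The helper names and the version string format are hypothetical; the base URL, the obo_id and size query parameters, and the cache path layout come from this spec. No network request is made here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode

OLS_MONDO_BASE = "https://www.ebi.ac.uk/ols4/api/ontologies/mondo"


def ols_terms_url(obo_id: str, size: int = 200) -> str:
    """Build the OLS terms URL for one MONDO id (hypothetical helper)."""
    if not obo_id.startswith("MONDO:"):
        raise ValueError(f"expected a MONDO id, got {obo_id!r}")
    # Keep ':' unescaped so the URL matches the form quoted in the spec.
    query = urlencode({"obo_id": obo_id, "size": size}, safe=":")
    return f"{OLS_MONDO_BASE}/terms?{query}"


def cache_dir(version: str) -> PurePosixPath:
    """Versioned cache directory for downloaded MONDO payloads."""
    return PurePosixPath("data/disease_ontology/mondo") / f"v{version}"
```

Keying the cache directory on the ontology version keeps a weekly refresh from overwriting the payloads an earlier import was built from.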