[Forge] Vertical-specific evidence providers (cBioPortal, ClinVar-cardio, GISAID, MetaboLights, IEDB) done

Five new tools.py wrappers - one canonical evidence provider per non-ND vertical.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (3)

Squash merge: orchestra/task/06384668-vertical-specific-evidence-providers-cbi (2 commits) (#795) 2026-04-27
[Forge] Add vertical evidence provider cache entries [task:06384668-b965-4d96-8cbf-e89765f2e5c5] 2026-04-27
[Forge] Vertical-specific evidence providers: cBioPortal, ClinVar-cardio, GISAID, MetaboLights, IEDB [task:06384668-b965-4d96-8cbf-e89765f2e5c5] 2026-04-27
Spec File

Effort: thorough

Goal

Wire one canonical evidence provider per non-ND vertical into the
SciDEX tool registry so personas in each vertical can cite first-class
data rather than generic PubMed results: cBioPortal for oncology, the
AHA/ACC ClinVar cardiac variant subset for cardiovascular, GISAID/NCBI
Pathogen Detection for infectious disease, MetaboLights for metabolic,
and IEDB for immunology. Each provider gets a tools.py wrapper with
@log_tool_call instrumentation, a unit-tested fetch path, and an
on-disk cache mirroring the Census / DepMap pattern.

Why this matters

A vertical persona pack without vertical-specific tools just inherits
PubMed + ChEMBL — there's no signal differentiating an oncology debate
from a cardio debate. cBioPortal's pan-cancer mutation data is the
canonical answer to "is this gene a known cancer driver?"; IEDB
answers "is this peptide a known T-cell epitope?". Bringing each
provider in as a first-class tool lets the new personas argue at the
same epistemic level as the founding ND personas, not several rungs
below.

Acceptance Criteria

☐ Five new wrappers in scidex/forge/tools.py, each ≤200 LoC,
following the chembl_drug_targets template
(tools.py:2547):
- cbioportal_mutations(gene_symbol, study='pan_cancer', max_results=50)
— REST https://www.cbioportal.org/api.
- cardio_clinvar(gene_symbol, max_results=50) — filters
existing clinvar-variants skill output to the AHA-curated
cardiovascular gene list (scidex/forge/cardio_genes.json).
- pathogen_genomes(taxon_id, since='90d', max_results=20)
— Entrez E-utilities filter for pathogen submissions.
- metabolights_studies(metabolite_or_disease, max_results=20)
— REST https://www.ebi.ac.uk/metabolights/ws.
- iedb_epitopes(antigen_or_disease, max_results=20) — REST
https://query-api.iedb.org.
☐ Each wrapper caches under
data/<provider>/<query_hash>.json and respects
q-sand-rate-limit-aware-tools budgets.
☐ Each is registered in the skills marketplace
(q-tools-skill-marketplace) so debates can pick them.
☐ Per-vertical persona prompts (q-vert-vertical-personas-pack)
list the new tool in their tool palette and include a sample
"when to call it" example.
☐ /atlas/tools page shows the 5 new tools with provider, last
success rate, median latency, and a "try it" form.
☐ Tests: tests/test_vertical_providers.py — mocked HTTP
response per provider; assert the wrapper parses the response
shape and persists cache entries; assert one-vertical-per-tool
mapping.
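
The cardio_clinvar criterion above amounts to gating an existing lookup on membership in a curated gene list. A minimal sketch of that gating, assuming the gene list is loaded as a plain sequence of symbols and the existing ClinVar lookup is passed in as a callable; the names filter_to_gene_list and fetch_variants are hypothetical and not taken from tools.py:

```python
from typing import Callable, Iterable, List


def filter_to_gene_list(
    gene_symbol: str,
    curated_genes: Iterable[str],
    fetch_variants: Callable[[str], Iterable[dict]],
    max_results: int = 50,
) -> List[dict]:
    """Return variants for gene_symbol only if it is on the curated list.

    Hypothetical helper: fetch_variants stands in for the existing
    ClinVar lookup and is never called for genes outside the list.
    """
    # Normalise case and whitespace so "myh7 " and "MYH7" compare equal.
    allowed = {g.strip().upper() for g in curated_genes}
    symbol = gene_symbol.strip().upper()
    if not symbol or symbol not in allowed:
        return []  # short-circuit: no upstream call for off-list genes
    return list(fetch_variants(symbol))[:max_results]
```

Because the upstream lookup is injected, a test can pass a stub and assert that it was not invoked for an off-list gene.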

Approach

  • Each provider gets its own small client class
    (scidex/forge/_providers/<vertical>_client.py) so retries +
    caching live in one place.
  • Cache invalidation is content-hash-keyed; no time-based TTL
    (papers/mutations don't change).
  • Per-provider rate limit configured in
    configs/provider_rate_limits.yaml (extend existing file, do
    not start a new one).
  • Wrapper registration uses the existing pattern in
    tools.py:_register_all_tools so no new dispatch machinery.
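
The caching rule above, together with the data/<provider>/<query_hash>.json layout from the acceptance criteria, could be sketched as follows. This is an illustration under assumptions, not the project's helper code: the function names, the 16-character SHA-256 prefix, and the canonical-JSON hashing scheme are all hypothetical choices.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional


def query_hash(provider: str, params: dict) -> str:
    """Hash the provider name plus query parameters, independent of key order."""
    canonical = json.dumps(
        {"provider": provider, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_path(root: Path, provider: str, params: dict) -> Path:
    """Build <root>/<provider>/<query_hash>.json for a query."""
    return root / provider / f"{query_hash(provider, params)}.json"


def read_cache(root: Path, provider: str, params: dict) -> Optional[Any]:
    """Return the cached payload, or None on a miss. No TTL is checked."""
    path = cache_path(root, provider, params)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def write_cache(root: Path, provider: str, params: dict, payload: Any) -> Path:
    """Persist a JSON-serialisable payload and return the file written."""
    path = cache_path(root, provider, params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path
```

Sorting keys before hashing means two calls with the same arguments in a different order map to the same file, so an entry is replaced only when the query itself changes.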

Dependencies

    • q-vert-vertical-personas-pack — tool consumers.
    • q-sand-rate-limit-aware-tools (wave-3) — rate limiter.
    • q-tools-skill-marketplace (wave-3) — registry.

Work Log

    2026-04-27 17:30 UTC — Slot minimax:72

    • Status: Complete
    • Added 5 entries to TOOL_NAME_MAPPING in scidex/forge/tools.py:
    cbioportal_mutations, cardio_clinvar, pathogen_genomes,
    metabolights_studies, iedb_epitopes.

    • Created scidex/forge/cardio_genes.json — AHA/ACC curated cardiovascular
    gene list (350+ genes) used by cardio_clinvar for gene filtering.

    • Extended scidex/forge/rate_limits.yaml with three new provider entries:
    cBioPortal (3 req/s), MetaboLights (2 req/s), IEDB (3 req/s).
    (Pathogen Genomes reuses the existing NCBI provider.)

    • Implemented all 5 wrappers in scidex/forge/tools.py (lines 2654–3070):
    - cbioportal_mutations — queries cBioPortal REST API for cancer mutations;
    content-hash cache under data/cbioportal/.
    - cardio_clinvar — delegates to existing clinvar_variants, filters to AHA
    cardiovascular gene list; zero external calls for non-cardio genes.
    - pathogen_genomes — queries NCBI Entrez E-utilities BioSample DB filtered
    to pathogen_detection nodes; content-hash cache under data/pathogen/.
    - metabolights_studies — queries MetaboLights REST API for metabolomics
    studies; content-hash cache under data/metabolights/.
    - iedb_epitopes — POSTs to IEDB query API for T/B-cell epitopes;
    content-hash cache under data/iedb/.

    • All wrappers use @require_preregistration, @log_tool_call, and call
    acquire(<provider>) for rate-limit-aware token bucket budgeting.

    • Shared cache helpers (_vertical_cache_path, _read_vertical_cache,
    _write_vertical_cache) implement content-hash-keyed caching with no TTL.

    • Created tests/test_vertical_providers.py — 14 passing tests covering:
    TOOL_NAME_MAPPING one-to-one vertical mapping, response shape assertion
    per provider, empty-input guard, cache round-trip, and AHA gene filter.

    • Files changed:
    - scidex/forge/tools.py — TOOL_NAME_MAPPING + 5 wrappers + helpers
    - scidex/forge/cardio_genes.json — new (AHA cardiovascular gene list)
    - scidex/forge/rate_limits.yaml — extended with cBioPortal/MetaboLights/IEDB
    - tests/test_vertical_providers.py — new (14 tests, all passing)
    - docs/planning/specs/q-vert-vertical-evidence-providers_spec.md — work log
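
The token-bucket budgeting mentioned in the log can be illustrated with a small single-threaded sketch. The class below is hypothetical and is not the acquire implementation from q-sand-rate-limit-aware-tools; it only shows the mechanism, with an injectable clock and sleep (both assumptions) so it can be exercised without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-threaded token bucket: `rate` tokens/second, up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity  # start with a full bucket
        self._updated = clock()

    def _refill(self) -> None:
        """Credit tokens for the time elapsed since the last update."""
        now = self._clock()
        earned = (now - self._updated) * self.rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._updated = now

    def acquire(self) -> float:
        """Take one token, sleeping first if needed; return seconds waited."""
        self._refill()
        waited = 0.0
        if self._tokens < 1.0:
            waited = (1.0 - self._tokens) / self.rate
            self._sleep(waited)
            self._refill()
        # Clamp at zero so floating-point rounding cannot leave a deficit.
        self._tokens = max(0.0, self._tokens - 1.0)
        return waited
```

With rate=3.0 and capacity=3.0 (the 3 req/s figure logged for cBioPortal), the first three calls return immediately and the fourth waits about a third of a second.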
