Effort: thorough
Build an immune-receptor repertoire pipeline for the immunology
vertical: ingest TCR/BCR sequencing FASTQ (or a precomputed AIRR
TSV), call clonotypes with MiXCR, compute repertoire diversity
(Shannon, Gini, Hill), match clonotypes against IEDB epitopes via the
new iedb_epitopes tool, and emit a clonotype-to-epitope linkage
artifact a debate can cite. This closes the immunology vertical's biggest
data gap: SciDEX has no way to argue from actual repertoire data today.

Immunology hypotheses ("expanded autoreactive TCRs drive RA flares",
"hospital-acquired SARS-CoV-2 strains evade convalescent BCR
responses") need real repertoire evidence to be debate-grade. Without
a pipeline, the immunology Theorist has nothing to ground claims on
beyond text reviews. This pipeline absorbs publicly available AIRR
datasets (10x Genomics, ImmuneSpace, AIRR-DB) and turns them into
SciDEX artifacts the persona pack can argue from.
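
As an illustration of the AIRR TSV path, here is a minimal sketch of collapsing rearrangement rows into clonotype counts. The column names (locus, v_call, j_call, junction_aa, duplicate_count) come from the AIRR Rearrangement schema. The helper name clonotype_counts, the clonotype key (V gene, CDR3 amino acids, J gene), and the sample rows are assumptions made for this example, not part of the spec.

```python
import csv
import io
from collections import Counter


def clonotype_counts(airr_tsv_text, chain="TRB"):
    """Aggregate AIRR rearrangement rows into clonotype counts (hypothetical helper).

    Keying a clonotype by (v_call, junction_aa, j_call) is an assumption of
    this sketch; the spec does not fix a clonotype definition.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(airr_tsv_text), delimiter="\t")
    for row in reader:
        # Keep only rows whose locus matches the requested chain.
        if row.get("locus") and row["locus"] != chain:
            continue
        junction = (row.get("junction_aa") or "").strip()
        if not junction:
            continue  # skip rearrangements without a translated junction
        # duplicate_count is optional in AIRR; fall back to one read per row.
        reads = int(row.get("duplicate_count") or 1)
        counts[(row.get("v_call", ""), junction, row.get("j_call", ""))] += reads
    return counts


# Synthetic rows, invented for illustration only.
SAMPLE = "\n".join([
    "sequence_id\tlocus\tv_call\tj_call\tjunction_aa\tduplicate_count",
    "s1\tTRB\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQAYEQYF\t5",
    "s2\tTRB\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQAYEQYF\t3",
    "s3\tTRB\tTRBV20-1*01\tTRBJ1-2*01\tCSARDRGNYGYTF\t2",
    "s4\tTRA\tTRAV12-2*01\tTRAJ30*01\tCAVNRDDKIIF\t7",
    "s5\tTRB\tTRBV5-1*01\tTRBJ2-7*01\t\t4",
])

counts = clonotype_counts(SAMPLE, chain="TRB")
```

Rows from other loci and rows without a translated junction are dropped rather than raising, which keeps mixed public datasets usable; a stricter ingest could log or reject them instead.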
- scidex/forge/immune_repertoire.py (≤700 LoC):
  - ingest(source): accepts FASTQ paths, an AIRR-format TSV,
  - call_clonotypes_mixcr(fastqs): invokes MiXCR via subprocess;
  - diversity_metrics(clones): computes Shannon entropy, Gini,
  - link_to_epitopes(clones): calls tools.iedb_epitopes per clonotype CDR3; returns matches
  - pipeline(source, chain='TRB'): composes; commits artifact
- data/scidex-artifacts/immune_repertoire/<run_id>/
- repertoire_run(run_id PRIMARY KEY, source_kind,
- tools.py registers immune_repertoire_pipeline(source,
  @log_tool_call.
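
The diversity step named above (Shannon, Gini, Hill) can be sketched as follows. This is a minimal illustration that assumes the input is a plain list of per-clonotype read counts; the helper name diversity_from_counts, the natural-log base, and the default Hill orders are assumptions of the example rather than choices the spec makes.

```python
import math


def diversity_from_counts(counts, hill_orders=(0, 1, 2)):
    """Shannon entropy, Gini coefficient, and Hill numbers from clone counts.

    Hypothetical helper: takes an iterable of non-negative clone sizes.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("no clones with positive counts")
    freqs = [c / total for c in counts]

    # Shannon entropy in nats (natural log).
    shannon = -sum(p * math.log(p) for p in freqs)

    # Gini via the sorted-rank formula:
    #   G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with x sorted ascending.
    n = len(counts)
    ordered = sorted(counts)
    gini = sum((2 * i - n - 1) * x
               for i, x in enumerate(ordered, start=1)) / (n * total)

    # Hill number of order q: (sum_i p_i^q)^(1 / (1 - q)).
    # The q = 1 case is defined by its limit, exp(Shannon entropy).
    hill = {}
    for q in hill_orders:
        if q == 1:
            hill[q] = math.exp(shannon)
        else:
            hill[q] = sum(p ** q for p in freqs) ** (1 / (1 - q))

    return {"shannon": shannon, "gini": gini, "hill": hill}


even = diversity_from_counts([5, 5, 5, 5])     # perfectly even repertoire
skewed = diversity_from_counts([97, 1, 1, 1])  # one dominant expanded clone
```

Reporting Hill numbers at orders 0, 1, and 2 gives richness, the exponential of Shannon entropy, and the inverse Simpson index on one comparable scale (effective number of clonotypes), which makes an expanded clone easy to see: the higher orders drop well below richness.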
- /artifacts/<id> renders a clonotype-frequency rank plot
- q-vert-vertical-personas-pack) consumes a repertoire_block when a debate's hypothesis names a disease
- tests/test_immune_repertoire.py: synthetic AIRR table
- docs/setup/mixcr.md. Subprocess wrapper handles installed +
- python-Levenshtein (lightweight).
- iedb_epitopes from q-vert-vertical-evidence-providers.
- q-vert-vertical-personas-pack: immunology-expert consumer.
- data/scidex-artifacts/ submodule.
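
The dependency list mentions a subprocess wrapper around MiXCR. One possible shape is sketched below, assuming the wrapper should fail with a clear error when the executable is not on PATH. The names run_mixcr and MixcrUnavailable are hypothetical, and the MiXCR argument list is deliberately left to the caller because it depends on the installed version and the library preparation.

```python
import shutil
import subprocess


class MixcrUnavailable(RuntimeError):
    """Raised when the MiXCR executable cannot be found (hypothetical name)."""


def run_mixcr(args, binary="mixcr", timeout=None):
    """Run MiXCR with the given argument list and return the CompletedProcess.

    Hypothetical wrapper: it only locates the executable and runs it.
    It does not choose subcommands or presets.
    """
    exe = shutil.which(binary)
    if exe is None:
        # Point the operator at the setup doc named in the dependency list.
        raise MixcrUnavailable(
            f"{binary!r} not found on PATH; see docs/setup/mixcr.md"
        )
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(
        [exe, *args],
        check=True,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Raising a dedicated exception lets pipeline(source, chain='TRB') distinguish "MiXCR is not installed" from "MiXCR ran and failed", so a run that starts from a precomputed AIRR TSV can skip the clonotype-calling step entirely.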