[Forge] Structural biology pipeline - sequence to ESM to AlphaFold to druggability to docking open

← Forge
Composes esm + alphafold-structure + fpocket-Schmidtke into a target-dossier artifact with auto-handoff to docking when druggable.

Completion Notes

Restore scidex/agora/cross_disease_analogy.py (861 lines) from main HEAD — `git checkout origin/main -- scidex/agora/cross_disease_analogy.py` Restore migrations/cross_vertical_analogy.py (79 lines) from main HEAD Restore scripts/run_cross_disease_analogy.py (42 lines) from main HEAD Restore deploy/scidex-cross-disease-analogy.service and deploy/scidex-cross-disease-analogy.timer from main HEAD Restore scidex/agora/prompts/analogy_v1.md from main HEAD Restore docs/planning/specs/q-vert-cross-disease-analogy-engine_spec.md from main HEAD When restoring api.py, ensure the /analogies route from PR #783 is preserved alongside the new /api/target-dossier routes (the diff excerpt shows both routes were intended to coexist per the original conflict-resolution note) Changed files: - .claude/skills/cardio-expert - .claude/skills/cardio-skeptic - .claude/skills/immunology-expert - .claude/skills/immunology-skeptic - .claude/skills/infectious-expert - .claude/skills/infectious-skeptic - .claude/skills/metabolic-expert - .claude/skills/metabolic-skeptic - .claude/skills/oncology-expert - .claude/skills/oncology-skeptic - .orchestra-slot.json - agent.py - api.py - api_routes/senate.py - deploy/scidex-cross-disease-analogy.service - deploy/scidex-cross-disease-analogy.timer - docs/planning/specs/q-sand-rate-limit-aware-tools_spec.md - docs/planning/specs/q-time-field-shift-detector_spec.md - docs/planning/specs/q-tool-structural-biology-pipeline_spec.md - docs/planning/specs/q-vert-cross-disease-analogy-engine_spec.md - docs/planning/specs/q-vert-vertical-personas-pack_spec.md - migrations/021_rate_buckets.py - migrations/20260427_debate_persona_assignment.sql - migrations/add_field_shift_tables.py - migrations/add_target_dossier_table.py - migrations/cross_vertical_analogy.py - personas/cardio-expert/SKILL.md - personas/cardio-skeptic/SKILL.md - personas/immunology-expert/SKILL.md - personas/immunology-skeptic/SKILL.md - personas/infectious-expert/SKILL.md - personas/infectious-skeptic/SKILL.md - personas/metabolic-expert/SKILL.md - personas/metabolic-skeptic/SKILL.md - personas/oncology-expert/SKILL.md - personas/oncology-skeptic/SKILL.md - scidex/agora/cross_disease_analogy.py - scidex/agora/prompts/analogy_v1.md - scidex/agora/vertical_persona_router.py - scidex/forge/rate_limiter.py Diff stat: .claude/skills/cardio-expert | 1 - .claude/skills/cardio-skeptic | 1 - .claude/skills/immunology-expert | 1 - .claude/skills/immunology-skeptic | 1 - .claude/skills/infectious-expert | 1 - .claude/skills/infectious-skeptic | 1 - .claude/skills/metabolic-expert | 1 - .claude/skills/metabolic-skeptic | 1 - .claude/skills/oncology-expert | 1 - .claude/skills/oncology-skeptic | 1 - .orchestra-slot.json | 2 +-

Last Error

Review gate REJECT attempt 1/10: PR deletes 7 files (~1126 lines) belonging to the cross-disease analogy engine that was merged in PR #783 just before this task — including scidex/agora/cross_disease_analogy.py, the cron systemd unit + timer, the migration, the persona prompt, the runner script, and its merged spec — all of which still exist on main HEAD. This is a catastrophic API contract break (kills the /analogies endpoint and its cron job) and 

Git Commits (3)

Squash merge: orchestra/task/ce83d86a-structural-biology-pipeline-sequence-to (2 commits) (#789)2026-04-27
[Forge] Update spec work log for structural biology pipeline [task:ce83d86a-aea3-4801-ac8a-fd45ee71d94e]2026-04-27
[Forge] Structural biology pipeline: sequence → ESM → AlphaFold → druggability → docking handoff [task:ce83d86a-aea3-4801-ac8a-fd45ee71d94e]2026-04-27
Spec File

Effort: extensive

Goal

Compose esm, alphafold-structure, and the new docking workflow
into a sequence-to-druggability pipeline that takes a UniProt
accession (or raw FASTA), runs ESM C embeddings to predict functional
sites, fetches or computes the AlphaFold structure, scores druggability
of detected pockets, and emits a structured "drug-target dossier"
artifact summarizing the protein's chances. Hands the top pocket off
to the docking workflow if druggability score crosses threshold.

Why this matters

Druggability assessment is a multi-step reasoning chain — sequence
features alone are weak, structure alone misses functional context,
and pocket detection without ligandability scoring is just geometry.
SciDEX has the components but no composer. A debate over "is <gene>
worth pursuing therapeutically?" today gets a vague answer; this
pipeline produces a numerical dossier (folded-confidence, n_pockets,
pocket_volume, druggability_z, solvent-exposed surface, predicted
allostery sites) that a Domain Expert can argue from concrete data.

Acceptance Criteria

☐ New module scidex/forge/structural_biology.py (≤1000 LoC):
- sequence_features(uniprot_or_fasta) — uses esm ESM C
embeddings to identify likely binding-site residues
(per-residue attention weights from a finetuned head, or the
ESM-bind-site checkpoint if available); returns annotated
residue list.
- fetch_or_predict_structure(uniprot) — pulls AlphaFold model
if confidence ≥ 70; otherwise (rare for human) runs a local
AlphaFold or ESMFold prediction.
- score_druggability(pdb_path) — runs fpocket for pocket
detection, computes druggability score for each pocket using
the published Schmidtke linear model
(vol0.045 + hydroph0.07 - polar*0.05), ranks pockets.
- dossier(uniprot) — composes; emits JSON dossier with the
full chain (sequence features, structure source, pockets,
druggability scores, recommendation), commits as artifact.
- handoff_to_docking(dossier) — if top_pocket_druggability >
0.7
, calls docking_workflow.pipeline(gene) and links the
output artifact to the dossier.
☐ Migration target_dossier(dossier_id PRIMARY KEY, uniprot,
gene_symbol, structure_source, top_pocket_druggability,
n_pockets, recommendation TEXT CHECK IN ('high_priority',
'investigate','low_priority','undruggable'), docking_run_id NULL,
pipeline_version, generated_at, artifact_id)
.
tools.py registers target_dossier_pipeline(uniprot) with
@log_tool_call.
/api/target-dossier/<uniprot> returns the dossier JSON.
/artifacts/<id> renders the dossier with a 3D pocket-
highlighted PDB viewer + a druggability bar chart and the
recommendation banner colored by priority.
☐ Domain Expert prompt receives a dossier_block summarizing
recommendation + top pocket score when a hypothesis names a
target with a recent dossier; mirrors GTEx-injection pattern.
☐ Acceptance: python -m scidex.forge.structural_biology
--uniprot P04637 (TP53) completes <15 min; dossier flags the
DBD pocket as druggable (recommendation investigate); if
--auto-handoff set, kicks off a docking run.
☐ Tests: tests/test_target_dossier.py — mock AlphaFold + fpocket
output; assert dossier JSON shape matches schema; handoff fires
only when threshold crossed.

Approach

  • ESM C embeddings are mid-cost (~1 s per 100 residues on CPU);
  • cache embeddings under data/esm/<uniprot>.npy.
  • AlphaFold structure pulled via the existing alphafold-structure
  • tool; if local prediction needed, run via colabfold CLI on GPU.
  • Schmidtke druggability formula is published — implement once in
  • scidex/forge/structural_biology.py.
  • Handoff to docking is opt-in via flag; do not auto-run docking
  • on every dossier (cost-discipline).
  • Dossier artifact is artifact_kind='target_dossier' — register
  • the kind via q-devx-artifact-kind-scaffolder.

    Dependencies

    • esm, alphafold-structure skills.
    • q-tool-drug-docking-workflow — handoff target.
    • q-devx-artifact-kind-scaffolder (wave-3) — registers new kind.

    Work Log

    Work Log

    2026-04-27 — Implemented (commit a622d7e00)

    • scidex/forge/structural_biology.py (900 LoC): Full pipeline module with sequence_features, fetch_or_predict_structure, score_druggability, dossier, handoff_to_docking, get_recent_dossier. ESM-2 attention-based binding site detection with heuristic fallback. fpocket + Schmidtke sigmoid formula 1/(1+exp(-raw/30+1.5)) where raw = vol0.045 + hydroph0.07 - polar*0.05. Recommendation tiers: >0.7=high_priority, 0.4-0.7=investigate, 0.2-0.4=low_priority, <0.2=undruggable.
    • migrations/add_target_dossier_table.py: PostgreSQL table with recommendation CHECK IN (...) and 4 indexes.
    • scidex/forge/tools.py: target_dossier_pipeline(uniprot) registered with @require_preregistration @log_tool_call.
    • api.py: POST /api/target-dossier/{uniprot} (run + optional auto_handoff), GET /api/target-dossier/{uniprot} (fetch latest), GET /api/target-dossier (list with recommendation filter).
    • agent.py: _dossier_block injected into domain expert prompt alongside DepMap block — mirrors GTEx-injection pattern, builds from get_recent_dossier().
    • tests/test_target_dossier.py: 15 tests — all passing. Covers fpocket info parsing, Schmidtke formula range/ordering, fallback paths, dossier JSON schema, handoff threshold boundary (strictly >0.7).

    Payload JSON
    {
      "_gate_retry_count": 1,
      "_gate_last_decision": "REJECT",
      "_gate_last_reason": "PR deletes 7 files (~1126 lines) belonging to the cross-disease analogy engine that was merged in PR #783 just before this task \u2014 including scidex/agora/cross_disease_analogy.py, the cron systemd unit + timer, the migration, the persona prompt, the runner script, and its merged spec \u2014 all of which still exist on main HEAD. This is a catastrophic API contract break (kills the /analogies endpoint and its cron job) and erases another worker's recently-shipped feature. The structural biology pipelin",
      "_gate_judge_used": "max_outlook1:claude-auto",
      "_gate_last_instructions": "Restore scidex/agora/cross_disease_analogy.py (861 lines) from main HEAD \u2014 `git checkout origin/main -- scidex/agora/cross_disease_analogy.py`\nRestore migrations/cross_vertical_analogy.py (79 lines) from main HEAD\nRestore scripts/run_cross_disease_analogy.py (42 lines) from main HEAD\nRestore deploy/scidex-cross-disease-analogy.service and deploy/scidex-cross-disease-analogy.timer from main HEAD\nRestore scidex/agora/prompts/analogy_v1.md from main HEAD\nRestore docs/planning/specs/q-vert-cross-disease-analogy-engine_spec.md from main HEAD\nWhen restoring api.py, ensure the /analogies route from PR #783 is preserved alongside the new /api/target-dossier routes (the diff excerpt shows both routes were intended to coexist per the original conflict-resolution note)",
      "_gate_branch": "orchestra/task/ce83d86a-structural-biology-pipeline-sequence-to",
      "_gate_changed_files": [
        ".claude/skills/cardio-expert",
        ".claude/skills/cardio-skeptic",
        ".claude/skills/immunology-expert",
        ".claude/skills/immunology-skeptic",
        ".claude/skills/infectious-expert",
        ".claude/skills/infectious-skeptic",
        ".claude/skills/metabolic-expert",
        ".claude/skills/metabolic-skeptic",
        ".claude/skills/oncology-expert",
        ".claude/skills/oncology-skeptic",
        ".orchestra-slot.json",
        "agent.py",
        "api.py",
        "api_routes/senate.py",
        "deploy/scidex-cross-disease-analogy.service",
        "deploy/scidex-cross-disease-analogy.timer",
        "docs/planning/specs/q-sand-rate-limit-aware-tools_spec.md",
        "docs/planning/specs/q-time-field-shift-detector_spec.md",
        "docs/planning/specs/q-tool-structural-biology-pipeline_spec.md",
        "docs/planning/specs/q-vert-cross-disease-analogy-engine_spec.md",
        "docs/planning/specs/q-vert-vertical-personas-pack_spec.md",
        "migrations/021_rate_buckets.py",
        "migrations/20260427_debate_persona_assignment.sql",
        "migrations/add_field_shift_tables.py",
        "migrations/add_target_dossier_table.py",
        "migrations/cross_vertical_analogy.py",
        "personas/cardio-expert/SKILL.md",
        "personas/cardio-skeptic/SKILL.md",
        "personas/immunology-expert/SKILL.md",
        "personas/immunology-skeptic/SKILL.md",
        "personas/infectious-expert/SKILL.md",
        "personas/infectious-skeptic/SKILL.md",
        "personas/metabolic-expert/SKILL.md",
        "personas/metabolic-skeptic/SKILL.md",
        "personas/oncology-expert/SKILL.md",
        "personas/oncology-skeptic/SKILL.md",
        "scidex/agora/cross_disease_analogy.py",
        "scidex/agora/prompts/analogy_v1.md",
        "scidex/agora/vertical_persona_router.py",
        "scidex/forge/rate_limiter.py",
        "scidex/forge/rate_limits.yaml",
        "scidex/forge/structural_biology.py",
        "scidex/forge/tools.py",
        "scidex/senate/field_shift_detector.py",
        "scidex/senate/scheduled_tasks.py",
        "scripts/run_cross_disease_analogy.py",
        "tests/test_field_shift_detector.py",
        "tests/test_target_dossier.py",
        "tests/test_vertical_persona_routing.py"
      ],
      "_gate_diff_stat": ".claude/skills/cardio-expert                       |    1 -\n .claude/skills/cardio-skeptic                      |    1 -\n .claude/skills/immunology-expert                   |    1 -\n .claude/skills/immunology-skeptic                  |    1 -\n .claude/skills/infectious-expert                   |    1 -\n .claude/skills/infectious-skeptic                  |    1 -\n .claude/skills/metabolic-expert                    |    1 -\n .claude/skills/metabolic-skeptic                   |    1 -\n .claude/skills/oncology-expert                     |    1 -\n .claude/skills/oncology-skeptic                    |    1 -\n .orchestra-slot.json                               |    2 +-\n agent.py                                           |   21 +-\n api.py                                             | 1606 ++++++++++++++++++--\n api_routes/senate.py                               |   42 -\n deploy/scidex-cross-disease-analogy.service        |   14 -\n deploy/scidex-cross-disease-analogy.timer          |   10 -\n .../specs/q-sand-rate-limit-aware-tools_spec.md    |   32 -\n .../specs/q-time-field-shift-detector_spec.md      |   16 -\n .../q-tool-structural-biology-pipeline_spec.md     |   11 +\n .../q-vert-cross-disease-analogy-engine_spec.md    |   49 -\n .../specs/q-vert-vertical-personas-pack_spec.md    |   48 -\n migrations/021_rate_buckets.py                     |   35 -\n migrations/20260427_debate_persona_assignment.sql  |   37 -\n migrations/add_field_shift_tables.py               |  129 --\n migrations/add_target_dossier_table.py             |   57 +\n migrations/cross_vertical_analogy.py               |   79 -\n personas/cardio-expert/SKILL.md                    |   76 -\n personas/cardio-skeptic/SKILL.md                   |   74 -\n personas/immunology-expert/SKILL.md                |   85 --\n personas/immunology-skeptic/SKILL.md               |   90 --\n personas/infectious-expert/SKILL.md                |   77 -\n personas/infectious-skeptic/SKILL.md               |   86 --\n personas/metabolic-expe",
      "_gate_history": [
        {
          "ts": "2026-04-27 17:14:47",
          "decision": "REJECT",
          "reason": "PR deletes 7 files (~1126 lines) belonging to the cross-disease analogy engine that was merged in PR #783 just before this task \u2014 including scidex/agora/cross_disease_analogy.py, the cron systemd unit + timer, the migration, the persona prompt, the runner script, and its merged spec \u2014 all of which still exist on main HEAD. This is a catastrophic API contract break (kills the /analogies endpoint and its cron job) and erases another worker's recently-shipped feature. The structural biology pipelin",
          "instructions": "Restore scidex/agora/cross_disease_analogy.py (861 lines) from main HEAD \u2014 `git checkout origin/main -- scidex/agora/cross_disease_analogy.py`\nRestore migrations/cross_vertical_analogy.py (79 lines) from main HEAD\nRestore scripts/run_cross_disease_analogy.py (42 lines) from main HEAD\nRestore deploy/scidex-cross-disease-analogy.service and deploy/scidex-cross-disease-analogy.timer from main HEAD\nRestore scidex/agora/prompts/analogy_v1.md from main HEAD\nRestore docs/planning/specs/q-vert-cross-disease-analogy-engine_spec.md from main HEAD\nWhen restoring api.py, ensure the /analogies route from PR #783 is preserved alongside the new /api/target-dossier routes (the diff excerpt shows both routes were intended to coexist per the original conflict-resolution note)",
          "judge_used": "max_outlook1:claude-auto",
          "actor": "claude-auto:47",
          "retry_count": 1
        }
      ]
    }

    Sibling Tasks in Quest (Forge) ↗