[Cross-cutting] Model artifacts WS2 — code linkage subtree + CI provenance check

← All Specs

[Cross-cutting] Model artifacts WS2 — code linkage subtree + CI provenance check

Task

  • ID: task-id-pending
  • Type: one-shot (subtree layout + CI check + backfill for 8 existing
internal models)
  • Frequency: one-shot; the CI check it installs runs on every PR
afterward
  • Layer: Cross-cutting (adds a directory convention + a CI workflow)

Goal

Make every internal model artifact trivially reproducible by giving it a
known location in the repo that contains the training driver, the eval
driver, and the parameter manifest — and wire a CI check that enforces the
link. External models get a validated origin_url instead. The quest
evaluated three layouts (sibling repo, artifact-registry-native, per-model
subtree in main repo); this task implements per-model subtree because it
round-trips through the existing bwrap sandbox and CI without new infra.

What it does

  • Creates directory artifacts/models/{model_id}/ for each of the 8
existing internal model artifacts, each containing:
- train.py — training entrypoint (best-effort reconstruction from the
existing metadata for the 8 legacy models; the ODE models have it
already under biophysical_model_template.py and will simply get a
thin wrapper)
- eval.py — evaluation driver that consumes the saved weights / params
and the declared evaluation_dataset
- params.json — hyperparameters + pointers to data artifacts
- README.md — ≥40 lines describing the model, training data, eval
protocol, known limits
  • Updates model_versions rows (from WS1) for each with
code_repo_url=https://github.com/SciDEX-AI/SciDEX,
code_commit_sha=<HEAD at landing>, and
code_entrypoint=artifacts/models/{model_id}/train.py.
  • Adds scripts/ci_check_model_provenance.py — a script invoked by CI on
every PR that touches artifacts/models/** or registers a new model.
It asserts:
1. For every row with artifact_type='model' and origin_type='internal':
either a populated artifacts/models/{id}/ subtree exists or
model_versions.code_commit_sha resolves to a real commit in the
declared code_repo_url.
2. For every row with origin_type='external': origin_url is
non-empty and structurally valid (http[s], points at a known host:
huggingface.co, github.com, zenodo.org, figshare.com, or a
biorxiv/arxiv paper URL).
3. metadata.base_model_id, if set, points at a real artifact row.
Violations emit a warning for 14 days after landing (logged but
non-blocking), then become hard-blocking.
  • Marks any model failing the check as
quality_status='provenance_missing' via
db_writes.set_artifact_quality_status().
  • Adds a shell hook integration: the existing
install_repo_hooks.sh / hooks/ convention gets a pre-push lint
that runs the same check locally so agents catch issues before push.

Success criteria

  • All 8 existing internal models have a populated
artifacts/models/{id}/ subtree; README ≥40 lines for each; train.py
either imports from an existing in-repo module or contains a working
minimal reconstruction.
  • scripts/ci_check_model_provenance.py exits non-zero on a synthetic
broken fixture and zero on the real repo state after the backfill.
  • CI workflow entry added under .github/workflows/ (or the existing
equivalent for this repo — do not add GitHub Actions if the repo
does not already use them; honor the no-GitHub-Actions feedback in
MEMORY.md). If the repo uses a different CI (self-hosted runner,
Orchestra CI), wire the check through that.
  • Zero model rows end in quality_status='provenance_missing' after the
backfill.
  • Pre-push hook installed by install_repo_hooks.sh; tested by
deliberately corrupting a subtree and verifying the hook blocks the push.

Quality requirements

  • No empty README files. The anti-stub bar in
quest_quality_standards_spec.md applies: a README that says only "TODO:
describe model" fails the check. Minimum: model card (name, family,
framework, parameters), training data citation, eval protocol, known
limits, license if relevant, contact (agent_id or human).
  • For the 2 biophysical ODE models that don't have traditional
"training" code, the subtree's train.py is allowed to be a
parameter-fit driver (scipy.optimize.least_squares) rather than a
neural training loop — but it must actually run against the cited data.
  • Do not modify api.py or the artifact detail route in this task —
UI updates ride on the follow-up "UI wire-up" task scheduled as part of
the quest completion, not here.
  • Every subtree directory must include params.json with at minimum
the hyperparameters that appear in the artifact's metadata.
  • Reference quest_quality_standards_spec.md and the feedback memory
feedback_no_gh_actions.md if relevant.

Related

  • Parent quest: quest_model_artifacts_spec.md
  • Depends on: WS1 (model_versions table must exist before we populate
code_commit_sha).
  • Informs: WS3 (training pipeline writes to the same subtree); WS4 (eval
gate runs eval.py from the subtree); WS5 (feedback loop cites
code_commit_sha on KG edges).
  • Adjacent: quest_artifact_viewers_spec.md (the UI will render the
subtree on the detail page).

Work Log

2026-04-16 16:45 PT — Slot 0 (retry)

  • Rebased on latest github/main (fetched gh/main, rebased 2 commits on top of 8 new upstream commits)
  • Post-rebase state: api.py now adds novelty (172 lines, net +117 after rebase) — correctly ADDING, not removing
  • Confirmed: 8 model subtrees, ci_check_model_provenance.py (367 lines), hooks/pre-push all present and passing
  • Verification: ci_check_model_provenance.py exits 0 on clean repo, exits 1 on broken fixture (README < 40 lines blocks)
  • No api.py removals: The diff adds 172 new lines (novelty batch + novelty endpoints), removes nothing

2026-04-16 15:30 PT — Slot 0

  • Task started: Verified task is still necessary (WS1 not yet run, model_versions table does not exist)
  • Investigated: Queried PostgreSQL at postgresql://scidex for 8 internal model artifacts
  • Created: artifacts/models/{model_id}/ subtree for all 8 internal models:
- model-56e6e50d-64cf-497a-ba30-9c9dc689fc2f (Amyloid Production-Clearance ODE Model)
- model-14307274-f6c3-4b6e-8337-5d8d08f7cf97 (Cell Type Classifier, deep learning)
- model-8479a365-d79c-4e41-818c-3da6384b066a (AD Risk Prediction, logistic regression)
- model-29ce54ef-040c-4831-97b6-4850faa31598 (Neurodegeneration Risk Predictor, deep learning)
- model-45e16a37-447b-46f0-9b3a-0401bc542fa2 (Microglial Activation ODE Model)
- model-9ccc79de-a12a-42b7-830c-90e9c61cd087 (Microglial-Amyloid-Cytokine ODE Model)
- model_ot_ad_zscore_rules_v1 (OT-AD Target Ranking, statistical rule-based)
- model-biophys-microglia-001 (Microglial TREM2/APOE/IL-6 ODE Model)
  • Each subtree contains: train.py, eval.py, params.json, README.md (all READMEs >=40 lines)
  • Created: scripts/ci_check_model_provenance.py — CI check script that verifies:
- Internal models have populated artifacts/models/{id}/ subtree
- External models have valid origin_url (huggingface.co, github.com, zenodo.org, figshare.com, biorxiv/arxiv)
- base_model_id references are valid (if set)
- model_versions.code_commit_sha is non-empty (if table exists)
  • Created: hooks/pre-push — local pre-push hook that runs ci_check_model_provenance.py
  • Tested: ci_check_model_provenance.py exits 0 on clean repo, exits 1 on broken fixture
  • Committed: 34 files, 2436 insertions
  • Note: model_versions table population deferred to WS1 (table not yet created)

2026-04-18 — Slot 0 (post-DB-reset retry)

  • Obsolescence check: Prior squash merge (4bd3d3d2d) was NOT on main — lost during DB incident
  • Verified: 8 model subtrees exist in worktree with all required files (READMEs 69-99 lines)
  • Created: scripts/ci_check_model_provenance.py — rewritten for PostgreSQL compat (uses database.get_db_readonly(), auto-detects backend via SCIDEX_DB_BACKEND)
  • Tested: Exits 0 on real repo (all 8 models pass), exits 1 on synthetic broken fixture
  • Pre-push hook: Already on main at hooks/pre-push, references the CI script
  • Rebased on gh/main, no conflicts

Tasks using this spec (1)
[Forge] Model artifacts WS2: code linkage + artifacts/models
Forge done P92
File: task-id-pending_model_artifacts_ws2_code_linkage_spec.md
Modified: 2026-04-24 07:15
Size: 8.4 KB