[Forge] Model artifacts WS3 — agent-invoked training → new version registration
Task
- ID: task-id-pending
- Type: one-shot (wires the existing GPU sandbox into the model
artifact registry; depends on the WS4 pilot of
quest_competitive_biotools_spec.md being landed)
- Frequency: one-shot to build the pipeline; ongoing usage by agents
who call
gpu_launch_training() as a Forge tool
Goal
Make it effortless for an agent to say "fine-tune model X on dataset Y
with these params" and get back a properly-linked new model version —
with the parent artifact pointed at, the code commit pinned, the GPU
allocation debited, and the new artifact parked in
lifecycle_state='candidate' awaiting the WS4 eval gate. Zero manual
artifact-registry surgery.
What it does
- Adds
scidex_tools/model_training.py with the high-level helper:
def train_model_version(
parent_artifact_id: str,
dataset_artifact_id: str,
training_config: dict,
agent_id: str,
wall_time_cap_min: int = 120,
vram_cap_gb: int = 24,
) -> dict:
"""Kick off a training run that produces a new model version.
Resolves parent weights + code subtree, invokes gpu_launch() under
the bwrap sandbox (per quest_competitive_biotools_spec.md WS4),
captures the resulting checkpoint, registers it as a new version
linked to the parent, and returns the new artifact_id.
"""
- Under the hood, the helper:
1. Loads the parent artifact + its
model_versions row.
2. If parent is external, pulls weights from
origin_url into a scratch
dir; if internal, loads weights from
artifacts/models/{parent_id}/ (WS2 subtree).
3. Calls
gpu_launch() (WS4 pilot) with an entrypoint at
artifacts/models/{parent_id}/train.py plus the override config.
4. On success, invokes
register_model_version(parent_artifact_id,
run_manifest) which:
- Writes a new
artifacts row with
artifact_type='model',
version_number = parent.version_number + 1,
parent_version_id = parent.id,
origin_type='internal',
is_latest=0,
lifecycle_state='candidate'.
- Writes a new
model_versions row with
code_repo_url,
code_commit_sha,
code_entrypoint,
training_started_at,
training_completed_at,
trained_by,
gpu_allocation_id,
training_params_json,
eval_metrics_json,
promotion_state='candidate'.
- Copies the new weights into
artifacts/models/{new_id}/weights/ (or records a reference if
weights are too large for the repo and stored in blob storage per
blob_storage.py).
5. Returns a dict with the new
artifact_id, the GPU cost debit, the
eval metrics captured during training, and a URL to the detail page.
- Emits a
world_model_improvements event of type
model_version_trained (not yet
promoted — promotion is WS4) so the
economics pipeline logs the training work.
- Provides a dry-run mode (
dry_run=True) that validates inputs + cost
estimate without launching — used by the CI and by cost-gated agents.
- Adds a Forge-tool registration entry so the helper appears in the
tool_playground and
@log_tool_call captures invocations.
Success criteria
train_model_version() executes end-to-end for one model — scGPT
fine-tune is the preferred target, reusing the WS4 pilot's setup.
Produces a new
artifact_type='model' row with
parent_version_id pointing at the scGPT base artifact and
lifecycle_state='candidate'.
model_versions.gpu_allocation_id matches a live row in
resource_allocations; the cost ledger was debited before launch.
model_versions.code_commit_sha resolves against the current repo
HEAD at launch time.
world_model_improvements row emitted with
event_type='model_version_trained',
target_artifact_id=<new_id>.
- Dry-run returns correct cost estimate without spawning the sandbox.
- No direct writes to
artifacts / model_versions — helper goes
through
artifact_catalog.register_model() from WS1 (so schema
validation runs).
@log_tool_call logs show the helper invocation with the agent_id and
outputs.
Quality requirements
- No stub run: the pilot must actually converge on a real dataset (or
fail with a documented reason) and produce real eval metrics. Empty
or NaN metrics fail the task.
- Reference
quest_competitive_biotools_spec.md WS4 for the sandbox
contract; do not re-implement sandbox policy.
- Reference
quest_real_data_pipeline_spec.md for the dataset-citation
format required on
dataset_artifact_id.
- Reference
quest_economics_spec.md for the cost-ledger contract.
- No changes to
api.py. The helper writes directly through
artifact_catalog +
db_writes.
- Do not bypass
gpu_launch() — the bwrap sandbox is non-negotiable.
- Parallel agents are not used here (single pilot run); follow-up
quests can parallelize once the single-run path is proven.
Related
- Parent quest:
quest_model_artifacts_spec.md
- Depends on: WS1 (
model_versions table), WS2 (training subtree),
and
quest_competitive_biotools_spec.md WS4
(
task-id-pending_gpu_sandbox_pilot_spec.md).
- Informs: WS4 (eval gate promotes candidates this task creates), WS5
(feedback loop attributes edges to versions this task registers).
- Adjacent:
quest_forge_spec.md, quest_economics_spec.md,
quest_analysis_sandboxing_spec.md.
Work Log
2026-04-16 15:30 PT — Slot 72
- Started task: Implement
train_model_version() as specified in WS3
- Files created:
-
scidex_tools/model_training.py (817 lines) — full
train_model_version() implementation with:
- Parent artifact +
model_versions row resolution
- Weight resolution (internal from
artifacts/models/{id}/ or external from
origin_url)
- Dataset manifest writing for the GPU sandbox
-
gpu_launch() call (WS4 bwrap sandbox, pre-flight GPU debit via
reserve_gpu_job())
-
model_versions row + new artifact registration on success
- Checkpoint copy to
artifacts/models/{new_id}/weights/ -
world_model_improvements event emission (
model_version_trained)
-
dry_run=True mode for CI and cost-gated agents
-
scidex/core/event_bus.py — added
model_version_trained to
EVENT_TYPES -
scidex/forge/forge_tools.py — added
train_model_version() wrapper with late import to avoid circular deps, plus "Train Model Version" tool registration entry
-
scidex_tools/__init__.py — exports
train_model_version
-
dry_run=True returns correct cost estimate ($1.00 for 120min @ $0.50/GPU-hr)
- Full run (without dry_run) correctly fails at weight-resolution step when
artifacts/models/{parent_id}/weights/ is absent — expected behavior
- Tool registration: "Train Model Version" appears in
skills table with
skill_type=model_training - All modules pass
python3 -m py_compile
- Verification: Dry-run with real model artifact (
model-56e6e50d-64cf-497a-ba30-9c9dc689fc2f) + dataset artifact (dataset-192467e0-fe96-43cb-a64f-e891cdcff111) returns valid cost estimate
- Commits:
82b310958 — [Forge] Model artifacts WS3: training pipeline via GPU sandbox [task:746fd7c1-13a0-4806-8948-2684e07932a9]
- Result: Done — WS3 pipeline implemented and pushed
2026-04-18 07:45 PT — Slot 60
- Issue: Post-SQLite→PostgreSQL migration,
scidex_tools/model_training.py still used sqlite3.connect() with hardcoded DB_PATH, incompatible with PostgreSQL backend
- Fix: Replace all
sqlite3.connect(DB_PATH) calls with get_db() from scidex.core.database, which auto-detects backend via SCIDEX_DB_BACKEND=postgres env var
- Files modified:
scidex_tools/model_training.py — 6 insertions, 14 deletions (net -8 lines: removed sqlite3 import, DB_PATH constant, row_factory setup; added get_db import)
- Functions updated:
_get_parent_artifact(), _get_dataset_artifact(), _write_dataset_manifest(), _register_new_model_version()
- Commit:
eda5fbdee — [Forge] model_training.py: use get_db() for PostgreSQL compatibility [task:746fd7c1-13a0-4806-8948-2684e07932a9]
- Result: Done — scidex_tools/model_training.py now uses get_db() for PostgreSQL compatibility