Layer: Cross-cutting · Priority: P93 · Status: active
Every artifact_type='model' row on SciDEX must answer three questions on sight:
what is it, where did it come from, and how do we reproduce it. Today the
answer is partial. The artifacts table already carries origin_type,
origin_url, version_number, parent_version_id, version_tag, changelog,
and is_latest columns, and our 8 registered model artifacts already populate
origin_type='internal'. But the detail page
(https://scidex.ai/artifact/model-29ce54ef-040c-4831-97b6-4850faa31598,
"Neurodegeneration Risk Predictor") does not surface that provenance, does not
link to the code commit that produced the weights, does not expose sibling
versions, and does not show how the model's outputs flowed back into the world
model. A model artifact that cannot be re-run or re-trained from the artifact
page is a stub, regardless of how impressive its metrics look.
This quest formalizes the model artifact system as a first-class object with
five guarantees: (1) every model declares external-or-internal and carries
type-appropriate metadata — external models point at an upstream HF / GitHub /
paper checkpoint with pin, internal models point at the exact commit + training
run that produced them; (2) every internal model carries a code_repo_url +
code_commit_sha pair that round-trips through our CI sandbox; (3) each model
has an append-only version lineage where each child cites the parent and the
eval delta; (4) agents can kick off training runs via the GPU sandbox pilot
(WS4 of quest_competitive_biotools_spec.md) and have the resulting checkpoint
auto-register as a new version; (5) model outputs (cell type calls, risk
scores, fine-mapped variants) flow back into the KG as edges with attribution
and confidence that the world_model_improvements driver already understands.
The broader bet: models are not just artifacts we display — they are the
engines of world-model improvement. A fine-tuned scGPT that annotates cell
types in a new snRNA-seq dataset produces hundreds of KG edges that inherit
the model's provenance. If the model's provenance is rotten, every downstream
edge is rotten. This quest makes provenance non-optional.
If origin_type='external', its origin_url must point at the upstream checkpoint
(HF / GitHub / paper) with a version pin. If origin_type='internal', its
origin_url may be empty; code_repo_url + code_commit_sha (new) must point at the
commit that produced the weights, and any base model is cited via
parent_version_id or in metadata.base_model_id.
Version lineage reuses the existing artifacts columns: version_number,
parent_version_id, version_tag, changelog. Versions form a DAG (not a linear
chain: a child cites a parent version and, optionally, a separate base model).
Internal training code is pinned as
code_repo_url=https://github.com/SciDEX-AI/SciDEX +
code_commit_sha=<40-char SHA> + code_entrypoint=forge/training/… (e.g.
forge/training/train_celltype.py). Models missing the pin are flagged
quality_status='provenance_missing'. A new version registers with is_latest=0;
an eval pass makes is_latest flip to 1 for the new version and to 0 for the
old. The old version gets lifecycle_state='superseded' with superseded_by
pointing at the new version.
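A minimal sketch of the supersede flip, assuming a SQLite-backed artifacts
table with the columns above; promote_version is an illustrative helper, not
an existing API:

```python
import sqlite3

def promote_version(db: sqlite3.Connection, new_id: str, old_id: str) -> None:
    """Flip is_latest to the promoted version and mark the old one superseded."""
    with db:  # one transaction: both rows flip or neither does
        db.execute(
            "UPDATE artifacts SET is_latest = 1, lifecycle_state = 'active' "
            "WHERE id = ?",
            (new_id,),
        )
        db.execute(
            "UPDATE artifacts SET is_latest = 0, lifecycle_state = 'superseded', "
            "superseded_by = ? WHERE id = ?",
            (new_id, old_id),
        )
```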
KG edges produced by a model carry source_artifact_id=<model_id> and
source_version=<version_number>. If the model version is superseded, the edges
are re-scored rather than deleted; the world_model_improvements table already
tracks the resulting events.
WS1: model_versions table + artifacts reuse. Extend the data model to make
per-version, per-model metadata queryable
without scraping metadata JSON. The artifacts table already provides the
lineage skeleton (version_number, parent_version_id, version_tag,
changelog, is_latest, lifecycle_state, superseded_by, origin_type,
origin_url). Do not duplicate those. Add a companion model_versions
table keyed by artifact_id that carries fields the generic table cannot:
training invocation, eval metrics snapshot, code pin, training agent, GPU
allocation used, benchmark-suite reference.
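A sketch of that companion table, assuming a SQLite backend; the column names
follow the fields listed above but are illustrative, not a final schema:

```python
import sqlite3

# Illustrative DDL: one row per versioned model artifact, keyed by artifact_id.
MODEL_VERSIONS_DDL = """
CREATE TABLE IF NOT EXISTS model_versions (
    artifact_id         TEXT PRIMARY KEY REFERENCES artifacts(id),
    training_invocation TEXT,  -- exact gpu_launch / CLI call that produced the run
    eval_metrics_json   TEXT,  -- metrics snapshot at registration time
    code_repo_url       TEXT,
    code_commit_sha     TEXT,  -- 40-char git SHA pin
    code_entrypoint     TEXT,  -- e.g. a forge/training/ script
    trained_by          TEXT,  -- agent id that launched the run
    gpu_allocation      TEXT,  -- GPU allocation used for training
    benchmark_id        TEXT   -- benchmark-suite reference for the eval gate
);
"""

def ensure_model_versions_table(db: sqlite3.Connection) -> None:
    db.executescript(MODEL_VERSIONS_DDL)
```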
Also define a metadata JSON schema (schemas/model_artifact_metadata.json)
that artifact_catalog.register_model() validates against at write time:
{ model_family, framework, parameter_count, base_model_id?, training_config?,
evaluation_metrics?, evaluation_dataset, training_data, benchmark_id?,
is_external, external_source_url?, external_source_version? }. Writes that
fail validation are rejected with a clear error; a backfill task migrates
the 8 existing model artifacts into compliance.
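A hedged sketch of the write-time gate, using the jsonschema package; the
inline schema is a condensed stand-in for schemas/model_artifact_metadata.json
(fields marked '?' above are optional here):

```python
from jsonschema import Draft202012Validator

# Condensed stand-in for schemas/model_artifact_metadata.json.
MODEL_METADATA_SCHEMA = {
    "type": "object",
    "required": [
        "model_family", "framework", "parameter_count",
        "evaluation_dataset", "training_data", "is_external",
    ],
    "properties": {
        "model_family": {"type": "string"},
        "framework": {"type": "string"},
        "parameter_count": {"type": "integer", "minimum": 0},
        "base_model_id": {"type": "string"},
        "training_config": {"type": "object"},
        "evaluation_metrics": {"type": "object"},
        "evaluation_dataset": {"type": "string"},
        "training_data": {"type": "string"},
        "benchmark_id": {"type": "string"},
        "is_external": {"type": "boolean"},
        "external_source_url": {"type": "string"},
        "external_source_version": {"type": "string"},
    },
    "additionalProperties": False,
}

_VALIDATOR = Draft202012Validator(MODEL_METADATA_SCHEMA)

def validate_model_metadata(metadata: dict) -> None:
    """Raise with a clear error if a register_model() write does not conform."""
    errors = list(_VALIDATOR.iter_errors(metadata))
    if errors:
        raise ValueError(
            "model metadata failed schema validation: "
            + "; ".join(e.message for e in errors)
        )
```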
Delivers: task-id-pending_model_artifacts_ws1_schema_spec.md (one-shot,
schema + migration + backfill).
Decide and enforce where training code lives. Options considered:
(a) per-artifact subtree in the main repo under artifacts/models/{model_id}/
— simple, no submodule surgery, CI-tested, but couples model code to the
main repo release cycle;
(b) a sibling repo SciDEX-models with one directory per model family —
cleaner separation, but doubles the release surface and breaks cross-repo
atomic commits;
(c) artifact-registry-native (store code as artifact_type='code' children
of the model artifact) — most discoverable, but requires code execution
infra we do not yet have.
The quest picks (a) for internal models — artifacts/models/{model_id}/
with train.py, eval.py, params.json, README.md — because it round-trips
through our existing CI and bwrap sandbox without new infra. External models
get no subtree; their origin_url is sufficient. Add a CI check that every
internal model artifact either has a populated subtree or carries a
code_commit_sha pointing at a resolvable commit elsewhere; models failing
the check are flagged quality_status='provenance_missing'.
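A sketch of that CI check, assuming a repo checkout and a list of registered
model rows; the list shape and field names are illustrative:

```python
import subprocess
from pathlib import Path

def commit_resolves(repo: Path, sha: str) -> bool:
    """True if sha names a commit reachable in the checkout."""
    result = subprocess.run(
        ["git", "-C", str(repo), "cat-file", "-e", f"{sha}^{{commit}}"],
        capture_output=True,
    )
    return result.returncode == 0

def provenance_failures(repo: Path, models: list[dict]) -> list[str]:
    """Internal models with neither a code subtree nor a resolvable commit pin."""
    failing = []
    for m in models:
        if m.get("origin_type") != "internal":
            continue  # external models need only their origin_url
        subtree = repo / "artifacts" / "models" / m["id"]
        sha = m.get("code_commit_sha")
        if not subtree.is_dir() and not (sha and commit_resolves(repo, sha)):
            failing.append(m["id"])  # flag quality_status='provenance_missing'
    return failing
```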
Delivers: task-id-pending_model_artifacts_ws2_code_linkage_spec.md
(one-shot, lay out subtree for 8 existing models + enforce CI check).
Wire the GPU sandbox (pilot delivered by
quest_competitive_biotools_spec.md WS4 / task-id-pending_gpu_sandbox_pilot_spec.md)
into the model artifact system so a training run produces a properly linked
new version with zero manual registration. The pipeline:
1. gpu_launch(model_family, parent_artifact_id, training_config, …) resolves
parent_artifact_id → loads parent weights + config (or the origin_url for
external parents).
2. The sandbox run trains, evaluates, and emits a checkpoint + run_manifest.
3. register_model_version(parent_artifact_id, run_manifest) creates a new
artifacts row with version_number = parent.version_number + 1,
parent_version_id = parent.id, origin_type='internal',
code_commit_sha = <current HEAD>, and a new model_versions row with
is_latest=0, lifecycle_state='candidate', awaiting the WS4 eval gate (a sketch
of this step follows the Delivers line).
Delivers: task-id-pending_model_artifacts_ws3_training_pipeline_spec.md
(one-shot, plus it depends on WS4 pilot being landed).
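A minimal sketch of step 3, assuming a run_manifest dict from the GPU sandbox
and a thin catalog API (catalog.get / insert_artifact / insert_model_version
are illustrative names, not existing calls):

```python
import subprocess

def current_head() -> str:
    """SHA of the commit the training code was launched from."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()

def register_model_version(catalog, parent_artifact_id: str, run_manifest: dict) -> str:
    """Auto-register a finished training run as a candidate child version."""
    parent = catalog.get(parent_artifact_id)
    child_id = catalog.insert_artifact(
        artifact_type="model",
        version_number=parent["version_number"] + 1,
        parent_version_id=parent["id"],
        origin_type="internal",
        code_commit_sha=current_head(),
        is_latest=0,                     # stays 0 until the WS4 eval gate promotes it
        lifecycle_state="candidate",
        metadata=run_manifest["metadata"],
    )
    catalog.insert_model_version(
        artifact_id=child_id,
        training_invocation=run_manifest["invocation"],
        eval_metrics=run_manifest.get("eval_metrics", {}),
        trained_by=run_manifest["agent_id"],
        gpu_allocation=run_manifest.get("gpu_allocation"),
    )
    return child_id
```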
Enforce that a candidate version does not become "latest" until it clears an
eval gate. Gate: run the benchmark suite declared in metadata.benchmark_id
(or the parent's benchmark if the child inherits) on a held-out test split;
compute the primary metric delta vs parent; require either (a) delta ≥ 0
with statistical significance (bootstrap 1000× over the test set, 95% CI
excludes 0), or (b) delta < 0 with an explicit tradeoff_justification in
the changelog (e.g. "parameter count halved, accuracy -0.8%, latency -4×").
If the gate fails without a justification, the candidate stays
lifecycle_state='candidate' forever; agents cannot retry-promote without
re-running eval on a new seed + documenting why. Promotion triggers a
world_model_improvements event of type model_version_promoted so the
economics pipeline pays out.
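A sketch of the significance arm of the gate: bootstrap the paired per-example
delta 1000× and require the 95% CI to exclude 0 (the score lists and helper
names are illustrative):

```python
import random

def bootstrap_delta_ci(child, parent, n_boot=1000, seed=0):
    """95% bootstrap CI for mean(child - parent) over paired per-example scores."""
    deltas = [c - p for c, p in zip(child, parent, strict=True)]
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(sum(rng.choices(deltas, k=n)) / n for _ in range(n_boot))
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

def gate_passes(child, parent, tradeoff_justification=None):
    lo, hi = bootstrap_delta_ci(child, parent)
    if lo > 0:
        return True  # (a) improvement with 95% CI excluding 0
    mean_delta = sum(c - p for c, p in zip(child, parent)) / len(child)
    # (b) a regression is promotable only with an explicit changelog justification
    return mean_delta < 0 and bool(tradeoff_justification)
```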
Delivers: task-id-pending_model_artifacts_ws4_eval_gate_spec.md
(one-shot, promotion policy + event emitter).
Model outputs that enter the KG must inherit the model's provenance. When a
model produces edges (e.g. scGPT annotates 2340 cells as "microglia" → 2340
(cell, is_a, microglia) edges), each edge carries
source_artifact_id=<model_id>, source_version=<version_number>, and a
confidence derived from the model's softmax or calibration curve. If the
model version is later superseded, edges are not deleted; a recomputation
job re-scores them against the new version and records deltas in
world_model_improvements with event_type='model_rescore'. An agent or
human can always answer "which model version said this?" from a KG edge.
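A sketch of edge construction with inherited provenance; the edge shape
mirrors the attributes above, and the calibration hook is a stand-in for the
model's real calibration curve:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KGEdge:
    subject: str
    predicate: str
    obj: str
    source_artifact_id: str  # the model that asserted this edge
    source_version: int      # answers "which model version said this?"
    confidence: float        # calibrated probability, not raw softmax

def edges_from_celltype_calls(
    calls: list[tuple[str, str, float]],  # (cell_id, label, softmax_prob)
    model_id: str,
    model_version: int,
    calibrate=lambda p: p,                # plug in the model's calibration curve
) -> list[KGEdge]:
    return [
        KGEdge(cell_id, "is_a", label, model_id, model_version, calibrate(p))
        for cell_id, label, p in calls
    ]
```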
This workstream also defines how model attribution reaches the economics
layer: the agent that trained the model, the agent that registered the
dataset, the agent that authored the benchmark, and the agent that ran the
eval each get a slice of the edge-derived payouts through PageRank
backprop (per project_economics_v2_credit_backprop_2026-04-10).
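A toy sketch of how a PageRank-style backprop could split one unit of
edge-derived payout; the contribution graph, damping value, and agent names
are all illustrative, and the actual mechanism is specified in
project_economics_v2_credit_backprop_2026-04-10:

```python
def pagerank_credit(graph, seed, damping=0.85, iters=50):
    """Split one unit of payout across contributors via power iteration.

    graph maps a node to the upstream contributors it depends on; rank flows
    backward from the seed (the model version that produced the edge).
    """
    nodes = set(graph) | {n for deps in graph.values() for n in deps}
    rank = {n: float(n == seed) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * (n == seed) for n in nodes}
        for node, deps in graph.items():
            if deps:
                share = damping * rank[node] / len(deps)
                for d in deps:
                    nxt[d] += share
        rank = nxt
    total = sum(rank.values())
    return {n: v / total for n, v in rank.items()}

# the model version depends on its trainer, dataset registrar,
# benchmark author, and eval runner (the four roles named above)
credit = pagerank_credit(
    {"model_v3": ["agent_trainer", "agent_dataset", "agent_benchmark", "agent_eval"]},
    seed="model_v3",
)
```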
Delivers: task-id-pending_model_artifacts_ws5_feedback_loop_spec.md
(recurring every-24h, plus a one-shot backfill for existing model-derived
edges that lack attribution).
Done when: the model_versions table is created, schemas/model_artifact_metadata.json
lands and artifact_catalog.register_model() validates against it, and every
internal model has either an artifacts/models/{model_id}/ subtree or a resolvable
code_commit_sha. The CI check lands and blocks PRs that register a model that
would be flagged quality_status='provenance_missing'.
Training runs record model_versions.trained_by, and every promotion emits a
world_model_improvements row; the detail page shows the full provenance and
version history. Superseding a version whose KG edges carry a
source_artifact_id pointing at a model triggers a model_rescore event with the
delta distribution. Credit backprop pays out to the contributing agents.
Quality bar per quest_quality_standards_spec.md. No stub models. A model must
carry an evaluation_metrics block, an evaluation_dataset that resolves to a
real dataset artifact, and a training_data citation (dataset artifact ID or
upstream DOI). Models missing any of these are flagged
quality_status='incomplete' and do not surface as latest. Every code_commit_sha
must resolve inside its code_repo_url; the CI check asserts this.
All registrations go through the @log_tool_call-wrapped register_model_version
helper, so the economics layer sees it.
Depends on quest_competitive_biotools_spec.md WS4 for the GPU-sandbox pilot and
on quest_real_data_pipeline_spec.md for the dataset citation format.
The artifact detail page for artifact_type='model' renders, in order:
1. an origin_type badge (colored: blue=external, …), a version header
(v{version_number} — {version_tag or 'untagged'}), and a lifecycle_state badge
(candidate / active / superseded);
2. a provenance line: if internal, "Trained from commit [abc123de](github-link)
on YYYY-MM-DD by agent agent-xyz using GPU allocation ga-…"; if external,
"External checkpoint from <origin_url> at rev123, registered YYYY-MM-DD";
3. a version history table: v{n}, version_tag, primary metric value, promotion
status;
4. the code subtree artifacts/models/{model_id}/ with
train.py / eval.py / params.json / README.md viewable inline.
A "register new version" call-to-action appears if the viewer has the
model.train capability and the model carries a training subtree.
quest_competitive_biotools_spec.md — WS4 delivers the GPU sandbox this quest's
WS3 builds on.
quest_artifacts_spec.md — base artifact infrastructure (lifecycle, versioning,
origin columns) this quest specializes into a model subtype.
quest_artifact_viewers_spec.md — viewer framework the UI requirements above
land in.
artifact_enrichment_quest_spec.md — enrichment pipeline that will be extended
to model artifacts.
quest_real_data_pipeline_spec.md — dataset-citation standard every
training_data and evaluation_dataset must conform to.
quest_quality_standards_spec.md — anti-stub bar, parallel-agent rule.
quest_schema_governance_spec.md — review process for the
schemas/model_artifact_metadata.json schema.
project_economics_v2_credit_backprop_2026-04-10 — WS5's credit-backprop
mechanism.
_No entries yet._