[Artifacts] Extend artifact_type to include figure, code, model, protein_design, dataset
done coding:7 reasoning:6

Extend the artifact registry to accept new artifact_type values: figure, code, model, protein_design, dataset, tabular_dataset. Update artifact_registry.py register_artifact() validation. Add type-specific metadata schemas: figure (format, resolution, source_notebook), code (language, entrypoint, dependencies), model (architecture, framework, parameter_count, training_config), protein_design (pdb_id, sequence, method), dataset (source_url, format, row_count, schema). Can run in parallel with a17-18.

## REOPENED TASK: CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

- Lost to an orphan branch / failed push
- Only a spec-file edit (no code changes)
- Already addressed by other agents in the meantime
- Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (3)

Squash merge: orchestra/task/a17-19-T-extend-artifact-type-to-include-figure-c (1 commits) 2026-04-15
[Artifacts] Mark extend-artifact-type task done (status:done) [task:a17-19-TYPE0001] 2026-04-15
[Artifacts] Extend artifact_type with metadata schemas and convenience functions 2026-04-04

Spec File

[Artifacts] Extend artifact_type to include figure, code, model, protein_design, dataset

Goal

Broaden the artifact system from its current types (wiki_page, hypothesis, analysis, paper, kg_edge, notebook, causal_edge) to include new scientific artifact types. Each new type gets a defined metadata schema so downstream tools know what fields to expect.

New Artifact Types & Metadata Schemas

figure

Visualization outputs from analyses or notebooks.

{
  "format": "png|svg|pdf|html",
  "resolution_dpi": 300,
  "source_notebook_id": "notebook-xxx",
  "source_analysis_id": "analysis-xxx",
  "caption": "Figure 1: ...",
  "dimensions": {"width": 800, "height": 600}
}

code

Reusable code artifacts (scripts, libraries, pipelines).

{
  "language": "python|r|julia",
  "entrypoint": "run_analysis.py",
  "dependencies": ["numpy>=1.21", "scipy"],
  "git_commit": "abc123",
  "description": "Pipeline for differential expression"
}

model

Scientific or ML models (detailed schema in a17-23-MODL0001).

{
  "model_family": "biophysical|deep_learning|statistical",
  "framework": "scipy|pytorch|sklearn",
  "parameter_count": 1500,
  "training_config": {},
  "evaluation_metrics": {}
}

protein_design

Engineered or predicted protein structures.

{
  "pdb_id": "optional-pdb-id",
  "uniprot_id": "Q9NZC2",
  "sequence": "MEPLR...",
  "method": "alphafold|rosetta|experimental",
  "mutations": ["A123V", "G456D"],
  "predicted_stability": 0.85,
  "binding_affinity": {"target": "TREM2", "kd_nm": 12.5}
}

dataset

References to external or internal datasets.

{
  "source": "zenodo|figshare|geo|allen_brain|internal",
  "external_id": "GSE123456",
  "url": "https://...",
  "license": "CC-BY-4.0",
  "format": "csv|parquet|h5ad|tsv",
  "row_count": 50000,
  "schema_summary": "Columns: gene, cell_type, expression..."
}

tabular_dataset

Structured tabular data with column-level metadata.

{
  "source": "derived|external",
  "format": "csv|parquet|tsv",
  "row_count": 50000,
  "columns": [
    {"name": "gene", "dtype": "string", "description": "Gene symbol", "linked_entity_type": "gene"},
    {"name": "log_fc", "dtype": "float", "description": "Log fold change"}
  ],
  "parent_dataset_id": "dataset-xxx"
}
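
The six schemas above could be captured in code roughly as follows. This is a minimal sketch, not the contents of artifact_registry.py. The split into "recommended" and "optional" fields is an assumption: here the fields named in the task summary are treated as recommended and the remaining spec fields as optional. `missing_recommended` is a hypothetical helper name used only for illustration.

```python
# Sketch only: the recommended/optional split below is an assumption,
# not something the spec states field by field.
ARTIFACT_METADATA_SCHEMAS = {
    "figure": {
        "recommended": ["format", "resolution_dpi", "source_notebook_id"],
        "optional": ["source_analysis_id", "caption", "dimensions"],
    },
    "code": {
        "recommended": ["language", "entrypoint", "dependencies"],
        "optional": ["git_commit", "description"],
    },
    "model": {
        "recommended": ["model_family", "framework", "parameter_count", "training_config"],
        "optional": ["evaluation_metrics"],
    },
    "protein_design": {
        "recommended": ["sequence", "method"],
        "optional": ["pdb_id", "uniprot_id", "mutations",
                     "predicted_stability", "binding_affinity"],
    },
    "dataset": {
        "recommended": ["source", "url", "format", "row_count", "schema_summary"],
        "optional": ["external_id", "license"],
    },
    "tabular_dataset": {
        "recommended": ["format", "row_count", "columns"],
        "optional": ["source", "parent_dataset_id"],
    },
}


def missing_recommended(artifact_type, metadata):
    """Return the recommended fields absent from `metadata`.

    Types without a schema (for example the pre-existing wiki_page)
    have nothing to check, so they yield an empty list.
    """
    schema = ARTIFACT_METADATA_SCHEMAS.get(artifact_type)
    if schema is None:
        return []
    return [field for field in schema["recommended"] if field not in metadata]
```

For example, `missing_recommended("figure", {"format": "svg", "resolution_dpi": 300})` would report only `source_notebook_id` under this split.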

Implementation

  • Update artifact_registry.py:
    - Add new types to the valid types list/set
    - Add ARTIFACT_METADATA_SCHEMAS dict mapping type → JSON schema
    - Optional: validate metadata against schema on register_artifact()
    - Add convenience functions: register_figure(), register_code(), register_protein_design()
  • Update any UI components that display artifact types (badges, icons, filters)

Acceptance Criteria

  register_artifact() accepts all 6 new artifact_type values without error
  ☑ Metadata schema definitions documented in code as ARTIFACT_METADATA_SCHEMAS
  ☑ Convenience registration functions for each new type
  ☑ Metadata validation warns (not errors) on missing recommended fields
  ☑ Existing artifact types continue to work unchanged
  ☑ Work log updated with timestamped entry

Dependencies

  • None (root task, can start immediately)
  • Parallel with: a17-18-VERS0001

Dependents

  • a17-21-EXTD0001 (external datasets)
  • a17-22-TABL0001 (tabular datasets)
  • a17-23-MODL0001 (model artifacts)
  • d16-20-AVER0001 (demo: versioned protein design)

Work Log

  • 2026-04-04 16:30 UTC: Implementation complete. Added:
    - tabular_dataset to ARTIFACT_TYPES (the other 5 types were already present)
    - ARTIFACT_METADATA_SCHEMAS dict with recommended/optional fields for all 6 new types
    - _validate_metadata() function that warns (not errors) on missing recommended fields
    - Convenience functions: register_figure(), register_code(), register_model(), register_protein_design(), register_dataset(), register_tabular_dataset()
    All tests pass: types recognized, schemas defined, warnings issued correctly.

Payload JSON

{
  "requirements": {
    "coding": 7,
    "reasoning": 6
  },
  "_stall_skip_providers": [],
  "_stall_requeued_by": "minimax",
  "_stall_requeued_at": "2026-04-14 01:52:59",
  "_stall_skip_at": {},
  "_stall_skip_pruned_at": "2026-04-14T10:37:14.022390+00:00",
  "completion_shas": [
    "023b47dedb7228eeb00837bb0de3a3178d71f51f"
  ],
  "completion_shas_checked_at": "2026-04-16T00:13:29.760465+00:00"
}
