[Artifacts] Extend artifact_type to include figure, code, model, protein_design, dataset, tabular_dataset
Goal
Broaden the artifact system from its current types (wiki_page, hypothesis, analysis, paper, kg_edge, notebook, causal_edge) to include new scientific artifact types. Each new type gets a defined metadata schema so downstream tools know what fields to expect.
New Artifact Types & Metadata Schemas
figure
Visualization outputs from analyses or notebooks.
{
"format": "png|svg|pdf|html",
"resolution_dpi": 300,
"source_notebook_id": "notebook-xxx",
"source_analysis_id": "analysis-xxx",
"caption": "Figure 1: ...",
"dimensions": {"width": 800, "height": 600}
}
code
Reusable code artifacts (scripts, libraries, pipelines).
{
"language": "python|r|julia",
"entrypoint": "run_analysis.py",
"dependencies": ["numpy>=1.21", "scipy"],
"git_commit": "abc123",
"description": "Pipeline for differential expression"
}
model
Scientific or ML models (detailed schema in a17-23-MODL0001).
{
"model_family": "biophysical|deep_learning|statistical",
"framework": "scipy|pytorch|sklearn",
"parameter_count": 1500,
"training_config": {},
"evaluation_metrics": {}
}
protein_design
Engineered or predicted protein structures.
{
"pdb_id": "optional-pdb-id",
"uniprot_id": "Q9NZC2",
"sequence": "MEPLR...",
"method": "alphafold|rosetta|experimental",
"mutations": ["A123V", "G456D"],
"predicted_stability": 0.85,
"binding_affinity": {"target": "TREM2", "kd_nm": 12.5}
}
dataset
References to external or internal datasets.
{
"source": "zenodo|figshare|geo|allen_brain|internal",
"external_id": "GSE123456",
"url": "https://...",
"license": "CC-BY-4.0",
"format": "csv|parquet|h5ad|tsv",
"row_count": 50000,
"schema_summary": "Columns: gene, cell_type, expression..."
}
tabular_dataset
Structured tabular data with column-level metadata.
{
"source": "derived|external",
"format": "csv|parquet|tsv",
"row_count": 50000,
"columns": [
{"name": "gene", "dtype": "string", "description": "Gene symbol", "linked_entity_type": "gene"},
{"name": "log_fc", "dtype": "float", "description": "Log fold change"}
],
"parent_dataset_id": "dataset-xxx"
}
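As a rough illustration of how a downstream tool might consume the column-level metadata above, the sketch below collects the columns that declare a linked_entity_type. The helper name linked_columns() is hypothetical and not part of artifact_registry.py; the example payload is copied from the tabular_dataset example.

```python
def linked_columns(metadata):
    """Map column name -> linked_entity_type for columns that declare one.

    Hypothetical helper for illustration; assumes the tabular_dataset
    metadata layout shown above ("columns" is a list of dicts).
    """
    return {
        col["name"]: col["linked_entity_type"]
        for col in metadata.get("columns", [])
        if "linked_entity_type" in col
    }


example_metadata = {
    "source": "derived",
    "format": "csv",
    "row_count": 50000,
    "columns": [
        {"name": "gene", "dtype": "string", "description": "Gene symbol",
         "linked_entity_type": "gene"},
        {"name": "log_fc", "dtype": "float", "description": "Log fold change"},
    ],
    "parent_dataset_id": "dataset-xxx",
}

print(linked_columns(example_metadata))  # {'gene': 'gene'}
```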
Implementation
Update artifact_registry.py:
- Add new types to the valid types list/set
- Add ARTIFACT_METADATA_SCHEMAS dict mapping type → JSON schema
- Optional: validate metadata against schema on register_artifact()
- Add convenience functions: register_figure(), register_code(), register_protein_design()
Update any UI components that display artifact types (badges, icons, filters)
Acceptance Criteria
☑ register_artifact() accepts all 6 new artifact_type values without error
☑ Metadata schema definitions documented in code as ARTIFACT_METADATA_SCHEMAS
☑ Convenience registration functions for each new type
☑ Metadata validation warns (not errors) on missing recommended fields
☑ Existing artifact types continue to work unchanged
☑ Work log updated with timestamped entry
Dependencies
- None (root task — can start immediately)
- Parallel with: a17-18-VERS0001
Dependents
- a17-21-EXTD0001 (external datasets)
- a17-22-TABL0001 (tabular datasets)
- a17-23-MODL0001 (model artifacts)
- d16-20-AVER0001 (demo: versioned protein design)
Work Log
- 2026-04-04 16:30 UTC: Implementation complete. Added:
  - tabular_dataset to ARTIFACT_TYPES (other 5 types already present)
  - ARTIFACT_METADATA_SCHEMAS dict with recommended/optional fields for all 6 new types
  - _validate_metadata() function that warns (not errors) on missing recommended fields
  - Convenience functions: register_figure(), register_code(), register_model(), register_protein_design(), register_dataset(), register_tabular_dataset()
  - All tests pass: types recognized, schemas defined, warnings issued correctly.