[Atlas] Export reproducibility capsules as RO-Crate bundles with archival identifiers
Goal
Make reproducible SciDEX analyses portable outside the live database. Each completed capsule should be exportable as a standards-friendly research object bundle with enough metadata to be archived, cited, and reverified later, including references to durable source-code identifiers where possible.
This task should focus on RO-Crate-compatible exports and archival hooks such as Software Heritage-style source references.
Acceptance Criteria
☑ A SciDEX capsule can be exported as an RO-Crate-like bundle.
☑ Exported bundles include artifact/version metadata, checksums, runtime metadata, and provenance links.
☑ Source code references can be linked to durable archival identifiers where available.
☑ At least one real analysis has an export bundle generated and documented.
☑ The export path is connected to the artifact registry or analysis detail pages.
Approach
Map the SciDEX capsule manifest into an RO-Crate-compatible JSON-LD export shape.
Include checksums, artifact lineage, runtime references, and source identifiers.
Add support for source archival identifiers or placeholders that can later resolve to Software Heritage or equivalent archival references.
Export one analysis bundle as a reference implementation.
Dependencies
docs/planning/specs/quest_reproducible_analysis_capsules_spec.md
docs/planning/specs/repro_capsule_manifest_schema_spec.md
65ac9e7d-eb54-4243-818c-2193162a6c45 — capsule manifest schema
Dependents
- External replication and citation
- Long-term archival of debate-worthy analyses
- Governance and audit trails
Work Log
2026-04-10 09:25 PT — Codex
- Created initial spec for RO-Crate-compatible export and archival identifiers.
2026-04-10 09:27 PT — Codex
- Created the live Orchestra task and attached the real task id.
2026-04-11 04:45 PT — minimax:54
- Implemented RO-Crate export for SciDEX reproducibility capsules.
- Created rocrate_export.py, which generates RO-Crate-compatible JSON-LD bundles with:
  - Root Dataset/CreativeWork entity with capsule metadata
  - License, input/output artifact entities
  - Provenance entity documenting capsule creation
  - SWH archival reference for git commits (swh:1:ext:{commit})
  - Checksum entity with metadata SHA256
  - SciDEX context entity linking back to scidex.ai
- Extended artifact_registry.py with SWH archival fields in get_capsule_manifest():
  - git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id
- Extended register_capsule() with git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id params
- Extended api_register_capsule() with the same SWH archival parameters
- Added /api/capsules/{id}/export endpoint: returns full bundle metadata (crate_id, checksum, entities, archival references)
- Added /api/capsules/{id}/export-files endpoint: exports bundle to /home/ubuntu/scidex/exports/rocrate/{id}/
- Registered reference capsule for analysis SDA-2026-04-01-gap-001 (TREM2 DAM microglia):
  - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d
  - git_commit: 90c7de08cf950d124356823ad394632b07f3fa65
  - swh_origin_url: https://github.com/SciDEX-AI/SciDEX
- Exported reference bundle to exports/rocrate/capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d/
  - ro-crate-metadata.json (7 entities, SWH reference to commit)
  - capsule-manifest.json
  - provenance/export-info.json
- All acceptance criteria verified: capsule exports as RO-Crate bundle with checksums, SWH reference, real analysis bundle generated, export path connected.
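Illustrative sketch (not the code in rocrate_export.py): the mapping described in this entry could look roughly like the following. The function names build_crate and crate_checksum, the manifest keys (title, capsule_id, inputs, outputs), the artifact entries, and the sha256 property name are assumptions made for the example. The @context and conformsTo values follow the RO-Crate 1.1 specification, and the source reference uses the swh:1:ext:{commit} form recorded above.

```python
import hashlib
import json


def build_crate(manifest: dict) -> dict:
    """Map a capsule manifest (hypothetical shape) to an RO-Crate-style document."""
    artifacts = manifest["inputs"] + manifest["outputs"]
    graph = [
        # Metadata descriptor expected by RO-Crate 1.1.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset carrying the capsule-level metadata.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": manifest["title"],
            "identifier": manifest["capsule_id"],
            "hasPart": [{"@id": a["id"]} for a in artifacts],
        },
    ]
    # One entity per input/output artifact, each with its own checksum.
    for artifact in artifacts:
        graph.append({
            "@id": artifact["id"],
            "@type": "File",
            "name": artifact["name"],
            "sha256": artifact["sha256"],  # assumed property name
        })
    # Source reference in the form recorded in this work log.
    if manifest.get("git_commit"):
        graph.append({
            "@id": f"swh:1:ext:{manifest['git_commit']}",
            "@type": "SoftwareSourceCode",
            "codeRepository": manifest.get("swh_origin_url"),
        })
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def crate_checksum(crate: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON serialization."""
    canonical = json.dumps(crate, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder manifest: artifact ids, names, and contents are invented.
example_manifest = {
    "capsule_id": "capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d",
    "title": "Example capsule",
    "inputs": [{"id": "inputs/data.csv", "name": "data.csv",
                "sha256": hashlib.sha256(b"example input").hexdigest()}],
    "outputs": [{"id": "outputs/report.json", "name": "report.json",
                 "sha256": hashlib.sha256(b"example output").hexdigest()}],
    "git_commit": "90c7de08cf950d124356823ad394632b07f3fa65",
    "swh_origin_url": "https://github.com/SciDEX-AI/SciDEX",
}

crate = build_crate(example_manifest)
print(len(crate["@graph"]), crate_checksum(crate)[:12])
```

Serializing with sorted keys before hashing keeps the checksum stable across runs, which matters if the bundle is to be reverified later.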
2026-04-12 — sonnet-4.6:73
- Verification pass: all acceptance criteria confirmed passing.
- Ran programmatic checks: bundle generation, directory export, ZIP export, SWH reference creation all correct.
- Export produces 7-entity RO-Crate JSON-LD with @context: https://w3id.org/ro/crate/1.1, input/output entities, provenance, checksums, and an SWH swh:1:ext: reference for the git commit.
- Updated acceptance criteria checkboxes to [x] to reflect completed state.
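Illustrative sketch of the directory and ZIP export checks mentioned in this entry. It is a minimal stand-in, not the project's test code: export_bundle, check_bundle, and the payloads are hypothetical, and only the three file names are taken from the reference bundle listed in the 2026-04-11 entry.

```python
import json
import tempfile
import zipfile
from pathlib import Path

# File layout of the reference bundle as listed in this work log.
EXPECTED_FILES = {
    "ro-crate-metadata.json",
    "capsule-manifest.json",
    "provenance/export-info.json",
}


def export_bundle(bundle_dir: Path, crate: dict, manifest: dict, export_info: dict) -> Path:
    """Write the three bundle files, then zip the directory next to it."""
    payloads = {
        "ro-crate-metadata.json": crate,
        "capsule-manifest.json": manifest,
        "provenance/export-info.json": export_info,
    }
    for rel_path, payload in payloads.items():
        target = bundle_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")

    # The archive lives beside the bundle directory, so it never includes itself.
    zip_path = bundle_dir.parent / (bundle_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(bundle_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(bundle_dir).as_posix())
    return zip_path


def check_bundle(zip_path: Path) -> list[str]:
    """Return the expected file names that are missing from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        present = set(zf.namelist())
    return sorted(EXPECTED_FILES - present)


with tempfile.TemporaryDirectory() as tmp:
    bundle_dir = Path(tmp) / "capsule-example"
    zip_path = export_bundle(
        bundle_dir,
        crate={"@graph": []},                       # placeholder payloads
        manifest={"capsule_id": "capsule-example"},
        export_info={"exporter": "sketch"},
    )
    missing = check_bundle(zip_path)
    print("missing files:", missing)
```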
2026-04-19 01:50Z — minimax:62
- Re-verification pass: confirmatory testing after task reopen.
- Evidence: programmatic test of generate_rocrate_bundle() and get_capsule_manifest() in worktree:
  - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d → manifest retrieved (title: RO-Crate capsule: TREM2 agonism...)
  - Bundle generated: crate_id=rocrate-11decfc0ac4f4208, 7 entities, checksum=452f824d358c..., 1 SWH archival reference (swh:1:ext:90c7de08cf950d124356823ad394632b07f3fa65)
- Git log confirms commits dcc329a36, e60d63996, 73fecc569, c76041972, 6bf61507d all on origin/main
- All acceptance criteria confirmed satisfied. No code changes needed; task was already complete.
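Illustrative sketch for inspecting the archival reference recorded above. The published SWHID scheme defines five core object types (cnt, dir, rev, rel, snp); the helper below reports whether a reference uses one of them, which may be useful when the stored swh:1:ext: placeholders are later resolved against Software Heritage. The function names are hypothetical, and the rev form assumes the usual case in which an archived revision's identifier is the Git commit SHA-1.

```python
import re

# Core object types from the SWHID scheme (version 1).
CORE_TYPES = {"cnt", "dir", "rev", "rel", "snp"}

# Scheme, version, object type, 40 lowercase hex digits.
REFERENCE_PATTERN = re.compile(r"^swh:1:([a-z]+):([0-9a-f]{40})$")


def parse_reference(ref: str):
    """Split a swh:1:<type>:<hash> string; return None if the shape does not match."""
    match = REFERENCE_PATTERN.match(ref)
    if match is None:
        return None
    object_type, object_id = match.groups()
    return {
        "type": object_type,
        "id": object_id,
        "is_core_swhid": object_type in CORE_TYPES,
    }


def revision_swhid(git_commit: str) -> str:
    """Candidate core SWHID for a Git commit (assumes the commit is archived)."""
    return f"swh:1:rev:{git_commit}"


commit = "90c7de08cf950d124356823ad394632b07f3fa65"
recorded = parse_reference(f"swh:1:ext:{commit}")
candidate = parse_reference(revision_swhid(commit))
print(recorded["type"], recorded["is_core_swhid"])
print(candidate["type"], candidate["is_core_swhid"])
```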