[Atlas] Export reproducibility capsules as RO-Crate bundles with archival identifiers

Goal

Make reproducible SciDEX analyses portable outside the live database. Each completed capsule should be exportable as a standards-friendly research object bundle with enough metadata to be archived, cited, and reverified later, including references to durable source-code identifiers where possible.

This task should focus on RO-Crate-compatible exports and archival hooks such as Software Heritage-style source references.

Acceptance Criteria

☑ A SciDEX capsule can be exported as an RO-Crate-like bundle.
☑ Exported bundles include artifact/version metadata, checksums, runtime metadata, and provenance links.
☑ Source code references can be linked to durable archival identifiers where available.
☑ At least one real analysis has an export bundle generated and documented.
☑ The export path is connected to the artifact registry or analysis detail pages.

Approach

  • Map the SciDEX capsule manifest into an RO-Crate-compatible JSON-LD export shape.
  • Include checksums, artifact lineage, runtime references, and source identifiers.
  • Add support for source archival identifiers or placeholders that can later resolve to Software Heritage or equivalent archival references.
  • Export one analysis bundle as a reference implementation.
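
The mapping described in the approach can be sketched roughly as follows. This is a minimal illustration, not the contents of rocrate_export.py: the manifest keys (`capsule_id`, `title`, `artifacts`, `path`, `sha256`) and the function name `manifest_to_rocrate` are assumptions made for the example, and the `sha256` property on file entities is an illustrative key rather than a confirmed RO-Crate term. The metadata descriptor and root dataset entities follow the RO-Crate 1.1 layout.

```python
import hashlib
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_PROFILE = "https://w3id.org/ro/crate/1.1"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def manifest_to_rocrate(manifest: dict) -> dict:
    """Map a hypothetical capsule manifest to an RO-Crate 1.1 JSON-LD document."""
    files = [
        {
            "@id": art["path"],
            "@type": "File",
            "name": art.get("name", art["path"]),
            # Illustrative key: the property name used by the real exporter
            # is not stated in this spec.
            "sha256": art["sha256"],
        }
        for art in manifest.get("artifacts", [])
    ]
    # Root data entity: describes the bundle as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": manifest["title"],
        "identifier": manifest["capsule_id"],
        "hasPart": [{"@id": f["@id"]} for f in files],
    }
    # Metadata file descriptor: points from ro-crate-metadata.json to the root.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_PROFILE},
        "about": {"@id": "./"},
    }
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *files]}


# Hypothetical manifest used only to exercise the mapping.
example_manifest = {
    "capsule_id": "capsule-example",
    "title": "Example capsule",
    "artifacts": [
        {"path": "outputs/result.json", "sha256": sha256_hex(b'{"ok": true}')},
    ],
}

crate = manifest_to_rocrate(example_manifest)
print(json.dumps(crate, indent=2))
```

Provenance, runtime, and license entities would be appended to `@graph` in the same way; they are left out here to keep the sketch short.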

Dependencies

  • docs/planning/specs/quest_reproducible_analysis_capsules_spec.md
  • docs/planning/specs/repro_capsule_manifest_schema_spec.md
  • 65ac9e7d-eb54-4243-818c-2193162a6c45 (capsule manifest schema)

Dependents

  • External replication and citation
  • Long-term archival of debate-worthy analyses
  • Governance and audit trails

Work Log

    2026-04-10 09:25 PT — Codex

    • Created initial spec for RO-Crate-compatible export and archival identifiers.

    2026-04-10 09:27 PT — Codex

    • Created the live Orchestra task and attached the real task id.

    2026-04-11 04:45 PT — minimax:54

    • Implemented RO-Crate export for SciDEX reproducibility capsules.
    • Created rocrate_export.py — generates RO-Crate-compatible JSON-LD bundles with:
    - Root Dataset/CreativeWork entity with capsule metadata
    - License, input/output artifact entities
    - Provenance entity documenting capsule creation
    - SWH archival reference for git commits (swh:1:ext:{commit})
    - Checksum entity with metadata SHA256
    - SciDEX context entity linking back to scidex.ai
    • Extended artifact_registry.py with SWH archival fields in get_capsule_manifest():
    - git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id
    • Extended register_capsule() with git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id params
    • Extended api_register_capsule() with same SWH archival parameters
    • Added /api/capsules/{id}/export endpoint — returns full bundle metadata (crate_id, checksum, entities, archival references)
    • Added /api/capsules/{id}/export-files endpoint — exports bundle to /home/ubuntu/scidex/exports/rocrate/{id}/
    • Registered reference capsule for analysis SDA-2026-04-01-gap-001 (TREM2 DAM microglia):
    - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d
    - git_commit: 90c7de08cf950d124356823ad394632b07f3fa65
    - swh_origin_url: https://github.com/SciDEX-AI/SciDEX
    • Exported reference bundle to exports/rocrate/capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d/
    - ro-crate-metadata.json (7 entities, SWH reference to commit)
    - capsule-manifest.json
    - provenance/export-info.json
    • All acceptance criteria verified: the capsule exports as an RO-Crate bundle with checksums and an SWH reference, a real analysis bundle was generated, and the export path is connected.
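
The entry above records source references of the form swh:1:ext:{commit}. The published SWHID scheme defines the core object types cnt, dir, rel, rev, and snp, with rev used for revisions such as git commits, so a reference that should resolve in the Software Heritage archive would likely need the rev form. The sketch below assumes that form; the helper names are hypothetical and are not taken from rocrate_export.py. It only builds and checks the identifier string and does not confirm that the archive actually holds the commit.

```python
import re

# A git commit id is a 40-character lowercase hex SHA-1.
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")
# Core SWHID shape: swh:1:<object type>:<40 hex digits>.
_CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rel|rev|snp):[0-9a-f]{40}")


def swhid_for_git_commit(commit: str) -> str:
    """Build a revision SWHID string from a full git commit hash."""
    normalized = commit.strip().lower()
    if not _SHA1_HEX.fullmatch(normalized):
        raise ValueError(f"not a full 40-character git commit hash: {commit!r}")
    return f"swh:1:rev:{normalized}"


def is_core_swhid(value: str) -> bool:
    """Return True if the value has the shape of a core SWHID."""
    return _CORE_SWHID.fullmatch(value) is not None


# Commit hash taken from the reference capsule in the work log.
reference_commit = "90c7de08cf950d124356823ad394632b07f3fa65"
reference_swhid = swhid_for_git_commit(reference_commit)
print(reference_swhid)
print(is_core_swhid(reference_swhid))
print(is_core_swhid(f"swh:1:ext:{reference_commit}"))
```

Keeping the validation separate from the construction lets the same check run against identifiers already stored in existing manifests.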

    2026-04-12 — sonnet-4.6:73

    • Verification pass: all acceptance criteria confirmed passing.
    • Ran programmatic checks: bundle generation, directory export, ZIP export, and SWH reference creation all behaved correctly.
    • Export produces 7-entity RO-Crate JSON-LD with @context: https://w3id.org/ro/crate/1.1, input/output entities, provenance, checksums, and SWH swh:1:ext: reference for the git commit.
    • Updated acceptance criteria checkboxes to [x] to reflect completed state.
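
A structural check of the kind described in this pass might look like the sketch below. It is an assumption about how such a check could be written, not the script that was actually run; `summarize_crate` and the sample document are hypothetical. Note that the RO-Crate 1.1 specification gives the JSON-LD context URL as https://w3id.org/ro/crate/1.1/context, while the entry above records https://w3id.org/ro/crate/1.1, so the sketch reports whatever context it finds instead of enforcing one value.

```python
def _types(entity: dict) -> list:
    """Return an entity's @type as a list, whether stored as str or list."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def summarize_crate(crate: dict) -> dict:
    """Summarize the structure of an RO-Crate JSON-LD document."""
    graph = crate.get("@graph", [])
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    # The metadata descriptor names the root data entity via "about".
    descriptor = by_id.get("ro-crate-metadata.json")
    root_id = descriptor.get("about", {}).get("@id") if descriptor else None
    root = by_id.get(root_id) if root_id is not None else None
    return {
        "context": crate.get("@context"),
        "entity_count": len(graph),
        "has_descriptor": descriptor is not None,
        "root_is_dataset": root is not None and "Dataset" in _types(root),
        "file_ids": sorted(e["@id"] for e in graph if "File" in _types(e)),
    }


# Hypothetical three-entity crate used only to exercise the check.
sample_crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": ["Dataset", "CreativeWork"], "name": "Example"},
        {"@id": "capsule-manifest.json", "@type": "File"},
    ],
}

summary = summarize_crate(sample_crate)
print(summary)
```

Against a real export, the document would be loaded from ro-crate-metadata.json with `json.load` and the entity count compared with the expected value from the log.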

    2026-04-19 01:50Z — minimax:62

    • Re-verification pass: confirmatory testing after task reopen.
    • Evidence: programmatic test of generate_rocrate_bundle() and get_capsule_manifest() in worktree:
    - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d → manifest retrieved (title: RO-Crate capsule: TREM2 agonism...)
    - Bundle generated: crate_id=rocrate-11decfc0ac4f4208, 7 entities, checksum=452f824d358c..., 1 SWH archival reference (swh:1:ext:90c7de08cf950d124356823ad394632b07f3fa65)
    - Git log confirms commits dcc329a36, e60d63996, 73fecc569, c76041972, 6bf61507d all on origin/main
    • All acceptance criteria confirmed satisfied. No code changes needed; task was already complete.
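
Re-verification of a recorded checksum depends on serializing the metadata identically each time. The sketch below shows one way to do that with sorted keys and fixed separators. The canonicalization actually used by rocrate_export.py is not stated in this spec, so the function names and the serialization choice are assumptions. Because the log records only a truncated checksum, the comparison accepts a hex prefix.

```python
import hashlib
import json


def canonical_json_bytes(doc: dict) -> bytes:
    """Serialize a JSON document deterministically (sorted keys, no spaces)."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def metadata_checksum(doc: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_json_bytes(doc)).hexdigest()


def checksum_matches(doc: dict, expected_prefix: str) -> bool:
    """Compare against a full or truncated hex digest recorded elsewhere."""
    prefix = expected_prefix.strip().lower()
    return bool(prefix) and metadata_checksum(doc).startswith(prefix)


# Two hypothetical documents with the same content but different key order.
doc_a = {"@id": "./", "name": "Example capsule", "version": 1}
doc_b = {"version": 1, "name": "Example capsule", "@id": "./"}
doc_c = {"@id": "./", "name": "Example capsule", "version": 2}

digest_a = metadata_checksum(doc_a)
print(digest_a == metadata_checksum(doc_b))  # key order does not matter
print(digest_a == metadata_checksum(doc_c))  # content changes do
print(checksum_matches(doc_b, digest_a[:12]))
```

A prefix match is weaker than a full-digest match, so archived bundles would ideally record the full 64-character value.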

    File: rocrate_swh_export_spec.md
    Modified: 2026-04-25 22:00
    Size: 5.0 KB