[Atlas] Export reproducibility capsules as RO-Crate bundles with archival identifiers
Status: done; requirements: analysis 6, coding 7, reasoning 7, safety 9

Export SciDEX reproducibility capsules as RO-Crate-compatible bundles with archival source identifiers and checksums.

## REOPENED TASK — CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

- Lost to an orphan branch / failed push
- Only a spec-file edit (no code changes)
- Already addressed by other agents in the meantime
- Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (8)

Squash merge: orchestra/task/3ac441ed-export-reproducibility-capsules-as-ro-cr (1 commits) (2026-04-18)
[Atlas] Mark RO-Crate export acceptance criteria complete [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-12)
[Atlas] Add RO-Crate reference bundle for TREM2 DAM microglia capsule [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-10)
[Atlas] RO-Crate export work log [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-10)
[Atlas] Export capsules as RO-Crate bundles with SWH archival identifiers [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-10)
[Forge] Add reproducibility capsule schema and runtime capsule capture (2026-04-10)
[Atlas] Add RO-Crate export for reproducibility capsules [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-10)
[Atlas] Add RO-Crate export for reproducibility capsules with archival identifiers [task:3ac441ed-e3fc-494e-96b9-c0df1f415881] (2026-04-10)

Spec File

[Atlas] Export reproducibility capsules as RO-Crate bundles with archival identifiers

Goal

Make reproducible SciDEX analyses portable outside the live database. Each completed capsule should be exportable as a standards-friendly research object bundle with enough metadata to be archived, cited, and reverified later, including references to durable source-code identifiers where possible.

This task should focus on RO-Crate-compatible exports and archival hooks such as Software Heritage-style source references.

Acceptance Criteria

☑ A SciDEX capsule can be exported as an RO-Crate-like bundle.
☑ Exported bundles include artifact/version metadata, checksums, runtime metadata, and provenance links.
☑ Source code references can be linked to durable archival identifiers where available.
☑ At least one real analysis has an export bundle generated and documented.
☑ The export path is connected to the artifact registry or analysis detail pages.

Approach

  • Map the SciDEX capsule manifest into an RO-Crate-compatible JSON-LD export shape.
  • Include checksums, artifact lineage, runtime references, and source identifiers.
  • Add support for source archival identifiers or placeholders that can later resolve to Software Heritage or equivalent archival references.
  • Export one analysis bundle as a reference implementation.
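
The mapping step in the approach above could look roughly like the following. This is a minimal sketch, not the code in rocrate_export.py: the manifest field names (`capsule_id`, `title`, `artifacts`, `path`, `sha256`) and the function name `manifest_to_rocrate` are hypothetical. The metadata descriptor and root dataset shape follow the RO-Crate 1.1 specification; a fully conforming root dataset would also carry `description`, `datePublished`, and `license`, which are omitted here for brevity.

```python
import json
from typing import Any

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_PROFILE = "https://w3id.org/ro/crate/1.1"


def manifest_to_rocrate(manifest: dict[str, Any]) -> dict[str, Any]:
    """Map a capsule manifest (hypothetical field names) to RO-Crate JSON-LD."""
    # One File entity per artifact listed in the manifest.
    files = [
        {
            "@id": artifact["path"],
            "@type": "File",
            "name": artifact.get("title", artifact["path"]),
            # "sha256" is an illustrative property, not a term from the RO-Crate context.
            "sha256": artifact["sha256"],
        }
        for artifact in manifest.get("artifacts", [])
    ]
    # The metadata descriptor identifies the file itself and points at the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_PROFILE},
        "about": {"@id": "./"},
    }
    # The root dataset carries capsule-level metadata and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "identifier": manifest["capsule_id"],
        "name": manifest["title"],
        "hasPart": [{"@id": f["@id"]} for f in files],
    }
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *files]}


# Made-up manifest used only to exercise the function.
example = manifest_to_rocrate(
    {
        "capsule_id": "capsule-example",
        "title": "Example capsule",
        "artifacts": [{"path": "capsule-manifest.json", "sha256": "0" * 64}],
    }
)
print(json.dumps(example, indent=2))
```

Keeping the mapping a pure function of the manifest means the same input always yields the same graph, which makes the bundle checksum stable across re-exports.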

Dependencies

    • docs/planning/specs/quest_reproducible_analysis_capsules_spec.md
    • docs/planning/specs/repro_capsule_manifest_schema_spec.md
    • 65ac9e7d-eb54-4243-818c-2193162a6c45 — capsule manifest schema

    Dependents

    • External replication and citation
    • Long-term archival of debate-worthy analyses
    • Governance and audit trails

    Work Log

    2026-04-10 09:25 PT — Codex

    • Created initial spec for RO-Crate-compatible export and archival identifiers.

    2026-04-10 09:27 PT — Codex

    • Created the live Orchestra task and attached the real task id.

    2026-04-11 04:45 PT — minimax:54

    • Implemented RO-Crate export for SciDEX reproducibility capsules.
    • Created rocrate_export.py — generates RO-Crate-compatible JSON-LD bundles with:
    - Root Dataset/CreativeWork entity with capsule metadata
    - License, input/output artifact entities
    - Provenance entity documenting capsule creation
    - SWH archival reference for git commits (swh:1:ext:{commit})
    - Checksum entity with metadata SHA256
    - SciDEX context entity linking back to scidex.ai
    • Extended artifact_registry.py with SWH archival fields in get_capsule_manifest():
    - git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id
    • Extended register_capsule() with git_commit, git_tree_hash, swh_origin_url, swh_snapshot_id params
    • Extended api_register_capsule() with same SWH archival parameters
    • Added /api/capsules/{id}/export endpoint — returns full bundle metadata (crate_id, checksum, entities, archival references)
    • Added /api/capsules/{id}/export-files endpoint — exports bundle to /home/ubuntu/scidex/exports/rocrate/{id}/
    • Registered reference capsule for analysis SDA-2026-04-01-gap-001 (TREM2 DAM microglia):
    - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d
    - git_commit: 90c7de08cf950d124356823ad394632b07f3fa65
    - swh_origin_url: https://github.com/SciDEX-AI/SciDEX
    • Exported reference bundle to exports/rocrate/capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d/
    - ro-crate-metadata.json (7 entities, SWH reference to commit)
    - capsule-manifest.json
    - provenance/export-info.json
    • All acceptance criteria verified: capsule exports as RO-Crate bundle with checksums, SWH reference, real analysis bundle generated, export path connected.
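
The archival reference and metadata checksum described in this entry can be sketched as below. The function names are hypothetical and this is not the code in rocrate_export.py. One assumption to flag loudly: the log records identifiers of the form `swh:1:ext:{commit}`, whereas the SWHID specification defines the core object types `cnt`, `dir`, `rev`, `rel`, and `snp`. The sketch therefore uses `rev` for a git commit, which differs from what the log reports.

```python
import hashlib
import json
import re

_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def swhid_for_commit(commit: str) -> str:
    """Build a SWHID for a git commit, assuming the 'rev' (revision) object type."""
    commit = commit.strip().lower()
    if not _SHA1_HEX.fullmatch(commit):
        raise ValueError(f"not a full 40-character SHA-1 commit hash: {commit!r}")
    return f"swh:1:rev:{commit}"


def metadata_sha256(metadata: dict) -> str:
    """SHA-256 over a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The commit hash is the one recorded for the reference capsule above.
reference_swhid = swhid_for_commit("90c7de08cf950d124356823ad394632b07f3fa65")
print(reference_swhid)
```

An identifier built this way only resolves if Software Heritage has actually archived the origin, so it is best treated as a placeholder until that is confirmed.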

    2026-04-12 — sonnet-4.6:73

    • Verification pass: all acceptance criteria confirmed passing.
    • Ran programmatic checks: bundle generation, directory export, ZIP export, SWH reference creation all correct.
    • Export produces 7-entity RO-Crate JSON-LD with @context: https://w3id.org/ro/crate/1.1, input/output entities, provenance, checksums, and SWH swh:1:ext: reference for the git commit.
    • Updated acceptance criteria checkboxes to [x] to reflect completed state.
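
A programmatic check of the kind mentioned in this entry might resemble the sketch below. It is an assumption about what such a check could cover, not the verification code that was actually run: `check_crate` is a hypothetical name, and the rule that an SWH reference appears as a string value starting with `swh:1:` is a guess at the bundle layout.

```python
def check_crate(crate: dict) -> list[str]:
    """Return a list of structural problems found in an RO-Crate metadata document."""
    graph = crate.get("@graph")
    if not isinstance(graph, list):
        return ["missing @graph list"]

    problems: list[str] = []
    entities = [e for e in graph if isinstance(e, dict)]
    ids = {e.get("@id") for e in entities}

    # The metadata descriptor must exist and point at an entity in the graph.
    descriptor = next(
        (e for e in entities if e.get("@id") == "ro-crate-metadata.json"), None
    )
    if descriptor is None:
        problems.append("no metadata descriptor entity")
    else:
        about = descriptor.get("about")
        target = about.get("@id") if isinstance(about, dict) else None
        if target not in ids:
            problems.append("descriptor 'about' does not point at a graph entity")

    # Assumed layout: at least one string value carries a SWHID-style identifier.
    has_swh = any(
        isinstance(value, str) and value.startswith("swh:1:")
        for entity in entities
        for value in entity.values()
    )
    if not has_swh:
        problems.append("no SWH archival reference found")
    return problems


# Made-up crate used only to exercise the check.
good_crate = {
    "@graph": [
        {"@id": "ro-crate-metadata.json", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset"},
        {"@id": "#source", "identifier": "swh:1:rev:" + "0" * 40},
    ]
}
print(check_crate(good_crate))
```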

    2026-04-19 01:50Z — minimax:62

    • Re-verification pass: confirmatory testing after task reopen.
    • Evidence: programmatic test of generate_rocrate_bundle() and get_capsule_manifest() in worktree:
    - capsule-af7e5ae5-957c-4517-93bc-6a14b0655d4d → manifest retrieved (title: RO-Crate capsule: TREM2 agonism...)
    - Bundle generated: crate_id=rocrate-11decfc0ac4f4208, 7 entities, checksum=452f824d358c..., 1 SWH archival reference (swh:1:ext:90c7de08cf950d124356823ad394632b07f3fa65)
    - Git log confirms commits dcc329a36, e60d63996, 73fecc569, c76041972, 6bf61507d all on origin/main
    • All acceptance criteria confirmed satisfied. No code changes needed; task was already complete.
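
The "on origin/main" check in this entry can be automated along these lines. This is a sketch under assumptions: `missing_from_ref` is a hypothetical helper, and it relies on `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second. Note that a squash merge creates a new commit, so SHAs from the original task branch can be absent from main even though their changes landed.

```python
import subprocess
from typing import Callable, Iterable


def _git_is_ancestor(sha: str, ref: str) -> bool:
    """True if `sha` is reachable from `ref` in the current repository."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", sha, ref],
        capture_output=True,
    )
    # Exit status 0 means ancestor; 1 means not an ancestor; anything else is
    # an error (for example an unknown commit), treated here as "not on ref".
    return result.returncode == 0


def missing_from_ref(
    shas: Iterable[str],
    ref: str = "origin/main",
    is_ancestor: Callable[[str, str], bool] = _git_is_ancestor,
) -> list[str]:
    """Return the SHAs that are not reachable from `ref`, preserving input order."""
    return [sha for sha in shas if not is_ancestor(sha, ref)]
```

Passing `is_ancestor` as a parameter lets the logic be exercised without a repository, for example `missing_from_ref(["dcc329a36", "e60d63996"])` inside a checkout after `git fetch origin`.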

    Payload JSON
    {
      "requirements": {
        "coding": 7,
        "reasoning": 7,
        "analysis": 6,
        "safety": 9
      },
      "_stall_skip_providers": [],
      "_stall_requeued_by": "max_outlook",
      "_stall_requeued_at": "2026-04-12 09:26:52",
      "completion_shas": [
        "dcc329a3619cbb3f4a15c4354b62d084b3c8a434"
      ],
      "completion_shas_checked_at": "2026-04-12T12:46:15.903742+00:00",
      "completion_shas_missing": [
        "e60d639963b60f539c811feab7604f4dec822ba8",
        "73fecc5698271deacd5c54400c5c0005a2a406a3",
        "c760419723d17a64356340cf805ecdc2c344ef79",
        "6bf61507d5d050c328dd5b136944419cdf5c475e",
        "f2a439c0addd927345d0a7dcff844b8561b0131d"
      ],
      "_stall_skip_at": {},
      "_stall_skip_pruned_at": "2026-04-14T10:37:14.022390+00:00",
      "_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
      "_reset_at": "2026-04-18T06:29:22.046013+00:00",
      "_reset_from_status": "done"
    }
