SciDEX — Task: [Artifacts] Reproducible analysis chains: pin arti

Extend analysis specifications to include a pinned_artifacts field: list of (artifact_id, version_number) tuples that fix the exact inputs used. When an analysis runs, auto-snapshot all input artifact versions into the analysis provenance_chain. Add verify_reproducibility(analysis_id) that checks whether pinned versions still exist and match content_hash. Add /api/analysis/{id}/provenance endpoint showing full input/output artifact DAG with versions. This ensures any reasoning chain can be replayed with identical inputs. Depends on: a17-20-VAPI0001.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (5)

[Verify] Task already on main at 541786d21 — verified acceptance criteria [task:a17-24-REPR0001] 2026-04-26

[Atlas] Reproducible analysis chains: pin artifact versions, capture inputs, provenance DAG 2026-04-26

[Atlas] Work log: update spec with edge key fix retry [task:a17-24-REPR0001] 2026-04-26

[Artifacts] Reproducible analysis chains: pin artifact versions [task:a17-24-REPR0001] 2026-04-26

[Forge] Reproducible analysis capsules: move verify_reproducibility to artifact_registry, add export_artifact_capsule, add /api/analyses/{id}/provenance endpoint 2026-04-10

Spec File

[Artifacts] Reproducible analysis chains: pin artifact versions in analysis specs

Goal

Scientific reproducibility requires knowing exactly which inputs produced which outputs. When an analysis runs, it should record the precise versions of all input artifacts (datasets, models, hypotheses, prior analyses). Later, anyone should be able to verify that those inputs still exist and haven't been modified, enabling full replay of the reasoning chain.

Design

Pinned Artifacts in Analysis Specs

Add a pinned_artifacts field to analysis records:

{
  "analysis_id": "SDA-2026-04-05-xxx",
  "pinned_artifacts": [
    {"artifact_id": "dataset-allen_brain-SEA-AD", "version_number": 1, "content_hash": "sha256:abc..."},
    {"artifact_id": "model-biophys-microglia-v3", "version_number": 3, "content_hash": "sha256:def..."},
    {"artifact_id": "hypothesis-h-seaad-v4-26ba859b", "version_number": 1, "content_hash": "sha256:ghi..."}
  ],
  "outputs": [
    {"artifact_id": "figure-timecourse-001", "version_number": 1}
  ]
}
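One entry of the pinned_artifacts list above could be built as in the sketch below. The helper names content_hash_of and make_pin are hypothetical. The sketch also assumes that content_hash is a SHA-256 digest over the artifact's raw bytes, written with the "sha256:" prefix seen in the example. The spec does not say how the hash is computed.

```python
import hashlib


def content_hash_of(data: bytes) -> str:
    """Return a 'sha256:<hex>' digest string (prefix format assumed from the example)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_pin(artifact_id: str, version_number: int, data: bytes) -> dict:
    """Build one pinned_artifacts entry (hypothetical helper, not the registry API)."""
    return {
        "artifact_id": artifact_id,
        "version_number": version_number,
        "content_hash": content_hash_of(data),
    }


# Illustrative bytes only; a real artifact's content would be hashed instead.
pin = make_pin("dataset-allen_brain-SEA-AD", 1, b"example bytes")
```

Hashing the content instead of trusting the version number alone is what lets a later check detect an artifact that was modified in place without a version bump.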

Auto-Snapshot on Analysis Run

When an analysis executes:

1. Collect all input artifacts referenced in the analysis spec
2. For each, record current (artifact_id, version_number, content_hash)
3. Store as pinned_artifacts in the analysis metadata
4. Create artifact_links: analysis → each input (link_type="cites", with version info in evidence)
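The snapshot steps above can be sketched against an in-memory registry. Everything here is an assumption made for illustration: the Artifact dataclass, the snapshot_inputs name, the dict-based registry, and the source/target/evidence keys on a link. The real function recorded in the work log is capture_analysis_inputs() in artifact_registry.py, and its signature is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Minimal in-memory stand-in for a registry record (hypothetical)."""
    artifact_id: str
    version_number: int
    content_hash: str
    metadata: dict = field(default_factory=dict)


def snapshot_inputs(registry, links, analysis_id, input_ids):
    """Pin the current version and hash of each input, then link analysis -> input."""
    pinned = []
    for input_id in input_ids:
        artifact = registry[input_id]  # raises KeyError for an unknown input
        pin = {
            "artifact_id": artifact.artifact_id,
            "version_number": artifact.version_number,
            "content_hash": artifact.content_hash,
        }
        pinned.append(pin)
        # One "cites" link per input, carrying the pinned version as evidence.
        links.append({
            "source": analysis_id,
            "target": input_id,
            "link_type": "cites",
            "evidence": {"version_number": pin["version_number"],
                         "content_hash": pin["content_hash"]},
        })
    # Store the snapshot on the analysis record itself.
    registry[analysis_id].metadata["pinned_artifacts"] = pinned
    return pinned


registry = {
    "SDA-example": Artifact("SDA-example", 1, "sha256:aaa"),
    "dataset-example": Artifact("dataset-example", 2, "sha256:bbb"),
}
links = []
pins = snapshot_inputs(registry, links, "SDA-example", ["dataset-example"])
```

Because each pin copies the values instead of holding a reference to the registry record, a later version bump on the input leaves the stored snapshot unchanged.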

verify_reproducibility(analysis_id) → dict

def verify_reproducibility(analysis_id):
    """Check that all pinned inputs still exist and match their content_hash."""
    analysis = get_artifact(analysis_id)
    # Tolerate metadata that is missing or None; no pins means nothing to check.
    pinned = (analysis.metadata or {}).get("pinned_artifacts", [])
    results = []
    for pin in pinned:
        # Lookup is by artifact_id only, so the hash compared is that of the
        # version get_artifact() returns; pin["version_number"] is not consulted.
        artifact = get_artifact(pin["artifact_id"])
        if artifact is None:
            results.append({"artifact_id": pin["artifact_id"], "status": "missing"})
        elif artifact.content_hash != pin["content_hash"]:
            results.append({"artifact_id": pin["artifact_id"], "status": "modified",
                            "expected_hash": pin["content_hash"],
                            "current_hash": artifact.content_hash})
        else:
            results.append({"artifact_id": pin["artifact_id"], "status": "verified"})
    return {
        "analysis_id": analysis_id,
        # all() over an empty list is True, so an analysis without pins is reproducible.
        "reproducible": all(r["status"] == "verified" for r in results),
        "checks": results
    }
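The function above looks each artifact up by artifact_id alone. A version-aware variant is sketched below as one possible reading of "checks whether pinned versions still exist". It assumes a store keyed by (artifact_id, version_number) that maps to a content hash. That store and the verify_pins name are hypothetical, and the real lookup would go through the version-aware API from a17-20-VAPI0001.

```python
def verify_pins(pinned, versions):
    """Check each pin against a {(artifact_id, version_number): content_hash} store."""
    checks = []
    for pin in pinned:
        key = (pin["artifact_id"], pin["version_number"])
        current = versions.get(key)
        if current is None:
            status = "missing"      # the pinned version no longer exists
        elif current != pin["content_hash"]:
            status = "modified"     # the version exists but its content changed
        else:
            status = "verified"
        checks.append({"artifact_id": pin["artifact_id"],
                       "version_number": pin["version_number"],
                       "status": status})
    return {
        # all() over an empty list is True, so zero pins counts as reproducible.
        "reproducible": all(c["status"] == "verified" for c in checks),
        "checks": checks,
    }


versions = {("dataset-example", 1): "sha256:aaa", ("model-example", 3): "sha256:bbb"}
pinned = [
    {"artifact_id": "dataset-example", "version_number": 1, "content_hash": "sha256:aaa"},
    {"artifact_id": "model-example", "version_number": 3, "content_hash": "sha256:zzz"},
    {"artifact_id": "paper-example", "version_number": 1, "content_hash": "sha256:ccc"},
]
report = verify_pins(pinned, versions)
```

With this keying, publishing a new version of an input does not invalidate an older pin. Only deleting or altering the pinned version does.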

Provenance DAG Endpoint

GET /api/analysis/{id}/provenance returns the full input/output artifact DAG (the work log below records the registered route as /api/analyses/{id}/provenance):

{
  "analysis_id": "SDA-2026-04-05-xxx",
  "nodes": [
    {"id": "paper-12345", "type": "paper", "version": 1, "role": "input"},
    {"id": "dataset-geo-GSE123", "type": "dataset", "version": 1, "role": "input"},
    {"id": "SDA-2026-04-05-xxx", "type": "analysis", "version": 1, "role": "center"},
    {"id": "model-biophys-001", "type": "model", "version": 1, "role": "output"},
    {"id": "figure-tc-001", "type": "figure", "version": 1, "role": "output"}
  ],
  "edges": [
    {"from": "paper-12345", "to": "SDA-...", "type": "cites"},
    {"from": "dataset-geo-GSE123", "to": "SDA-...", "type": "cites"},
    {"from": "SDA-...", "to": "model-biophys-001", "type": "produces"},
    {"from": "SDA-...", "to": "figure-tc-001", "type": "produces"}
  ],
  "reproducibility": {"status": "verified", "checked_at": "2026-04-05T12:00:00"}
}
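A response in the shape above could be assembled from an analysis's pinned inputs and its outputs roughly as follows. This is a sketch under assumptions: build_provenance is a hypothetical name, each input or output dict is assumed to carry an optional "type" key, and status and checked_at are passed in by the caller because the spec does not say how they are computed.

```python
def build_provenance(analysis_id, analysis_version, inputs, outputs, status, checked_at):
    """Assemble nodes and edges for one analysis and its direct inputs and outputs."""
    nodes, edges = [], []
    for item in inputs:
        nodes.append({"id": item["artifact_id"], "type": item.get("type", "unknown"),
                      "version": item["version_number"], "role": "input"})
        # Inputs point at the analysis with a "cites" edge.
        edges.append({"from": item["artifact_id"], "to": analysis_id, "type": "cites"})
    nodes.append({"id": analysis_id, "type": "analysis",
                  "version": analysis_version, "role": "center"})
    for item in outputs:
        nodes.append({"id": item["artifact_id"], "type": item.get("type", "unknown"),
                      "version": item["version_number"], "role": "output"})
        # The analysis points at each output with a "produces" edge.
        edges.append({"from": analysis_id, "to": item["artifact_id"], "type": "produces"})
    return {
        "analysis_id": analysis_id,
        "nodes": nodes,
        "edges": edges,
        "reproducibility": {"status": status, "checked_at": checked_at},
    }


dag = build_provenance(
    "SDA-example", 1,
    inputs=[{"artifact_id": "dataset-geo-GSE123", "type": "dataset", "version_number": 1}],
    outputs=[{"artifact_id": "figure-tc-001", "type": "figure", "version_number": 1}],
    status="verified", checked_at="2026-04-05T12:00:00",
)
```

An analysis with no inputs and no outputs still yields a valid response: a single center node and an empty edge list.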

Acceptance Criteria

☐ Analysis metadata supports pinned_artifacts field

☐ Auto-snapshot captures all input artifact versions on analysis execution

☐ Content hashes recorded for each pinned artifact

☐ verify_reproducibility() checks all pins and reports status

☐ GET /api/analysis/{id}/provenance returns DAG structure

☐ Provenance DAG includes both inputs and outputs with versions

☐ Edge case: analysis with no pinned artifacts returns empty but valid response

☐ Work log updated with timestamped entry

Dependencies

a17-20-VAPI0001 (version-aware API for resolving versions and hashes)

Dependents

a17-25-AVUI0001 (version browser UI shows provenance)
d16-24-PROV0001 (demo: end-to-end provenance walkthrough)

Work Log

2026-04-26 00:43 PT — Slot minimax:71

Added capture_analysis_inputs() function to scidex/atlas/artifact_registry.py (line 4969)

- Takes analysis_id + list of input_artifact_ids
- Captures version_number, content_hash, title for each input
- Stores pinned_artifacts snapshot in analysis artifact's metadata
- Creates cites artifact_links from analysis → each input with version evidence

Updated get_analysis_provenance() to:

- Add role field to all nodes (center/input/output)
- Add version field to analysis node
- Use from/to keys for edges instead of source/target
- Return cites edge type for inputs, produces for outputs
- Use spec-compliant reproducibility structure with status/checked_at
- Fixed PostgreSQL column name (parent_hypothesis_id → depends_on_hypothesis_id)

Updated verify_reproducibility() to fall back to analyses table when artifact not found
All acceptance criteria met: pinned_artifacts field, auto-snapshot, verify_reproducibility(), provenance DAG with roles + edge types
Tested: get_analysis_provenance returns 31 nodes, 33 edges, correct role field, from/to edge keys
Tested: verify_reproducibility returns reproducible: True for analysis without pins

2026-04-26 08:10 PT — Slot minimax:71 (retry 2)

Confirmed branch is at same SHA as origin/main (541786d21) — task work was merged by prior agent
Verified acceptance criteria against current HEAD:

- capture_analysis_inputs() exists in artifact_registry.py at line 4969
- verify_reproducibility() exists at line 4800, falls back to analyses table, returns reproducible: True for analysis without pins
- get_analysis_provenance() returns DAG with nodes (role field), edges (from/to keys), reproducibility (status/checked_at)
- GET /api/analyses/{id}/provenance route registered, returns 8 nodes + 7 edges for real analysis
- artifact_provenance_graph_html accepts both {from,to} and {source,target} edge keys
- Edge case (no pins): returns valid response with status=verified
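
Accepting both edge-key conventions, as the artifact_provenance_graph_html item above describes, could look like the sketch below. The normalize_edge helper is hypothetical and is not taken from the codebase. It prefers the spec's from/to keys and falls back to source/target.

```python
def normalize_edge(edge):
    """Return an edge dict using from/to keys, accepting source/target as a fallback."""
    if "from" in edge and "to" in edge:
        src, dst = edge["from"], edge["to"]
    elif "source" in edge and "target" in edge:
        src, dst = edge["source"], edge["target"]
    else:
        raise KeyError("edge needs either from/to or source/target keys")
    return {"from": src, "to": dst, "type": edge.get("type")}


new_style = normalize_edge({"from": "paper-12345", "to": "SDA-example", "type": "cites"})
old_style = normalize_edge({"source": "paper-12345", "target": "SDA-example", "type": "cites"})
```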

All acceptance criteria verified as satisfied. Task is complete on main.
Branch is clean, at same SHA as origin/main (541786d21), no pending changes.

Payload JSON

{
  "requirements": {
    "coding": 7,
    "reasoning": 7,
    "analysis": 8
  }
}

Sibling Tasks in Quest (Artifacts)

○ [Artifacts] Ensure all 18 spotlight notebooks have rich, executed content P96

○ [Artifacts] Audit all 67 stub notebooks (<10KB) and regenerate with real content P82

○ [Artifacts] CI: Verify notebook coverage and generate summaries for recent analyses P78

○ [Artifacts] Review notebook links from analysis pages — fix any that lead to stubs P60

✓ [Artifacts] Add Mol* 3D protein viewer to hypothesis pages P96

✓ [Artifacts] Embed Mol* protein viewer component P95

✓ [Artifacts] A17.6: Evidence matrix component on hypothesis pages P95

✓ [Artifacts] Add Mol* 3D protein viewer to hypothesis detail pages P95

✓ [Artifacts] Add version tracking schema to artifacts table P95 claude

Task Dependencies

↓ Referenced by (downstream)

✓ [Artifacts] Artifact version browser UI: timeline, diff view, provenance graph P90 Artifacts

✓ [Demo] End-to-end provenance demo: trace reasoning chain from paper to model P93 Demo
