[Atlas] Processing step lineage — track transforms in provenance chains

Goal

Extend the provenance system to capture not just parent-child artifact relationships but
the processing steps between them. When an experiment is extracted from a paper, the
provenance should record: "Paper 12345 was processed by extraction-agent using
llm_structured_extraction method with schema v2, producing experiment artifact X."

This creates a full audit trail of how every artifact was constructed.
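
As an illustration, the example sentence above could be carried as a small structured record. This is a minimal sketch only: the field names follow the Acceptance Criteria below, while the artifact IDs (`paper-12345`, `experiment-X`), the `schema_version` parameter key, and the `describe()` helper are hypothetical and not part of the existing system.

```python
import json

# Hypothetical step record for the example in the Goal section.
# IDs and the "schema_version" key are illustrative assumptions.
step = {
    "source_artifact_id": "paper-12345",
    "target_artifact_id": "experiment-X",
    "step_type": "extraction",
    "agent_id": "extraction-agent",
    "method": "llm_structured_extraction",
    "parameters": {"schema_version": "v2"},
}


def describe(step: dict) -> str:
    """Render a step record as a one-line audit-trail sentence."""
    # Sorted keys keep the rendered parameters stable across runs.
    params = json.dumps(step["parameters"], sort_keys=True)
    return (
        f"{step['source_artifact_id']} was processed by {step['agent_id']} "
        f"using {step['method']} with parameters {params}, "
        f"producing {step['target_artifact_id']}."
    )


print(describe(step))
```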

Current State

- artifact_links captures derives_from, cites, and extends relationships
- provenance_chain JSON in artifacts captures parent artifacts
- Neither captures the transform applied (what method, what agent, what parameters)

Acceptance Criteria

☐ processing_steps table or extended artifact_links metadata:

- source_artifact_id — input artifact
- target_artifact_id — output artifact
- step_type — extraction, analysis, aggregation, transformation, validation, debate
- agent_id — which agent performed the step
- method — what method/tool was used
- parameters — JSON of method parameters
- started_at, completed_at — timing
- input_hash, output_hash — for reproducibility verification
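
One possible shape for the dedicated-table option, sketched with Python's built-in `sqlite3` module. The column names come from the list above; the column types, the `CHECK` constraint, the SHA-256 hashing, and the signature of `record_processing_step()` are assumptions made for illustration, not a description of the existing schema or code.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

# Assumed DDL: column names from the spec, types and constraints are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processing_steps (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    source_artifact_id TEXT NOT NULL,
    target_artifact_id TEXT NOT NULL,
    step_type          TEXT NOT NULL CHECK (step_type IN (
        'extraction', 'analysis', 'aggregation',
        'transformation', 'validation', 'debate')),
    agent_id           TEXT NOT NULL,
    method             TEXT NOT NULL,
    parameters         TEXT NOT NULL,  -- JSON, serialized with sorted keys
    started_at         TEXT,
    completed_at       TEXT,
    input_hash         TEXT,
    output_hash        TEXT
);
"""


def content_hash(payload: bytes) -> str:
    """SHA-256 hex digest; the choice of hash function is an assumption."""
    return hashlib.sha256(payload).hexdigest()


def record_processing_step(conn, *, source_artifact_id, target_artifact_id,
                           step_type, agent_id, method, parameters,
                           input_bytes, output_bytes, started_at=None):
    """Insert one step row and return its row id (hypothetical signature)."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO processing_steps (source_artifact_id, target_artifact_id,"
        " step_type, agent_id, method, parameters, started_at, completed_at,"
        " input_hash, output_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (source_artifact_id, target_artifact_id, step_type, agent_id, method,
         json.dumps(parameters, sort_keys=True),  # canonical parameter form
         started_at or now, now,
         content_hash(input_bytes), content_hash(output_bytes)),
    )
    conn.commit()
    return cur.lastrowid


# Usage with an in-memory database and placeholder artifact contents.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
step_id = record_processing_step(
    conn,
    source_artifact_id="paper-12345",
    target_artifact_id="experiment-X",
    step_type="extraction",
    agent_id="extraction-agent",
    method="llm_structured_extraction",
    parameters={"schema_version": "v2"},
    input_bytes=b"paper text",
    output_bytes=b"experiment json",
)
```

Serializing `parameters` with sorted keys means two calls with the same parameters store identical text, which makes later equality comparisons straightforward.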

☐ record_processing_step() function called during artifact creation

☐ Processing steps shown in provenance graph visualization

☐ Reproducibility check: verify that the same input, method, and parameters yield the same output hash

☐ API: GET /api/artifact/{id}/processing-history — full transform chain
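
The last two criteria can be sketched over plain step records, independent of how they are stored. The sketch below assumes each step is a dict with the field names listed above; `processing_history()` and `reproducibility_conflicts()` are hypothetical helper names, and the HTTP layer that would serve `/api/artifact/{id}/processing-history` is not shown.

```python
import json
from collections import defaultdict


def processing_history(steps, artifact_id):
    """Collect every step that leads to artifact_id by walking source links."""
    by_target = defaultdict(list)
    for s in steps:
        by_target[s["target_artifact_id"]].append(s)

    history, seen, frontier = [], set(), [artifact_id]
    while frontier:
        current = frontier.pop()
        if current in seen:  # guard against cycles in the recorded links
            continue
        seen.add(current)
        for s in by_target.get(current, []):
            history.append(s)
            frontier.append(s["source_artifact_id"])
    # Reversed so that, for a linear chain, the earliest step comes first.
    history.reverse()
    return history


def reproducibility_conflicts(steps):
    """Group steps by (input hash, method, parameters) and return the groups
    whose runs produced more than one distinct output hash."""
    outputs = defaultdict(set)
    for s in steps:
        key = (s["input_hash"], s["method"],
               json.dumps(s["parameters"], sort_keys=True))
        outputs[key].add(s["output_hash"])
    return {k: v for k, v in outputs.items() if len(v) > 1}


# Illustrative data: a two-step chain plus a rerun of the first step
# that produced a different output hash.
steps = [
    {"source_artifact_id": "paper-12345", "target_artifact_id": "experiment-X",
     "method": "llm_structured_extraction",
     "parameters": {"schema_version": "v2"},
     "input_hash": "h-in", "output_hash": "h-out-1"},
    {"source_artifact_id": "experiment-X", "target_artifact_id": "analysis-Y",
     "method": "summary_stats", "parameters": {},
     "input_hash": "h-out-1", "output_hash": "h-y"},
    {"source_artifact_id": "paper-12345", "target_artifact_id": "experiment-X2",
     "method": "llm_structured_extraction",
     "parameters": {"schema_version": "v2"},
     "input_hash": "h-in", "output_hash": "h-out-2"},
]

chain = processing_history(steps, "analysis-Y")
conflicts = reproducibility_conflicts(steps)
```

A non-empty result from `reproducibility_conflicts()` does not necessarily indicate a bug: an LLM-backed method may legitimately produce different outputs for the same input unless its sampling is made deterministic.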

Dependencies

None (parallel with schema governance, integrates with provenance system)

Dependents

a17-24-REPR0001 — Reproducible analysis chains use processing steps
d16-24-PROV0001 — Provenance demo showcases processing lineage

Work Log


File: sen-sg-06-PROC_processing_step_lineage_spec.md

Modified: 2026-04-24 07:15

Size: 1.9 KB