[Atlas] Processing step lineage — track transforms in provenance chains


Goal

Extend the provenance system to capture not only parent-child artifact relationships but
also the processing steps between them. When an experiment is extracted from a paper, the
provenance should record: "Paper 12345 was processed by extraction-agent using the
llm_structured_extraction method with schema v2, producing experiment artifact X."

This creates a full audit trail of how every artifact was constructed.
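
As a rough illustration, the example sentence above could map onto a record like the one sketched here. The field names follow the acceptance criteria listed later in this spec; the `ProcessingStep` class, the `describe()` helper, the `schema_version` parameter key, and the `paper-12345` / `experiment-X` identifier formats are assumptions made for the sketch, not part of the existing system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessingStep:
    """Hypothetical in-memory shape of one processing step."""
    source_artifact_id: str
    target_artifact_id: str
    step_type: str
    agent_id: str
    method: str
    parameters: dict = field(default_factory=dict)

    def describe(self) -> str:
        # Render parameters in sorted key order so the text is stable.
        params = ", ".join(f"{k}={v}" for k, v in sorted(self.parameters.items()))
        return (
            f"{self.source_artifact_id} was processed by {self.agent_id} "
            f"using {self.method} ({params}), producing {self.target_artifact_id}"
        )


# Identifier formats and the parameter key below are illustrative assumptions.
step = ProcessingStep(
    source_artifact_id="paper-12345",
    target_artifact_id="experiment-X",
    step_type="extraction",
    agent_id="extraction-agent",
    method="llm_structured_extraction",
    parameters={"schema_version": "v2"},
)
print(step.describe())
```

Keeping the step as a structured record rather than a free-text sentence means the audit trail can be queried by agent, method, or parameters.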

Current State

  • artifact_links captures derives_from, cites, extends relationships
  • provenance_chain JSON in artifacts captures parent artifacts
  • Neither captures the transform applied (what method, what agent, what parameters)

Acceptance Criteria

☐ processing_steps table or extended artifact_links metadata:
- source_artifact_id — input artifact
- target_artifact_id — output artifact
- step_type — extraction, analysis, aggregation, transformation, validation, debate
- agent_id — which agent performed the step
- method — what method/tool was used
- parameters — JSON of method parameters
- started_at, completed_at — timing
- input_hash, output_hash — for reproducibility verification
☐ record_processing_step() function called during artifact creation
☐ Processing steps shown in provenance graph visualization
☐ Reproducibility check: do the same input, method, and parameters yield the same output hash?
☐ API: GET /api/artifact/{id}/processing-history — full transform chain
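
The criteria above can be sketched end to end with SQLite. This is a minimal sketch under stated assumptions, not the project's implementation: the column types, the use of SHA-256 for `input_hash` and `output_hash`, the `canonical_json()`, `is_reproducible()`, and `processing_history()` helpers, the argument list of `record_processing_step()`, and the sample agents and methods in the demo are all hypothetical. Only the column names, the step types, and the `record_processing_step()` name come from this spec.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical DDL: column names follow the spec, column types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processing_steps (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    source_artifact_id TEXT NOT NULL,
    target_artifact_id TEXT NOT NULL,
    step_type          TEXT NOT NULL CHECK (step_type IN (
        'extraction', 'analysis', 'aggregation',
        'transformation', 'validation', 'debate')),
    agent_id           TEXT NOT NULL,
    method             TEXT NOT NULL,
    parameters         TEXT NOT NULL,  -- canonical JSON (sorted keys)
    started_at         TEXT NOT NULL,
    completed_at       TEXT NOT NULL,
    input_hash         TEXT NOT NULL,
    output_hash        TEXT NOT NULL
);
"""


def canonical_json(parameters: dict) -> str:
    """Serialize parameters deterministically so equal dicts compare equal."""
    return json.dumps(parameters, sort_keys=True, separators=(",", ":"))


def record_processing_step(conn, source_artifact_id, target_artifact_id, step_type,
                           agent_id, method, parameters, input_bytes, output_bytes,
                           started_at=None, completed_at=None) -> int:
    """Insert one step and return its row id. SHA-256 hashing is an assumption."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO processing_steps (source_artifact_id, target_artifact_id,"
        " step_type, agent_id, method, parameters, started_at, completed_at,"
        " input_hash, output_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (source_artifact_id, target_artifact_id, step_type, agent_id, method,
         canonical_json(parameters), started_at or now, completed_at or now,
         hashlib.sha256(input_bytes).hexdigest(),
         hashlib.sha256(output_bytes).hexdigest()),
    )
    conn.commit()
    return cur.lastrowid


def is_reproducible(conn, input_bytes, method, parameters) -> Optional[bool]:
    """True if all matching runs share one output hash; None if no runs exist."""
    rows = conn.execute(
        "SELECT DISTINCT output_hash FROM processing_steps"
        " WHERE input_hash = ? AND method = ? AND parameters = ?",
        (hashlib.sha256(input_bytes).hexdigest(), method, canonical_json(parameters)),
    ).fetchall()
    return None if not rows else len(rows) == 1


def processing_history(conn, artifact_id):
    """Walk upstream from an artifact, collecting every step that led to it."""
    history, frontier, seen = [], [artifact_id], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for row in conn.execute(
            "SELECT source_artifact_id, target_artifact_id, step_type, method"
            " FROM processing_steps WHERE target_artifact_id = ? ORDER BY id",
            (current,),
        ).fetchall():
            history.append(row)
            frontier.append(row[0])  # continue from the step's input artifact
    return history


# Demo with made-up artifacts, agents, and methods.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
params = {"schema_version": "v2"}
record_processing_step(conn, "paper-12345", "experiment-X", "extraction",
                       "extraction-agent", "llm_structured_extraction",
                       params, b"paper text", b"experiment json")
record_processing_step(conn, "experiment-X", "analysis-Y", "analysis",
                       "analysis-agent", "summary_stats", {},
                       b"experiment json", b"stats")
print(processing_history(conn, "analysis-Y"))
print(is_reproducible(conn, b"paper text", "llm_structured_extraction", params))
```

A handler for `GET /api/artifact/{id}/processing-history` could return the result of a walk like `processing_history()`. Storing `parameters` as canonical JSON is one way to make the "same parameters" comparison a plain string match. LLM-based methods may legitimately produce different output hashes on repeated runs, so a `False` result from such a check is a signal to investigate rather than proof of an error.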

Dependencies

  • None (parallel with schema governance, integrates with provenance system)

Dependents

  • a17-24-REPR0001 — Reproducible analysis chains use processing steps
  • d16-24-PROV0001 — Provenance demo showcases processing lineage

Work Log

File: sen-sg-06-PROC_processing_step_lineage_spec.md
Modified: 2026-04-24 07:15
Size: 1.9 KB