SciDEX — Task: [Atlas] Processing step lineage

processing_steps table capturing agent, method, parameters, timing, hashes for reproducibility

Last Error

cli-reopen-manual: reopened — task was marked 'done' but has no task_runs row in (done/completed/success)

Git Commits (7)

[Atlas/Senate/Agora] Spec: notebook + artifact versioning extensions 2026-04-24

Squash merge: orchestra/task/sen-sg-0-schema-registry-track-schemas-per-artifa (1 commits) 2026-04-18

Squash merge: orchestra/task/47b17cbf-sen-sg-01-sreg-schema-registry-track-art (1 commits) 2026-04-16

[Senate] Add schema registry API: GET /api/schemas and /api/schemas/{type} in api.py [task:sen-sg-01-SREG] 2026-04-16

[Senate] Schema registry: migration, seeding, and /senate/schemas UI [task:47b17cbf-a8ac-419e-9368-7a2669da25a8] 2026-04-06

[Senate] Holistic prioritization run 2: quest fixes + 3 new CI tasks [task:b4c60959-0fe9-4cba-8893-c88013e85104] 2026-04-06

[Senate] Holistic prioritization: 6 tasks created for uncovered P88-P95 quests [task:b4c60959-0fe9-4cba-8893-c88013e85104] 2026-04-06

Spec File

Goal

Extend the provenance system to capture not just parent-child artifact relationships but
the processing steps between them. When an experiment is extracted from a paper, the
provenance should record: "Paper 12345 was processed by extraction-agent using
llm_structured_extraction method with schema v2, producing experiment artifact X."

This creates a full audit trail of how every artifact was constructed.

Current State

artifact_links captures derives_from, cites, extends relationships
provenance_chain JSON in artifacts captures parent artifacts
Neither captures the transform applied (what method, what agent, what parameters)

Acceptance Criteria

☐ processing_steps table or extended artifact_links metadata:

- source_artifact_id — input artifact
- target_artifact_id — output artifact
- step_type — extraction, analysis, aggregation, transformation, validation, debate
- agent_id — which agent performed the step
- method — what method/tool was used
- parameters — JSON of method parameters
- started_at, completed_at — timing
- input_hash, output_hash — for reproducibility verification
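
One possible shape for the table, sketched under assumptions: the column names come from the list above, but the SQLite backend, the column types, the CHECK constraint on step_type, the index names, and the helper create_processing_steps_table() are all hypothetical and not confirmed by this spec.

```python
import sqlite3

# Hypothetical DDL. Column names follow the acceptance criteria; the types,
# the CHECK constraint, and the indexes are assumptions for illustration.
PROCESSING_STEPS_DDL = """
CREATE TABLE IF NOT EXISTS processing_steps (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    source_artifact_id TEXT NOT NULL,
    target_artifact_id TEXT NOT NULL,
    step_type          TEXT NOT NULL CHECK (step_type IN (
        'extraction', 'analysis', 'aggregation',
        'transformation', 'validation', 'debate')),
    agent_id           TEXT NOT NULL,
    method             TEXT NOT NULL,
    parameters         TEXT NOT NULL DEFAULT '{}',  -- JSON text
    started_at         TEXT,                        -- ISO 8601 timestamp
    completed_at       TEXT,                        -- ISO 8601 timestamp
    input_hash         TEXT,
    output_hash        TEXT
);
CREATE INDEX IF NOT EXISTS idx_steps_target
    ON processing_steps (target_artifact_id);
CREATE INDEX IF NOT EXISTS idx_steps_source
    ON processing_steps (source_artifact_id);
"""


def create_processing_steps_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) processing_steps table if it is missing."""
    conn.executescript(PROCESSING_STEPS_DDL)
```

Indexing both artifact columns is a guess at the access pattern: history lookups walk from a target back to its sources, while impact queries go the other way.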

☐ record_processing_step() function called during artifact creation
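
A minimal sketch of what record_processing_step() might look like, assuming a SQLite connection and a processing_steps table with the columns listed above. The signature, the content_hash() helper, the choice of SHA-256 over canonical JSON, and the DEMO_DDL constant are assumptions made for illustration; only the function name and the field names come from this spec.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

# Compact stand-in for the real table so the sketch runs on its own (assumed types).
DEMO_DDL = """
CREATE TABLE IF NOT EXISTS processing_steps (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_artifact_id TEXT, target_artifact_id TEXT, step_type TEXT,
    agent_id TEXT, method TEXT, parameters TEXT,
    started_at TEXT, completed_at TEXT, input_hash TEXT, output_hash TEXT
);
"""


def content_hash(payload) -> str:
    """SHA-256 over canonical JSON (sorted keys), so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_processing_step(conn, *, source_artifact_id, target_artifact_id,
                           step_type, agent_id, method, parameters,
                           input_content, output_content,
                           started_at=None, completed_at=None) -> int:
    """Insert one lineage row and return its id (hypothetical signature)."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO processing_steps (source_artifact_id, target_artifact_id,"
        " step_type, agent_id, method, parameters, started_at, completed_at,"
        " input_hash, output_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (source_artifact_id, target_artifact_id, step_type, agent_id, method,
         json.dumps(parameters, sort_keys=True),
         started_at or now, completed_at or now,
         content_hash(input_content), content_hash(output_content)),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DEMO_DDL)
    # Mirrors the example in the Goal section; IDs and payloads are made up.
    step_id = record_processing_step(
        conn,
        source_artifact_id="paper-12345",
        target_artifact_id="experiment-X",
        step_type="extraction",
        agent_id="extraction-agent",
        method="llm_structured_extraction",
        parameters={"schema_version": "v2"},
        input_content={"text": "paper body"},
        output_content={"experiment": "extracted fields"},
    )
    print("recorded step", step_id)
```

Hashing canonical JSON rather than raw bytes is one option among several; it requires the input and output payloads to be JSON-serializable.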

☐ Processing steps shown in provenance graph visualization

☐ Reproducibility check: do the same input, method, and parameters produce the same output hash?
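
The check could be sketched as a grouping over recorded steps: steps that share input_hash, method, and parameters should agree on output_hash. The function name find_irreproducible_steps() and the dict-based row shape are assumptions; the spec does not say how the check is exposed.

```python
import json
from collections import defaultdict


def find_irreproducible_steps(steps):
    """Group steps by (input_hash, method, canonical parameters) and return
    the groups that produced more than one distinct output_hash."""
    groups = defaultdict(set)
    for step in steps:
        key = (
            step["input_hash"],
            step["method"],
            # Canonical JSON so that parameter key order does not split groups.
            json.dumps(step["parameters"], sort_keys=True, separators=(",", ":")),
        )
        groups[key].add(step["output_hash"])
    return {key: sorted(hashes) for key, hashes in groups.items() if len(hashes) > 1}


if __name__ == "__main__":
    # Made-up rows: the first two share inputs but disagree on the output.
    steps = [
        {"input_hash": "in1", "method": "m", "parameters": {"a": 1, "b": 2},
         "output_hash": "out1"},
        {"input_hash": "in1", "method": "m", "parameters": {"b": 2, "a": 1},
         "output_hash": "out2"},
        {"input_hash": "in2", "method": "m", "parameters": {}, "output_hash": "out3"},
    ]
    for key, hashes in find_irreproducible_steps(steps).items():
        print(key, "->", hashes)
```

A mismatch is not necessarily a bug: LLM-based methods such as llm_structured_extraction may not be deterministic, so the result is probably better treated as a signal for review than as a hard failure.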

☐ API: GET /api/artifact/{id}/processing-history — full transform chain
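
The transform chain behind this endpoint could be assembled by walking processing steps backward from the requested artifact to its original sources. The sketch below is framework-agnostic and assumes steps are available as dicts keyed by the column names above; the function name processing_history() and the oldest-first ordering are assumptions, and a route handler in api.py would only need to serialize the result.

```python
from collections import deque


def processing_history(steps, artifact_id):
    """Return every step that contributed to artifact_id, upstream steps first.

    Walks backward breadth-first from the artifact through
    target_artifact_id -> source_artifact_id links. The seen set keeps
    the walk finite even if the recorded lineage contains a cycle.
    """
    by_target = {}
    for step in steps:
        by_target.setdefault(step["target_artifact_id"], []).append(step)

    history = []
    seen = {artifact_id}
    queue = deque([artifact_id])
    while queue:
        current = queue.popleft()
        for step in by_target.get(current, []):
            history.append(step)
            source = step["source_artifact_id"]
            if source not in seen:
                seen.add(source)
                queue.append(source)

    history.reverse()  # collected newest-first, so flip to oldest-first
    return history


if __name__ == "__main__":
    # Made-up lineage: a paper yields an experiment, which yields an analysis.
    steps = [
        {"source_artifact_id": "paper-12345", "target_artifact_id": "experiment-X",
         "step_type": "extraction", "method": "llm_structured_extraction"},
        {"source_artifact_id": "experiment-X", "target_artifact_id": "analysis-Y",
         "step_type": "analysis", "method": "example_method"},
    ]
    for step in processing_history(steps, "analysis-Y"):
        print(step["source_artifact_id"], "->", step["target_artifact_id"],
              "via", step["method"])
```

Grouping by target handles aggregation steps, where several source artifacts feed a single output and therefore appear as several rows with the same target_artifact_id.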