[Atlas/feat] notebook_artifact_id FK + notebook_cells ownership backfill done

Per §3 of the notebook+versioning extensions spec: add notebooks.notebook_artifact_id FK -> artifacts.id (NOT NULL after backfill), and notebook_cells.notebook_artifact_id FK so cells are owned by a specific notebook *version* rather than the loose ipynb_path-only association we have today. Backfill existing notebooks: each notebook gets a type='notebook' artifact row at version_number=1 if it doesn't already have one, and its existing cells are pointed at that row. This is the prerequisite for the cell-append flow in §2.2 to work without orphaning cells.
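The backfill described above can be sketched as follows. This is a minimal illustration against an in-memory SQLite database, not the actual migration: the function name `backfill_notebook_artifacts`, the `artifact_type` column name, the reduced column set, and the assumption that cells join to notebooks through `notebook_id` are all hypothetical. The final step of tightening the column to NOT NULL is left out.

```python
import sqlite3
import uuid


def backfill_notebook_artifacts(conn: sqlite3.Connection) -> int:
    """Give every notebook a version-1 artifact and point its cells at it.

    Returns the number of notebooks that needed a new artifact row.
    """
    created = 0
    rows = conn.execute(
        "SELECT id FROM notebooks WHERE notebook_artifact_id IS NULL"
    ).fetchall()
    for (notebook_id,) in rows:
        artifact_id = str(uuid.uuid4())
        # Insert the artifact first so the FK on notebooks can be satisfied.
        conn.execute(
            "INSERT INTO artifacts (id, artifact_type, version_number, is_latest) "
            "VALUES (?, 'notebook', 1, 1)",
            (artifact_id,),
        )
        conn.execute(
            "UPDATE notebooks SET notebook_artifact_id = ? WHERE id = ?",
            (artifact_id, notebook_id),
        )
        created += 1
    # Cells inherit the artifact of the notebook that owns them.
    conn.execute(
        "UPDATE notebook_cells SET notebook_artifact_id = ("
        "  SELECT n.notebook_artifact_id FROM notebooks n"
        "  WHERE n.id = notebook_cells.notebook_id"
        ") WHERE notebook_artifact_id IS NULL"
    )
    conn.commit()
    return created


def make_demo_db() -> sqlite3.Connection:
    """Tiny demo schema: one notebook without an artifact, one with."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE artifacts (
            id TEXT PRIMARY KEY,
            artifact_type TEXT NOT NULL,
            version_number INTEGER NOT NULL,
            is_latest INTEGER NOT NULL
        );
        CREATE TABLE notebooks (
            id TEXT PRIMARY KEY,
            notebook_artifact_id TEXT REFERENCES artifacts(id)
        );
        CREATE TABLE notebook_cells (
            notebook_id TEXT NOT NULL REFERENCES notebooks(id),
            cell_index INTEGER NOT NULL,
            notebook_artifact_id TEXT REFERENCES artifacts(id)
        );
        INSERT INTO artifacts VALUES ('art-existing', 'notebook', 1, 1);
        INSERT INTO notebooks VALUES ('nb-1', NULL), ('nb-2', 'art-existing');
        INSERT INTO notebook_cells VALUES
            ('nb-1', 0, NULL), ('nb-1', 1, NULL), ('nb-2', 0, NULL);
        """
    )
    return conn
```

Running the function a second time creates nothing, which is the property a re-runnable backfill needs before the NOT NULL constraint is applied.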

Git Commits (11)

[Atlas] Iteration 7 verification: task fully on main, add work log entry [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] (#630) 2026-04-27
[Atlas] Add forward-path creation test + register integration marker [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] (#584) 2026-04-27
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (144 commits) (#479) 2026-04-26
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (144 commits) (#479) 2026-04-26
[Atlas] Add FK constraint enforcement tests for notebook_artifact_id [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] (#464) 2026-04-26
[Atlas] Iteration 4 verification: all notebook_artifact_id FK deliverables confirmed complete [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] (#454) 2026-04-26
[Atlas] Add §3 spec update + FK integration tests for notebook_artifact_id [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] 2026-04-25
Squash merge: orchestra/task/9b6823f6-notebook-artifact-id-fk-notebook-cells-o (1 commits) 2026-04-25
[Atlas] Fix _register_stub: insert artifact before notebook FK, add notebook_artifact_id [task:9b6823f6-a10b-445b-bed5-14ddcfd1d212] 2026-04-25
Squash merge: orchestra/task/9b6823f6-notebook-artifact-id-fk-notebook-cells-o (4 commits) 2026-04-24
Squash merge: orchestra/task/9b6823f6-notebook-artifact-id-fk-notebook-cells-o (4 commits) 2026-04-24
Spec File

Notebook + artifact versioning extensions

> Why one spec, not five. The investigation that motivated this spec (2026-04-24) found that the plumbing for versioned artifacts already exists in production — the artifacts table carries version_number, parent_version_id, content_hash, is_latest, version_tag, changelog, lifecycle_state; artifact_links carries cross-artifact edges; notebook_cells is a real table; the GET /api/artifacts/{id}/versions endpoints work. What's missing is the connections: debates that pin a specific artifact-version, a cell-append API that bumps the notebook version, a "chamber/workspace" pull-in mechanism, and structured metadata on artifact_links rows. This spec wires those four connections together so we don't fork into five overlapping specs.

Parent: [artifact_versioning_spec.md](artifact_versioning_spec.md).

---

1. What we're keeping (no change)

The audit confirmed these are correctly built and don't need re-spec:

  • artifacts table versioning columns: version_number, parent_version_id, content_hash, is_latest, version_tag, changelog, lifecycle_state, deprecated_at, superseded_by. Already populated for new artifacts.
  • API: GET /api/artifacts/{id}/versions, GET /api/artifacts/{id}/versions/{N}, GET /api/artifacts/{id}/diff. Already serving.
  • artifact_registry.py: create_version(), get_version_history(), diff_versions(), pin_version() are implemented (task 58309097-1f15-4cb6 completed 2026-04-16).
  • artifact_links table: (source_artifact_id, target_artifact_id, link_type, strength, evidence). Link types: derives_from, cites, extends, supports, contradicts.
  • notebooks table + on-disk .ipynb/.html pairs at site/notebooks/.
  • notebook_cells table (notebook_id, cell_index, cell_type, code, output).

Anything new in this spec must compose on top of these without breaking them.

---

2. Four extensions

2.1 Debate ↔ artifact-version pinning

Problem: debate_sessions.target_artifact_version exists but is always NULL/empty; target_content_hash is always ''. Debates effectively reference an unversioned artifact, so a debate that argued about hypothesis-H-89 in March doesn't tell you which version was being argued.

Fix:

  • Auto-populate on debate creation. When a debate_session is created with target_artifact_id, the creator function looks up artifacts.version_number + content_hash for the latest version (or the explicitly-passed version) and writes both to the row. NOT NULL going forward; backfill historical rows once with a one-time migration that picks "latest as of debate's started_at" — best-effort, mark backfilled rows in a pinning_note column.
  • Pin every artifact a debate round actually consumes. Add debate_rounds.referenced_artifacts JSONB (default '[]'::jsonb). Each entry: {artifact_id, version_number, content_hash, role: 'input'|'output'|'evidence', cited_at_offset_chars: int}. The debate engine populates this whenever a round's prompt or output cites an artifact (the existing skill-citation logic from quest_commentary_curator_spec produces the same kind of edges; reuse).
  • API: GET /api/debate/{session_id}/artifacts returns the union of the session's pinned target + every round's referenced_artifacts flattened, with version-resolved metadata. UI: debate transcript shows 🔗 hyp-H-89@v3 chips that link to the pinned version (not "latest").
  • Migration: add the JSONB column; no schema break since old rows default to [].
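The "union of the session's pinned target + every round's referenced_artifacts" step can be sketched as a pure function. This is a minimal sketch under assumptions: the function name `collect_debate_artifacts`, the `"target"` role label for the session-level pin, the `round_number` key, and the de-duplication rule are illustrative and not taken from the implementation.

```python
from typing import Any


def collect_debate_artifacts(
    session: dict[str, Any], rounds: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Flatten a session's pinned target and every round's references.

    Entries are de-duplicated on (artifact_id, version_number, role); the
    first occurrence wins, so the session target is always listed first.
    """
    seen: set[tuple[str, int, str]] = set()
    result: list[dict[str, Any]] = []

    def add(entry: dict[str, Any]) -> None:
        key = (entry["artifact_id"], entry["version_number"], entry["role"])
        if key not in seen:
            seen.add(key)
            result.append(entry)

    if session.get("target_artifact_id"):
        add({
            "artifact_id": session["target_artifact_id"],
            "version_number": session["target_artifact_version"],
            "content_hash": session["target_content_hash"],
            "role": "target",  # assumed label for the session-level pin
            "round_number": None,
        })
    for rnd in rounds:
        # Rows created before the column existed carry [] (or nothing at all).
        for ref in rnd.get("referenced_artifacts") or []:
            add({
                "artifact_id": ref["artifact_id"],
                "version_number": ref["version_number"],
                "content_hash": ref["content_hash"],
                "role": ref["role"],
                "round_number": rnd["round_number"],
            })
    return result
```

Because every entry carries its own version_number and content_hash, the UI chips can link to the pinned version without looking up "latest".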
2.2 Notebook cell-append (extend an existing notebook)

Problem: notebooks are immutable after generation. There is no way to ask "add a differential expression analysis to hypothesis-Q-89-notebook" without regenerating from scratch.

Fix:

  • POST /api/notebooks/{id}/cells (append-only). Body:

       {
         "cell_type": "code|markdown",
         "source": "...",
         "execute": true,
         "agent_id": "ed-lein",
         "method": "differential-expression",
         "parameters": {"contrast": "AD vs control", "fdr": 0.05},
         "rationale": "Why this cell is being added"
       }

  • Server side: the call
    (a) creates a NEW artifact row of type notebook with parent_version_id = current notebook artifact id, version_number = parent's version_number + 1, is_latest=TRUE, and demotes the parent to is_latest=FALSE;
    (b) writes the new cell to notebook_cells with cell_index = max+1 AND a foreign key to the new artifact row;
    (c) optionally executes the cell via nbconvert and caches outputs;
    (d) renders .html for the new version and stores the path;
    (e) writes an artifact_links edge new_version --extends--> parent_version;
    (f) records the agent + method in a new processing_steps table per §2.4 below.
  • Cell-level diff: GET /api/notebooks/{id}/diff?from=v1&to=v2 returns a diff using nbdime semantics (added/removed/modified cells). Reuse nbdime's protocol JSON; don't roll our own.
  • Tagging: human-readable version tags via the existing pin_version(). E.g., the notebook a debate consumed is auto-tagged "debate-{session_id}-input" so the lineage is queryable from the artifact alone.
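Steps (a), (b) and (e) of the server-side flow can be sketched in memory as below. This is a sketch under stated assumptions, not the code in api.py: `NotebookStore`, `Artifact`, `Cell` and their methods are hypothetical stand-ins for the artifacts, notebook_cells and artifact_links tables, and reading a version's cells by walking its parent chain is one possible interpretation of "cells are owned by a specific notebook version". Execution, HTML rendering and processing_steps recording are omitted.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Artifact:
    id: str
    version_number: int
    parent_version_id: Optional[str]
    is_latest: bool


@dataclass
class Cell:
    notebook_artifact_id: str
    cell_index: int
    cell_type: str
    source: str


@dataclass
class NotebookStore:
    """Hypothetical in-memory stand-in for the three tables involved."""
    artifacts: dict[str, Artifact] = field(default_factory=dict)
    cells: list[Cell] = field(default_factory=list)
    links: list[tuple[str, str, str]] = field(default_factory=list)

    def lineage(self, artifact_id: str) -> list[str]:
        """Artifact ids from this version back to version 1."""
        chain: list[str] = []
        current: Optional[str] = artifact_id
        while current is not None:
            chain.append(current)
            current = self.artifacts[current].parent_version_id
        return chain

    def cells_for(self, artifact_id: str) -> list[Cell]:
        """All cells visible in a version: its own plus its ancestors'."""
        owners = set(self.lineage(artifact_id))
        visible = [c for c in self.cells if c.notebook_artifact_id in owners]
        return sorted(visible, key=lambda c: c.cell_index)

    def append_cell(self, current_id: str, cell_type: str, source: str) -> Artifact:
        parent = self.artifacts[current_id]
        if not parent.is_latest:
            raise ValueError("can only append to the latest notebook version")
        existing = self.cells_for(parent.id)
        last_index = existing[-1].cell_index if existing else None
        # Compare against None explicitly: an index of 0 is valid, not "missing".
        next_index = 0 if last_index is None else last_index + 1
        # (a) new artifact version; the parent is demoted.
        child = Artifact(
            id=str(uuid.uuid4()),
            version_number=parent.version_number + 1,
            parent_version_id=parent.id,
            is_latest=True,
        )
        parent.is_latest = False
        self.artifacts[child.id] = child
        # (b) the new cell is owned by the new version.
        self.cells.append(Cell(child.id, next_index, cell_type, source))
        # (e) new_version --extends--> parent_version
        self.links.append((child.id, "extends", parent.id))
        return child
```

In a real database the demotion of the parent and the insertion of the child would need to happen in one transaction, so that exactly one row per notebook has is_latest=TRUE at any time.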
2.3 Chamber/workspace: pull a versioned artifact in

Problem: there's no "workspace" or "chamber". When a debate or persona-driven task wants to work with hypothesis-H-89@v3 + notebook-NB-12@v2 + paper-P-77@v1, it just names the IDs in prose. No structured pull-in, no isolation.

Fix:

  • New table chambers (minimal):

       id UUID PK
       name TEXT
       purpose TEXT  ('debate' | 'experiment_design' | 'persona_workspace' | 'showcase_review')
       owner_actor_type TEXT, owner_actor_id TEXT  (matches existing comment author convention)
       parent_session_id UUID  (debate_sessions.id, NULL ok)
       created_at TIMESTAMPTZ DEFAULT now()
       closed_at TIMESTAMPTZ

  • chamber_artifacts (the pull-in), pinned versions:

       chamber_id UUID
       artifact_id UUID
       version_number INT
       content_hash TEXT
       role TEXT  ('input' | 'reference' | 'workbench')
       added_at TIMESTAMPTZ DEFAULT now()
       added_by_actor_type TEXT, added_by_actor_id TEXT
       PRIMARY KEY (chamber_id, artifact_id, version_number)

  • API:
    - POST /api/chambers: create
    - POST /api/chambers/{id}/pull: body is [{artifact_id, version_number?}] (version_number defaults to latest)
    - GET /api/chambers/{id}: full chamber state with all pinned versions hydrated
    - POST /api/chambers/{id}/close: closes the chamber, optionally writes a result-artifact
  • Debate integration: when a debate session starts, the engine creates a chamber, pulls the target artifact + supporting persona corpora + cited papers, then the debate happens "in" the chamber. Round outputs land back in the chamber as role='workbench'. The chamber is a stable referencer for "what was visible to the agents during this debate".
  • Persona integration: persona task workspaces become chambers with purpose='persona_workspace'. The persona's bio + paper corpus + previous debates the persona participated in are pulled in as role='reference'.
  • Closed-chamber summary: when a chamber closes, a hash of its (artifact_id, version_number, role) set is stored on the parent debate/task as chamber_provenance_hash for later dispute/replay/fork detection. The hash captures exactly which artifact-versions were in scope, without requiring a full copy. A chamber_summary JSON blob (who participated, key turns, final score) is written to the chamber row and linked to the parent. A GET /api/chambers/{id}/replay endpoint returns the chamber's full state (all pinned artifact-versions + summary) so any agent can rehydrate the chamber context and replay or fork from a clean checkpoint.
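Two pieces of the chamber flow above lend themselves to a short sketch: resolving a pull body to concrete pins, and hashing the pinned set on close. The spec does not name a hash algorithm or a canonical encoding, so SHA-256 over sorted, compact JSON is an assumption here, as are the function names `resolve_pull` and `chamber_provenance_hash` and the `latest_versions` lookup.

```python
import hashlib
import json


def resolve_pull(
    requests: list[dict], latest_versions: dict[str, int]
) -> list[tuple[str, int]]:
    """Turn a pull body into concrete (artifact_id, version_number) pins.

    A missing version_number is resolved to the latest version at pull time,
    so the pin stays fixed even if the artifact is versioned again later.
    """
    pins = []
    for req in requests:
        version = req.get("version_number")
        if version is None:
            version = latest_versions[req["artifact_id"]]
        pins.append((req["artifact_id"], version))
    return pins


def chamber_provenance_hash(pinned: list[dict]) -> str:
    """Order-independent digest of the (artifact_id, version_number, role) set."""
    triples = sorted(
        {(p["artifact_id"], p["version_number"], p["role"]) for p in pinned}
    )
    # Compact separators keep the encoding stable across callers.
    canonical = json.dumps(triples, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the de-duplicated triples before hashing means two chambers with the same artifact-versions in scope produce the same digest regardless of pull order, which is what fork and replay detection relies on.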
2.4 Structured provenance on artifact_links

Problem: artifact_links rows have no record of what operation created the link, or what agent/model did it.

Fix:

  • Add to artifact_links:
    - method TEXT: the operation that produced the link (cell_append, debate_output, cite, derives_from, extends, reproduced_with_diff, contribution_attribution, etc.)
    - agent_id TEXT: which agent contributed the link (null for auto-detected links)
    - processing_step_id UUID: FK to a new processing_steps table (see below)
    - link_metadata JSONB: method-specific extra fields (e.g., for cell_append: {notebook_id, cell_index, cell_hash}, for debate_output: {session_id, round_number})

  • New processing_steps table, an immutable log of operations that create/extend artifacts:

       id UUID PK
       method TEXT  ('cell_append' | 'artifact_fork' | 'debate_consolidation' | 'notebook_regeneration' | 'agent_contribution' | 'manual_edit')
       artifact_id UUID
       artifact_version_number INT
       actor_type TEXT, actor_id TEXT
       inputs JSONB   -- [{artifact_id, version_number, role, content_hash}]
       outputs JSONB  -- [{artifact_id, version_number, role, content_hash}]
       parameters JSONB
       rationale TEXT
       started_at TIMESTAMPTZ DEFAULT now()
       completed_at TIMESTAMPTZ
       status TEXT  ('running' | 'completed' | 'failed')
       error TEXT

  • Reuse existing patterns: processing_steps.method='cell_append' is written by the cell-append API (§2.2); method='debate_consolidation' by the debate engine; method='agent_contribution' when a persona edits an artifact directly.
  • Query: GET /api/artifacts/{id}/lineage?depth=5&method=cell_append returns the chain of cells + who added each, using processing_steps.inputs and artifact_links traversed together. The depth param prevents infinite loops on cyclic graphs.
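The depth-bounded lineage query can be sketched as a breadth-first walk over artifact_links edges. This is a minimal sketch, assuming edges point from the newer artifact to the one it builds on (as in new_version --extends--> parent_version); the function name `walk_lineage` is hypothetical, and the visited set is an addition beyond the depth parameter named in the spec, so that a cyclic graph yields each artifact at most once.

```python
from collections import deque
from typing import Iterable, Optional


def walk_lineage(
    start_id: str,
    links: Iterable[dict],
    depth: int = 5,
    method: Optional[str] = None,
) -> list[tuple[str, int]]:
    """Return (artifact_id, distance) pairs reachable from start_id.

    links: dicts with source_artifact_id, target_artifact_id and method.
    Only edges matching `method` are followed when it is given.
    """
    outgoing: dict[str, list[str]] = {}
    for link in links:
        if method is not None and link.get("method") != method:
            continue
        outgoing.setdefault(link["source_artifact_id"], []).append(
            link["target_artifact_id"]
        )

    visited = {start_id}
    queue = deque([(start_id, 0)])
    found: list[tuple[str, int]] = []
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # depth limit reached; do not expand further
        for target in outgoing.get(node, []):
            if target not in visited:
                visited.add(target)
                found.append((target, dist + 1))
                queue.append((target, dist + 1))
    return found
```

Joining each returned artifact_id against processing_steps would then supply the "who added each" part of the response.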
---

3. Implementation order

  1. Chambers (2.3): simplest new table, no dependencies, enables debate isolation early.
  2. Debate pinning (2.1): low-risk column additions, backfill migration, read-heavy API.
  3. Cell-append (2.2): more complex write path; depends on (1) for workspace isolation.
  4. Structured provenance (2.4): wires everything together; depends on (2) and (3).
---

4. Open questions

| # | Question | Resolution owner |
|---|---|---|
| 4.1 | Should chamber close auto-tag the result artifact with chamber-{id}-output, or leave tagging to the caller? | Atlas team |
| 4.2 | For backfill of debate_rounds.referenced_artifacts, should we replay the LLM prompt extraction off raw logs, or accept best-effort and mark pinning_note='backfill-best-effort'? | Senate + Agora |
| 4.3 | Who can query GET /api/chambers/{id}/replay: anyone with the chamber ID, or only participants? | Senate + security review |
| 4.4 | Should processing_steps rows be immutable inserts only (no UPDATE), so audit trails are tamper-evident? | Senate + legal |
| 4.5 | For cell_append where execution fails, do we still create the versioned artifact + cell row with status='failed', or rollback entirely? | Atlas team |
---

Work Log

    2026-04-26 01:37 PT — Iteration 1 (claude-auto:42)

    • Summary: [Atlas] Notebook cell-append API + cell-level diff [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: 5fbf44c32
    • Notes: Initial implementation of POST /api/notebooks/{id}/cells (append-only) + GET /api/notebooks/{id}/diff

    2026-04-26 01:54 PT — Iteration 1 (claude-auto:42)

    • Summary: [Atlas] Add nbconvert execution + HTML rendering to notebook cell-append API [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: 7042ecdc8
    • Notes: nbconvert execution, HTML rendering, processing_steps recording

    2026-04-27 06:24 PT — Iteration 2 (minimax:76)

    • Summary: [Atlas] Work log: verify feature complete on main [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: none
    • Notes: Verified §2.2 cell-append and diff already on main; no new work needed this cycle

    2026-04-27 06:46 PT — Iteration 3 (claude-auto:47)

    • Summary: [Atlas] Work log: iteration 3 live verification of notebook cell-append + diff [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: none
    • Notes: Live API test: POST /api/notebooks/{id}/cells → 200, GET /api/notebooks/{id}/diff → correct added cell diff

    2026-04-27 08:31 PT — Iteration 4 (claude-auto:41)

    • Summary: [Atlas] Fix cell_index=0 falsy bug in notebook cell-append; add integration tests [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: none
    • Notes: Fixed bug where cell_index=0 was treated as missing; added integration tests

    2026-04-27 08:39 PT — Iteration 5 (claude-auto:41)

    • Summary: [Atlas] Iteration 5 final verification: all notebook cell-append + diff tests pass [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: none
    • Notes: All tests pass; coverage confirmed for §2.2

    2026-04-27 09:03 PT — Iteration 6 (minimax:78)

    • Summary: [Atlas] Work log: iteration 6 verification of §2.2 notebook cell-append + diff [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: none
    • Notes: Verified feature complete; live diff test confirmed correct

    2026-04-27 09:18 PT — Iteration 7 (minimax:78)

    • Summary: [Atlas] Replace deprecated datetime.utcnow() with timezone-aware datetime.now() in notebook cell-append [task:f535e6c9-7185-41c4-b850-8316228e6500]
    • Commits: d9b81033
    • Notes: Fix: datetime.utcnow() → datetime.now(timezone.utc) in api.py cell-append

    2026-04-27 09:53 PT — Iteration 7 (minimax:77)

    • Summary: Verification pass: task fully on main, no work needed
    • Commits: none
    • Notes: Verified: notebooks.notebook_artifact_id NOT NULL FK → artifacts(id) present; notebook_cells.notebook_artifact_id NOT NULL FK → artifacts(id) present; 590/590 notebooks have artifact, 241/241 cells have artifact; 12 migration runner + 17 cell-append integration tests all pass; no dangling refs. Task is complete, worktree at origin/main with no local changes.

    2026-04-27 09:39 PT — Iteration 8 (minimax:79)

    • Summary: Verification: all 48 tests pass (§2.2 cell-append+diff confirmed on main at d9b81033); §2.1 referenced_artifacts + debate artifacts endpoint live; feature complete
    • Commits: none
    • Notes: 31 unit + 17 integration tests all pass. Live diff API confirmed working. §2.2 (cell-append + diff) fully on main. §2.1 referenced_artifacts JSONB column and GET /api/debates/{session_id}/artifacts endpoint live on main.

    2026-04-27 09:50 PT — Iteration 9 (minimax:79)

    • Summary: Work log: confirm all three sub-features (§2.1 debate pinning, §2.2 cell-append+diff, §2.3 chambers pull-in) verified complete on main; add iteration 9 work log entry
    • Commits: none
    • Notes: §2.1: debate_rounds.referenced_artifacts JSONB + GET /api/debates/{session_id}/artifacts + auto-populate target_artifact_version on debate creation. §2.2: POST /api/notebooks/{id}/cells (append-only) + GET /api/notebooks/{id}/diff?from=v1&to=v2 using nbdime semantics; all tests pass. §2.3: chambers + chamber_artifacts tables, POST /api/chambers, POST /api/chambers/{id}/pull, GET /api/chambers/{id}, POST /api/chambers/{id}/close, persona workspace endpoints all live. Feature complete; no new commits needed.

    2026-04-27 10:50 PT — Iteration 11 (minimax:74)

    • Summary: Final verification: chambers §2.3 implementation confirmed complete on main; no new commits needed
    • Commits: none
    • Notes: Live API verification: POST /api/chambers → 200 + returns chamber ID; GET /api/chambers/{id} → 200 + returns chamber with artifacts array; POST /api/chambers/{id}/close → 200 + returns provenance_hash; debate engine wiring at scidex_orchestrator.py:2645-2917 creates chamber at session start, pulls target artifact + cited papers + persona corpus, closes with provenance_hash written to debate_sessions.chamber_provenance_hash. All four §2.3 endpoints verified functional. Task complete.