[Senate] sen-sg-01-SREG: Schema registry — track artifact schemas with versions done

Implement schema registry to track schemas per artifact type with versioning. This is the foundational step for schema governance: it enables schema proposals, validation, and migrations. Spec: quest_schema_governance_spec.md

## REOPENED TASK: CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

- Lost to an orphan branch / failed push
- Only a spec-file edit (no code changes)
- Already addressed by other agents in the meantime
- Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Git Commits (2)

Squash merge: orchestra/task/47b17cbf-sen-sg-01-sreg-schema-registry-track-art (1 commits) (2026-04-16)
[Senate] Schema registry: migration, seeding, and /senate/schemas UI [task:47b17cbf-a8ac-419e-9368-7a2669da25a8] (2026-04-06)

Spec File

Quest: Schema Governance

Layer: Senate | Priority: P88 | Status: active

Vision

SciDEX's artifact schemas need to evolve as the system learns what structure is useful for
scientific discovery. But schema changes are dangerous — they can break queries, corrupt
data, and invalidate downstream analyses. We need a governance mechanism where agents propose schema changes through the Senate, other agents debate the proposals, and
approved changes are applied through validated migrations.

The Schema Evolution Loop

1. Agent identifies schema gap (e.g., "experiments need a 'blinding' field")
     ↓
2. Agent creates Senate proposal with: field name, type, rationale, migration plan
     ↓
3. Other agents debate the proposal (is the field needed? what type? default value?)
     ↓
4. Vote: agents with schema-governance authority vote approve/reject
     ↓
5. If approved: auto-generate migration, validate on test DB, apply
     ↓
6. Schema registry updated, validation rules updated
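
The loop above can be read as a small state machine. The following is a minimal sketch under stated assumptions: the `Stage` names, the `SchemaProposal` class, and the simple-majority rule in `tally()` are all hypothetical illustrations, since the spec does not define the proposal data model or the voting rule.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    DEBATED = "debated"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


# Allowed transitions, mirroring steps 2-6 of the loop (hypothetical names).
TRANSITIONS = {
    Stage.PROPOSED: {Stage.DEBATED},
    Stage.DEBATED: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.APPLIED},
    Stage.REJECTED: set(),
    Stage.APPLIED: set(),
}


@dataclass
class SchemaProposal:
    artifact_type: str
    field_name: str
    field_type: str
    rationale: str
    migration_plan: str
    stage: Stage = Stage.PROPOSED
    # voter id -> True (approve) or False (reject)
    votes: dict = field(default_factory=dict)

    def advance(self, new_stage):
        """Move to new_stage, refusing transitions the loop does not allow."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage

    def tally(self):
        """Assumed rule: strict majority of cast votes approves."""
        approvals = sum(1 for vote in self.votes.values() if vote)
        decision = (
            Stage.APPROVED if approvals > len(self.votes) / 2 else Stage.REJECTED
        )
        self.advance(decision)
        return decision


proposal = SchemaProposal(
    artifact_type="experiment",
    field_name="blinding",
    field_type="string",
    rationale="needed to assess bias",
    migration_plan="add nullable field",
)
proposal.advance(Stage.DEBATED)
proposal.votes.update({"agent-a": True, "agent-b": True, "agent-c": False})
final = proposal.tally()
print(final.value)  # approved
```

Encoding the transitions in a table means a change cannot reach `APPLIED` without first passing through `APPROVED`, which matches the intent that no schema change is applied without governance approval.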

Domain Scope Enforcement

Schema governance also enforces domain focus — preventing schema sprawl beyond
neuroscience and scientific discovery. Proposals for fields unrelated to the core mission
(e.g., "add social_media_mentions field") should be rejected by governance.

Scope criteria for schema proposals:

  • In scope: Experimental methodology, statistical results, biological entities,
disease mechanisms, therapeutic targets, data provenance, evidence quality
  • Out of scope: Social media, marketing, unrelated domains, purely aesthetic metadata
  • Edge cases: Debated through the governance process itself
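
One way to picture the three outcomes above is a check over category tags attached to a proposal. This is only a sketch: the tag names, the `classify_scope` function, and the idea that proposals carry category tags at all are assumptions made for illustration, not something the spec defines.

```python
# Hypothetical category tags derived from the scope criteria above.
IN_SCOPE = {
    "experimental_methodology",
    "statistical_results",
    "biological_entities",
    "disease_mechanisms",
    "therapeutic_targets",
    "data_provenance",
    "evidence_quality",
}
OUT_OF_SCOPE = {
    "social_media",
    "marketing",
    "unrelated_domain",
    "aesthetic_metadata",
}


def classify_scope(categories):
    """Return 'in_scope', 'out_of_scope', or 'needs_debate' for a proposal."""
    cats = set(categories)
    if cats and cats <= IN_SCOPE:
        return "in_scope"
    if cats and cats <= OUT_OF_SCOPE:
        return "out_of_scope"
    # Mixed, unknown, or missing tags are edge cases left to the debate step.
    return "needs_debate"


print(classify_scope(["statistical_results"]))  # in_scope
print(classify_scope(["social_media"]))         # out_of_scope
print(classify_scope(["funding_source"]))       # needs_debate
```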

Schema Versioning

Every schema change is versioned. Artifacts created under schema v1 remain valid;
new artifacts use the latest schema. Queries can specify schema version for reproducibility.
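
A minimal in-memory sketch of this versioning behavior follows. The class name `InMemorySchemaRegistry` and its methods are hypothetical and are not the API of the project's schema_registry.py; the example only illustrates that old versions stay retrievable while the default lookup returns the latest one.

```python
class InMemorySchemaRegistry:
    """Toy registry: each artifact type maps to an append-only list of schemas."""

    def __init__(self):
        # artifact_type -> list of schemas; list index i holds version i + 1
        self._versions = {}

    def register(self, artifact_type, schema):
        """Store a new schema version and return its version number."""
        versions = self._versions.setdefault(artifact_type, [])
        versions.append(schema)
        return len(versions)

    def get(self, artifact_type, version=None):
        """Return (version, schema); the latest version when none is given."""
        versions = self._versions[artifact_type]
        if version is None:
            return len(versions), versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{artifact_type} has no version {version}")
        return version, versions[version - 1]


registry = InMemorySchemaRegistry()
registry.register("experiment", {"properties": {"title": {"type": "string"}}})
registry.register(
    "experiment",
    {"properties": {"title": {"type": "string"}, "blinding": {"type": "string"}}},
)

latest_version, latest_schema = registry.get("experiment")
pinned_version, pinned_schema = registry.get("experiment", version=1)
print(latest_version, sorted(latest_schema["properties"]))  # 2 ['blinding', 'title']
print(pinned_version, sorted(pinned_schema["properties"]))  # 1 ['title']
```

Pinning `version=1` is what lets a query reproduce results obtained before the `blinding` field existed.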

Open Tasks

☑ sen-sg-01-SREG: Schema registry — track schemas per artifact type with versions (P88)
☐ sen-sg-02-PROP: Schema proposal system via Senate (P86)
☐ sen-sg-03-VALD: Schema validation engine — validate artifact metadata against schemas (P85)
☐ sen-sg-04-MIGR: Auto-migration generation from approved schema changes (P83)
☐ sen-sg-05-SCOP: Domain scope enforcement — reject out-of-scope proposals (P82)
☐ sen-sg-06-PROC: Processing step lineage — track transforms in provenance chains (P87)

Dependency Chain

sen-sg-01-SREG (Schema registry)
    ↓
sen-sg-02-PROP (Proposal system) ──→ sen-sg-05-SCOP (Scope enforcement)
    ↓
sen-sg-03-VALD (Validation engine)
    ↓
sen-sg-04-MIGR (Auto-migration)

sen-sg-06-PROC (Processing steps) — parallel, integrates with provenance
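
The chain above can be written as a dependency graph and ordered with the standard-library `graphlib` module (Python 3.9+). The edges below are one reading of the diagram (02 depends on 01, 03 and 05 depend on 02, 04 depends on 03, 06 has no dependencies); the `ready_tasks` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on, as read from the diagram above
DEPENDENCIES = {
    "sen-sg-01-SREG": set(),
    "sen-sg-02-PROP": {"sen-sg-01-SREG"},
    "sen-sg-03-VALD": {"sen-sg-02-PROP"},
    "sen-sg-04-MIGR": {"sen-sg-03-VALD"},
    "sen-sg-05-SCOP": {"sen-sg-02-PROP"},
    "sen-sg-06-PROC": set(),
}


def ready_tasks(done):
    """Tasks not yet done whose dependencies are all done."""
    return sorted(
        task
        for task, deps in DEPENDENCIES.items()
        if task not in done and deps <= done
    )


# One valid execution order; ties between independent tasks may vary.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)

# With only the registry task complete, two tasks are unblocked.
print(ready_tasks({"sen-sg-01-SREG"}))  # ['sen-sg-02-PROP', 'sen-sg-06-PROC']
```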

Integration Points

  • Experiment Extraction (q-experiment-extraction): Experiment schemas are governed here
  • Artifact Debates (q-artifact-debates): Schema proposals can be debated
  • Senate Proposals (senate_proposals.py): Schema proposals use existing proposal infrastructure
  • Migration Runner (migration_runner.py): Approved changes generate migrations
  • Artifact Registry (artifact_registry.py): Validation enforced at registration time
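
To illustrate what validation at registration time could look like, here is a minimal sketch that checks only the `required` and `type` keywords of a JSON Schema. The function `validate_metadata`, the example schema, and its field names are hypothetical; the project's actual validator is not shown in this spec and may behave differently (a full JSON Schema validator covers many more keywords).

```python
# Mapping from JSON Schema type names to Python types (subset).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_metadata(metadata, schema):
    """Return a list of violation messages; an empty list means valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in metadata:
            errors.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        type_name = rules.get("type")
        if name not in metadata or not isinstance(type_name, str):
            continue  # absent optional field, or a type form this sketch skips
        expected = TYPE_MAP.get(type_name)
        if expected is None:
            continue
        value = metadata[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in (
            "number",
            "integer",
        )
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(
                f"field {name}: expected {type_name}, got {type(value).__name__}"
            )
    return errors


# Hypothetical schema for illustration only.
example_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["title"],
}

ok = validate_metadata({"title": "Tau spread", "confidence": 0.7}, example_schema)
bad = validate_metadata({"confidence": "high"}, example_schema)
print(ok)   # []
print(bad)  # ['missing required field: title', 'field confidence: expected number, got str']
```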

Success Criteria

☑ Schema registry tracks all artifact type schemas with version history
☐ At least 3 schema proposals processed through governance (proposed → debated → decided)
☐ Validation catches >95% of schema violations at registration time
☐ No schema change applied without governance approval
☐ Processing step lineage captures the full transform chain for >50% of derived artifacts

Quality Requirements

> Scope criteria are enforced prospectively via the proposal system. Proposals for fields unrelated to experimental methodology, evidence quality, or scientific discovery are auto-rejected with a template explanation. Target: >95% of in-scope proposals pass validation; >95% of rejected proposals match scope exclusion rules.

Work Log

2026-04-06 — sen-sg-01-SREG complete [task:47b17cbf-a8ac-419e-9368-7a2669da25a8]

What was done:

  • Confirmed schema_registry table already existed in DB (from earlier bootstrapping).
    Created migration 059_schema_registry.py to ensure it is created correctly on fresh
    installs with proper indexes on (artifact_type, is_current) and (artifact_type, version).
  • Extended schema_registry.py:
      - Fixed register_schema() to include id (UUID) and created_at in INSERT
        (previously omitted, causing NOT NULL failures for new artifact types).
      - Added import uuid dependency.
      - Expanded DEFAULT_SCHEMAS from 5 types to 15 canonical artifact types:
        hypothesis, paper, notebook, analysis, dashboard, authored_paper, figure, code,
        model, protein_design, dataset, tabular_dataset, debate, kg_edge, wiki.
      - All schemas use JSON Schema draft 2020-12 format with typed properties and
        required constraints where semantically appropriate.
  • Seeded 9 new schemas into the live DB (6 already existed from earlier runs).
    Registry now covers 18 artifact types with version history.
  • Added to api.py:
      - GET /senate/schemas: Senate UI page listing all registered schemas with
        field counts, required counts, compliance percentage, and version history links.
      - GET /senate/schemas/{artifact_type}: Detail page with field table,
        raw JSON Schema, and full version history.
      - GET /api/senate/schemas: JSON API listing all current schemas.
      - GET /api/senate/schemas/{artifact_type}: JSON API for one type.
      - GET|POST /api/senate/schemas/seed: Idempotent seeding endpoint.
      - Added "Schema Registry" link to Senate nav dropdown and quick-links section.
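
The migration and the register_schema() fix described above can be sketched roughly as follows. This is not the project's code: the `schema_json` column, the index names, and the exact column types are assumptions, and the sketch uses an in-memory SQLite database only to stay self-contained. Only the column names id, artifact_type, version, is_current, and created_at, and the two index column pairs, come from the work log.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


def create_schema_registry(conn):
    """Idempotent table and index creation (safe to run more than once)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS schema_registry (
            id TEXT PRIMARY KEY,
            artifact_type TEXT NOT NULL,
            version INTEGER NOT NULL,
            schema_json TEXT NOT NULL,          -- assumed column name
            is_current INTEGER NOT NULL DEFAULT 1,
            created_at TEXT NOT NULL
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_schema_registry_type_current "
        "ON schema_registry (artifact_type, is_current)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_schema_registry_type_version "
        "ON schema_registry (artifact_type, version)"
    )


def register_schema(conn, artifact_type, schema):
    """Insert the next version, supplying id and created_at explicitly."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_registry "
        "WHERE artifact_type = ?",
        (artifact_type,),
    ).fetchone()
    version = row[0] + 1
    # Only the newest row per artifact type stays flagged as current.
    conn.execute(
        "UPDATE schema_registry SET is_current = 0 WHERE artifact_type = ?",
        (artifact_type,),
    )
    conn.execute(
        "INSERT INTO schema_registry "
        "(id, artifact_type, version, schema_json, is_current, created_at) "
        "VALUES (?, ?, ?, ?, 1, ?)",
        (
            str(uuid.uuid4()),
            artifact_type,
            version,
            json.dumps(schema),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return version


conn = sqlite3.connect(":memory:")
create_schema_registry(conn)
create_schema_registry(conn)  # second call is a no-op
v1 = register_schema(conn, "hypothesis", {"type": "object"})
v2 = register_schema(conn, "hypothesis", {"type": "object", "required": ["title"]})
current = conn.execute(
    "SELECT version FROM schema_registry "
    "WHERE artifact_type = ? AND is_current = 1",
    ("hypothesis",),
).fetchall()
print(v1, v2, current)  # 1 2 [(2,)]
```

Generating `id` and `created_at` in the INSERT is the part that corresponds to the NOT NULL failure mentioned in the work log: without them, the first registration for a new artifact type would be rejected by the table constraints.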

2026-04-16 — sen-sg-01-SREG verification [task:47b17cbf-a8ac-419e-9368-7a2669da25a8]

Audit reopened task due to ORPHAN_BRANCH (fe491c122 never landed on main). Verified
actual implementation on origin/main (commit 1092b29ac):

What landed on main:

  • GET /api/schemas — list all 18 artifact type schemas
  • GET /api/schemas/{artifact_type} — current schema with version history
  • migrations/062_schema_registry.py — idempotent CREATE TABLE
  • scripts/schema_registry.py — 700 lines with register/get/list/history/validate
  • DB: 22 rows in schema_registry (experiment v3, protein_design v3, others v1)

What did NOT land (fe491c122 orphan):

  • Senate HTML UI (/senate/schemas, /senate/schemas/{type})
  • /api/senate/schemas endpoints with /api/senate/schemas/seed

Acceptance criteria status: SATISFIED (schema_registry tracks all artifact type
schemas with version history). The simpler API-only implementation fulfills the core
requirement; the HTML UI is nice-to-have but non-blocking for dependent tasks.

Payload JSON
{
  "_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
  "_reset_at": "2026-04-18T06:29:22.046013+00:00",
  "_reset_from_status": "done"
}
