Quest: Schema Governance

Layer: Senate | Priority: P88 | Status: active

Vision

SciDEX's artifact schemas need to evolve as the system learns what structure is useful for
scientific discovery. But schema changes are dangerous — they can break queries, corrupt
data, and invalidate downstream analyses. We need a governance mechanism where agents
propose schema changes through the Senate, other agents debate the proposals, and
approved changes are applied through validated migrations.

The Schema Evolution Loop

1. Agent identifies schema gap (e.g., "experiments need a 'blinding' field")
     ↓
2. Agent creates Senate proposal with: field name, type, rationale, migration plan
     ↓
3. Other agents debate the proposal (is the field needed? what type? default value?)
     ↓
4. Vote: agents with schema-governance authority vote approve/reject
     ↓
5. If approved: auto-generate migration, validate on test DB, apply
     ↓
6. Schema registry updated, validation rules updated
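
The loop above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the Senate implementation: the `Stage` names, the `SchemaProposal` class, and the simple-majority rule in `tally()` are all hypothetical and chosen only to illustrate which transitions the loop permits.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    DEBATED = "debated"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


# Allowed transitions, mirroring steps 2-6 of the loop.
TRANSITIONS = {
    Stage.PROPOSED: {Stage.DEBATED},
    Stage.DEBATED: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.APPLIED},
    Stage.REJECTED: set(),
    Stage.APPLIED: set(),
}


@dataclass
class SchemaProposal:
    artifact_type: str
    field_name: str
    field_type: str
    rationale: str
    migration_plan: str
    stage: Stage = Stage.PROPOSED
    # voter id -> True (approve) or False (reject)
    votes: dict = field(default_factory=dict)

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage, refusing any transition the loop does not allow."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"illegal transition {self.stage.value} -> {new_stage.value}"
            )
        self.stage = new_stage

    def tally(self) -> Stage:
        """Close the vote. Simple majority approves (an assumed rule)."""
        approvals = sum(1 for vote in self.votes.values() if vote)
        outcome = Stage.APPROVED if approvals * 2 > len(self.votes) else Stage.REJECTED
        self.advance(outcome)
        return outcome


proposal = SchemaProposal(
    "experiment", "blinding", "string",
    "Record the blinding method", "Add a nullable field",
)
proposal.advance(Stage.DEBATED)
proposal.votes.update({"agent-a": True, "agent-b": True, "agent-c": False})
print(proposal.tally().value)  # approved
proposal.advance(Stage.APPLIED)
```

Encoding the transitions in one table makes the "no change without approval" rule structural: a proposal cannot reach `APPLIED` without first passing through `APPROVED`.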

Domain Scope Enforcement

Schema governance also enforces domain focus — preventing schema sprawl beyond
neuroscience and scientific discovery. Proposals for fields unrelated to the core mission
(e.g., "add social_media_mentions field") should be rejected by governance.

Scope criteria for schema proposals:

  • In scope: Experimental methodology, statistical results, biological entities,
    disease mechanisms, therapeutic targets, data provenance, evidence quality
  • Out of scope: Social media, marketing, unrelated domains, purely aesthetic metadata
  • Edge cases: Debated through the governance process itself
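
One way to picture these criteria is a first-pass triage that sorts a proposal into one of the three buckets before any debate. This is only a sketch: the term lists and the `triage_scope` function are hypothetical, and naive substring matching is a stand-in for whatever classifier the scope-enforcement task eventually uses.

```python
# Hypothetical term lists derived loosely from the criteria above.
IN_SCOPE_TERMS = {
    "method", "statistic", "p_value", "entity", "gene", "protein",
    "disease", "mechanism", "target", "provenance", "evidence", "blinding",
}
OUT_OF_SCOPE_TERMS = {"social_media", "marketing", "aesthetic"}


def triage_scope(field_name: str, rationale: str) -> str:
    """Return 'out_of_scope', 'in_scope', or 'edge_case' for a proposed field."""
    text = f"{field_name} {rationale}".lower()
    # Exclusions win over inclusions, so a mixed proposal is still rejected.
    if any(term in text for term in OUT_OF_SCOPE_TERMS):
        return "out_of_scope"
    if any(term in text for term in IN_SCOPE_TERMS):
        return "in_scope"
    # Nothing matched: leave the decision to the governance debate.
    return "edge_case"


print(triage_scope("social_media_mentions", "track online buzz"))
print(triage_scope("blinding", "experimental methodology detail"))
print(triage_scope("funding_source", "who paid for the study"))
```

Only the first bucket would be auto-rejected; anything that falls through to `edge_case` goes to debate, as the criteria require.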

Schema Versioning

Every schema change is versioned. Artifacts created under schema v1 remain valid;
new artifacts use the latest schema. Queries can specify schema version for reproducibility.
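
A minimal in-memory sketch of this versioning behaviour follows. The `VersionedSchemas` class and its methods are hypothetical and are not the API of schema_registry.py. They only illustrate that old versions stay retrievable while unpinned lookups resolve to the latest one.

```python
from typing import Optional


class VersionedSchemas:
    """Keeps every version of every schema; nothing is overwritten."""

    def __init__(self) -> None:
        # artifact_type -> {version number: schema dict}
        self._schemas: dict = {}

    def register(self, artifact_type: str, schema: dict) -> int:
        """Store schema as the next version and return that version number."""
        versions = self._schemas.setdefault(artifact_type, {})
        version = max(versions, default=0) + 1
        versions[version] = schema
        return version

    def get(self, artifact_type: str, version: Optional[int] = None) -> dict:
        """Return the pinned version, or the latest when no version is given."""
        versions = self._schemas[artifact_type]
        if version is None:
            version = max(versions)
        return versions[version]


schemas = VersionedSchemas()
schemas.register("experiment", {"required": ["title"]})
schemas.register("experiment", {"required": ["title", "blinding"]})

latest = schemas.get("experiment")             # what new artifacts validate against
pinned = schemas.get("experiment", version=1)  # what a reproducible query can request
```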

Open Tasks

☑ sen-sg-01-SREG: Schema registry — track schemas per artifact type with versions (P88)
☐ sen-sg-02-PROP: Schema proposal system via Senate (P86)
☐ sen-sg-03-VALD: Schema validation engine — validate artifact metadata against schemas (P85)
☐ sen-sg-04-MIGR: Auto-migration generation from approved schema changes (P83)
☐ sen-sg-05-SCOP: Domain scope enforcement — reject out-of-scope proposals (P82)
☐ sen-sg-06-PROC: Processing step lineage — track transforms in provenance chains (P87)

Dependency Chain

sen-sg-01-SREG (Schema registry)
    ↓
sen-sg-02-PROP (Proposal system) ──→ sen-sg-05-SCOP (Scope enforcement)
    ↓
sen-sg-03-VALD (Validation engine)
    ↓
sen-sg-04-MIGR (Auto-migration)

sen-sg-06-PROC (Processing steps) — parallel, integrates with provenance
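
The chain above can be encoded as a dependency graph to derive a valid work order. The sketch below uses the standard-library `graphlib` module (Python 3.9+). The `DEPENDS_ON` mapping is one reading of the arrows, taking each arrow to mean "must finish before", and `ready_tasks` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on, read off the chain above
DEPENDS_ON = {
    "sen-sg-01-SREG": set(),
    "sen-sg-02-PROP": {"sen-sg-01-SREG"},
    "sen-sg-05-SCOP": {"sen-sg-02-PROP"},
    "sen-sg-03-VALD": {"sen-sg-02-PROP"},
    "sen-sg-04-MIGR": {"sen-sg-03-VALD"},
    "sen-sg-06-PROC": set(),  # parallel track, no prerequisites
}


def ready_tasks(done: set) -> list:
    """Tasks not yet done whose prerequisites are all done."""
    return sorted(
        task for task, deps in DEPENDS_ON.items()
        if task not in done and deps <= done
    )


# One valid linear order; prerequisites always precede their dependents.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)

# With only the registry complete, the proposal system and the parallel
# processing-step task are unblocked.
print(ready_tasks({"sen-sg-01-SREG"}))
```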

Integration Points

  • Experiment Extraction (q-experiment-extraction): Experiment schemas are governed here
  • Artifact Debates (q-artifact-debates): Schema proposals can be debated
  • Senate Proposals (senate_proposals.py): Schema proposals use existing proposal infrastructure
  • Migration Runner (migration_runner.py): Approved changes generate migrations
  • Artifact Registry (artifact_registry.py): Validation enforced at registration time
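
To illustrate the last integration point, here is a sketch of validation gating registration. It is not the code in artifact_registry.py: `validate_metadata` and `register_artifact` are hypothetical names, and the hand-rolled check covers only the `required` and `type` keywords. A real deployment would more likely use a full JSON Schema validator.

```python
# Mapping from JSON Schema type names to Python types (subset only).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_metadata(metadata: dict, schema: dict) -> list:
    """Return a list of violation messages; an empty list means valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in metadata:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if name not in metadata or expected is None:
            continue  # absent field or a type this sketch does not model
        value = metadata[name]
        # bool is a subclass of int in Python, but JSON Schema keeps them apart.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(
                f"{name}: expected {type_name}, got {type(value).__name__}"
            )
    return errors


def register_artifact(registry: list, artifact_type: str,
                      metadata: dict, schema: dict) -> None:
    """Append the artifact only if its metadata passes validation."""
    errors = validate_metadata(metadata, schema)
    if errors:
        raise ValueError(f"{artifact_type} rejected: " + "; ".join(errors))
    registry.append({"type": artifact_type, "metadata": metadata})


experiment_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "sample_size": {"type": "integer"}},
    "required": ["title"],
}
registry = []
register_artifact(registry, "experiment",
                  {"title": "Tau seeding", "sample_size": 40}, experiment_schema)
print(validate_metadata({"sample_size": "forty"}, experiment_schema))
```

Returning a list of messages rather than raising on the first problem lets the caller report every violation at once, which also makes a catch rate like the one in the success criteria easier to measure.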

Success Criteria

☑ Schema registry tracks all artifact type schemas with version history
☐ At least 3 schema proposals processed through governance (proposed → debated → decided)
☐ Validation catches >95% of schema violations at registration time
☐ No schema change applied without governance approval
☐ Processing step lineage captures the full transform chain for >50% of derived artifacts

Quality Requirements

> Scope criteria are enforced prospectively via the proposal system. Proposals for fields unrelated to experimental methodology, evidence quality, or scientific discovery are auto-rejected with a template explanation. Target: >95% of in-scope proposals pass validation; >95% of rejected proposals match scope exclusion rules.

Work Log

2026-04-06 — sen-sg-01-SREG complete [task:47b17cbf-a8ac-419e-9368-7a2669da25a8]

What was done:

  • Confirmed schema_registry table already existed in DB (from earlier bootstrapping).
    Created migration 059_schema_registry.py to ensure it is created correctly on fresh
    installs with proper indexes on (artifact_type, is_current) and (artifact_type, version).
  • Extended schema_registry.py:
      - Fixed register_schema() to include id (UUID) and created_at in INSERT
        (previously omitted, causing NOT NULL failures for new artifact types).
      - Added import uuid dependency.
      - Expanded DEFAULT_SCHEMAS from 5 types to 15 canonical artifact types:
        hypothesis, paper, notebook, analysis, dashboard, authored_paper, figure, code,
        model, protein_design, dataset, tabular_dataset, debate, kg_edge, wiki.
      - All schemas use JSON Schema draft 2020-12 format with typed properties and
        required constraints where semantically appropriate.
  • Seeded 9 new schemas into the live DB (6 already existed from earlier runs).
    Registry now covers 18 artifact types with version history.
  • Added to api.py:
      - GET /senate/schemas: Senate UI page listing all registered schemas with
        field counts, required counts, compliance percentage, and version history links.
      - GET /senate/schemas/{artifact_type}: Detail page with field table,
        raw JSON Schema, and full version history.
      - GET /api/senate/schemas: JSON API listing all current schemas.
      - GET /api/senate/schemas/{artifact_type}: JSON API for one type.
      - GET|POST /api/senate/schemas/seed: Idempotent seeding endpoint.
      - Added "Schema Registry" link to Senate nav dropdown and quick-links section.
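
The register_schema() fix and the idempotent migration described in this entry can be sketched as follows. This is an illustration only, assuming SQLite: the column names `schema_json` and the index names are hypothetical, and the real table layout in the migration may differ. The columns `id`, `created_at`, `artifact_type`, `version`, and `is_current` follow the entry above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Idempotent DDL: safe to run on fresh installs and on existing databases.
DDL = """
CREATE TABLE IF NOT EXISTS schema_registry (
    id            TEXT PRIMARY KEY,
    artifact_type TEXT NOT NULL,
    version       INTEGER NOT NULL,
    schema_json   TEXT NOT NULL,
    is_current    INTEGER NOT NULL DEFAULT 1,
    created_at    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_schema_registry_current
    ON schema_registry (artifact_type, is_current);
CREATE INDEX IF NOT EXISTS idx_schema_registry_version
    ON schema_registry (artifact_type, version);
"""


def register_schema(conn: sqlite3.Connection, artifact_type: str, schema: dict) -> int:
    """Insert schema as the next version of artifact_type and mark it current."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_registry WHERE artifact_type = ?",
        (artifact_type,),
    ).fetchone()
    version = row[0] + 1
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE schema_registry SET is_current = 0 WHERE artifact_type = ?",
            (artifact_type,),
        )
        # id and created_at are supplied explicitly; omitting them is what
        # triggers NOT NULL failures.
        conn.execute(
            "INSERT INTO schema_registry "
            "(id, artifact_type, version, schema_json, is_current, created_at) "
            "VALUES (?, ?, ?, ?, 1, ?)",
            (
                str(uuid.uuid4()),
                artifact_type,
                version,
                json.dumps(schema),
                datetime.now(timezone.utc).isoformat(),
            ),
        )
    return version


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executescript(DDL)  # running the migration twice is harmless
register_schema(conn, "wiki", {"type": "object", "required": ["title"]})
register_schema(conn, "wiki", {"type": "object", "required": ["title", "body"]})
```

Older rows are kept with `is_current = 0`, which preserves the version history that the registry is meant to track.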

2026-04-16 — sen-sg-01-SREG verification [task:47b17cbf-a8ac-419e-9368-7a2669da25a8]

Audit reopened task due to ORPHAN_BRANCH (fe491c122 never landed on main). Verified
actual implementation on origin/main (commit 1092b29ac):

What landed on main:

  • GET /api/schemas — list all 18 artifact type schemas
  • GET /api/schemas/{artifact_type} — current schema with version history
  • migrations/062_schema_registry.py — idempotent CREATE TABLE
  • scripts/schema_registry.py — 700 lines with register/get/list/history/validate
  • DB: 22 rows in schema_registry (experiment v3, protein_design v3, others v1)

What did NOT land (fe491c122 orphan):

  • Senate HTML UI (/senate/schemas, /senate/schemas/{type})
  • /api/senate/schemas endpoints with /api/senate/schemas/seed

Acceptance criteria status: SATISFIED. schema_registry tracks all artifact type
schemas with version history. The simpler API-only implementation fulfills the core
requirement; the HTML UI is nice-to-have but non-blocking for dependent tasks.

File: quest_schema_governance_spec.md
Modified: 2026-04-24 07:15
Size: 6.9 KB