[Senate] Artifact reuse, provenance, and quality vetting

Goal

Make the value of an artifact compound with use. Track who derives from
it, count reuse, surface dependents. Vet new artifacts via debate + QC
score so other analyses can confidently take dependencies on them.
Without this, every artifact is a one-shot orphan; with it, the system
accrues a library of reliable, reusable building blocks.

> ## Continuous-process anchor
>
> Multiple recurring processes here:
> - Reuse counter maintenance — trigger-based, not driver-based
> (synchronous on insert into artifact_links)
> - QC debate driver — recurring, gap-predicate qc_status='pending'
> - Vetting promotion — recurring, gap-predicate
> qc_status='in_review' AND consensus_reached
> - Stale artifact monitor — recurring, flags artifacts with
> conflicting evidence accumulated since last QC

All four follow docs/design/retired_scripts_patterns.md principles.

Why now

The user said: "we want artifacts that will get reused. they need to be
vetted. they need to be debated, iterated, qc'ed. then other analyses can
take dependencies on them. they need to be fully provenanced and
reproducible."

Current state has artifact_links (typed relationships) and provenance_chain (JSON), but no:

  • Reuse counter / dependent count
  • QC pipeline (debate-driven quality vetting)
  • Promotion gates (when can another analysis depend on this?)
  • Lifecycle telemetry (was this artifact's contradicting evidence
meaningfully responded to?)

Design

New columns on artifacts

ALTER TABLE artifacts
  ADD COLUMN parent_artifact_id TEXT REFERENCES artifacts(id),
  ADD COLUMN derived_from_ids TEXT[],
  ADD COLUMN reuse_count INT DEFAULT 0,
  ADD COLUMN dependent_count INT DEFAULT 0,
  ADD COLUMN last_reused_at TIMESTAMPTZ,
  ADD COLUMN qc_status TEXT DEFAULT 'pending'
    CHECK (qc_status IN ('pending', 'in_review', 'passed', 'failed', 'disputed', 'deprecated')),
  ADD COLUMN qc_score REAL,            -- 0-1, NULL until QC complete
  ADD COLUMN qc_completed_at TIMESTAMPTZ,
  ADD COLUMN vetted_by_actor_ids TEXT[],
  ADD COLUMN reproducibility_score REAL,  -- 0-1, populated by capsule verify
  ADD COLUMN debate_outcome TEXT,         -- 'consensus_pass', 'consensus_fail', 'unresolved'
  ADD COLUMN can_be_dependency BOOLEAN DEFAULT FALSE;  -- true once qc_status='passed'

CREATE INDEX idx_artifacts_qc_status ON artifacts(qc_status);
CREATE INDEX idx_artifacts_can_be_dependency ON artifacts(can_be_dependency) WHERE can_be_dependency;
CREATE INDEX idx_artifacts_parent ON artifacts(parent_artifact_id);
CREATE INDEX idx_artifacts_derived_from ON artifacts USING gin(derived_from_ids);

Reuse counter mechanics

Triggers on artifact_links:

CREATE OR REPLACE FUNCTION bump_reuse_counters() RETURNS TRIGGER AS $$
BEGIN
  IF NEW.link_type IN ('derives_from', 'cites', 'extends') THEN
    UPDATE artifacts
       SET dependent_count = dependent_count + 1,
           reuse_count = reuse_count + 1,
           last_reused_at = NOW()
     WHERE id = NEW.target_artifact_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_artifact_links_reuse_bump
  AFTER INSERT ON artifact_links
  FOR EACH ROW EXECUTE FUNCTION bump_reuse_counters();

Reuse counter is append-only — deletes don't decrement (keeps
historical reuse intact). Use a separate view if "currently dependent"
matters more than "ever reused".
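The difference between "ever reused" and "currently dependent" can be sketched in plain Python. This is an illustrative model, not the database implementation: `LinkLedger` and its method names are hypothetical, and rejecting duplicate links is an assumption the spec does not state.

```python
from collections import defaultdict

# Link types that count as reuse, mirroring the trigger's IN (...) list.
COUNTED_LINK_TYPES = {"derives_from", "cites", "extends"}


class LinkLedger:
    """Hypothetical in-memory stand-in for artifact_links plus the counters."""

    def __init__(self):
        self.live_links = set()              # (source_id, target_id, link_type)
        self.reuse_count = defaultdict(int)  # append-only, "ever reused"

    def insert_link(self, source_id, target_id, link_type):
        link = (source_id, target_id, link_type)
        if link in self.live_links:
            return  # assumption: duplicate links are rejected upstream
        self.live_links.add(link)
        if link_type in COUNTED_LINK_TYPES:
            self.reuse_count[target_id] += 1

    def delete_link(self, source_id, target_id, link_type):
        # Deleting a link never decrements reuse_count.
        self.live_links.discard((source_id, target_id, link_type))

    def current_dependents(self, target_id):
        """What a 'currently dependent' view would return."""
        return sorted({s for (s, t, lt) in self.live_links
                       if t == target_id and lt in COUNTED_LINK_TYPES})


ledger = LinkLedger()
ledger.insert_link("fig-2", "dataset-1", "derives_from")
ledger.insert_link("analysis-9", "dataset-1", "cites")
ledger.delete_link("fig-2", "dataset-1", "derives_from")
print(ledger.reuse_count["dataset-1"])         # 2
print(ledger.current_dependents("dataset-1"))  # ['analysis-9']
```

After the delete, the historical counter still reads 2 while only one live dependent remains, which is the gap a separate view would expose.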

QC pipeline

When an artifact is committed:

  • qc_status defaults to 'pending'
  • Recurring "QC debate driver" picks up pending artifacts at quality_score
    ≥ 0.5 (don't waste cycles on obvious junk)
  • Driver enrolls the artifact in a 3-round debate (Theorist / Skeptic /
    Methodologist) using the existing quest_artifact_debates_spec.md
    infrastructure
  • After debate concludes, an LLM judge produces a structured QC report:
    {
      "qc_score": 0.82,
      "reproducibility_assessment": "manifest present, dependencies pinned, executable",
      "methodological_concerns": ["sample size n=3 limits statistical power"],
      "factual_errors": [],
      "novelty_assessment": "extends prior figure-XYZ but adds clinical context",
      "vetted_by": "agent-skeptic-glm-4.5",
      "promotion_recommendation": "pass_with_caveats",
      "debate_session_id": "..."
    }

  • If qc_score >= 0.75 and no factual errors: status → passed,
    can_be_dependency = TRUE
  • If qc_score < 0.5: status → failed, can_be_dependency = FALSE,
    reasoning logged
  • If reviewers disagree: status → disputed, route to second-pass
    with different reviewers

The threshold is configurable: it is stored in the qc_config table and
adjusted by a weekly meta-job that calibrates against
ground-truth-passed artifacts.
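The status rules above can be sketched as one small function. This is a hedged illustration: `decide_qc_status` is a hypothetical name, the defaults 0.75 and 0.5 are the values quoted in this section (in practice they would presumably be read from qc_config), and two choices are assumptions because the spec is silent on them: reviewer disagreement takes precedence over the score, and artifacts that fall between the two thresholds (or pass the score but carry factual errors) stay in 'in_review'.

```python
def decide_qc_status(qc_score, factual_errors, reviewers_disagree,
                     pass_threshold=0.75, fail_threshold=0.5):
    """Return (qc_status, can_be_dependency) for one judged artifact."""
    if reviewers_disagree:
        # Assumption: a dispute wins over any score.
        return "disputed", False
    if qc_score >= pass_threshold and not factual_errors:
        return "passed", True
    if qc_score < fail_threshold:
        return "failed", False
    # Assumption: the spec does not define this band; keep it under review.
    return "in_review", False


print(decide_qc_status(0.82, [], False))   # ('passed', True)
print(decide_qc_status(0.30, [], False))   # ('failed', False)
print(decide_qc_status(0.90, [], True))    # ('disputed', False)
print(decide_qc_status(0.60, [], False))   # ('in_review', False)
```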

Provenance graph

Hard schema (versus today's JSON):

  • parent_artifact_id for single-source derivations (most common)
  • derived_from_ids TEXT[] for multi-source (an analysis derived from
    3 datasets + 1 prior figure)
  • Symlinks under <id>/inputs/ mirror the DB (Phase 1 of folder migration)
  • artifact_links table remains for typed relationships beyond simple
    derivation (cites, contradicts, supports, etc.)
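One way the <id>/inputs/ mirror could look on disk is sketched below. Everything beyond the <id>/inputs/ path is an assumption: `mirror_inputs` is a hypothetical helper, each link is named after the input's id, and the links are relative so the tree can be moved as a unit. The folder migration spec may define this differently.

```python
import os
import tempfile
from pathlib import Path


def mirror_inputs(root, artifact_id, input_ids):
    """Create <artifact_id>/inputs/<input_id> symlinks to each input's folder."""
    inputs_dir = Path(root) / artifact_id / "inputs"
    inputs_dir.mkdir(parents=True, exist_ok=True)
    for input_id in input_ids:
        link = inputs_dir / input_id
        if not link.is_symlink():
            # Relative target: <root>/<artifact_id>/inputs/../../<input_id>
            os.symlink(os.path.join("..", "..", input_id), link)
    return sorted(p.name for p in inputs_dir.iterdir())


with tempfile.TemporaryDirectory() as root:
    for existing_id in ("dataset-1", "dataset-2"):
        (Path(root) / existing_id).mkdir()
    names = mirror_inputs(root, "analysis-9", ["dataset-1", "dataset-2"])
    link_path = Path(root) / "analysis-9" / "inputs" / "dataset-1"
    points_at_input = link_path.resolve() == (Path(root) / "dataset-1").resolve()

print(names)  # ['dataset-1', 'dataset-2']
```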

Promotion / demotion mechanics

can_be_dependency is the gate. Only artifacts with this flag set can be
referenced as parent_artifact_id or in derived_from_ids of new
work. Enforced at commit_artifact() time:

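class ArtifactNotPromoted(Exception):
    """Raised when a requested dependency has not passed QC."""

def commit_artifact(conn, *args, parent_artifact_id=None, derived_from_ids=None, **kwargs):
    # *args/**kwargs stand in for the parameters elided in the original signature.
    # Assumes conn returns dict-like rows (for example via a dict row factory).
    # Gate the parent and every multi-source input on can_be_dependency.
    dep_ids = ([parent_artifact_id] if parent_artifact_id else []) + list(derived_from_ids or [])
    for dep_id in dep_ids:
        row = conn.execute("SELECT can_be_dependency FROM artifacts WHERE id=%s",
                           (dep_id,)).fetchone()
        if not row or not row['can_be_dependency']:
            raise ArtifactNotPromoted(dep_id)
    ...  # remainder of the commit is unchanged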

This forces agents to either:

  • Wait for the artifact they want to depend on to pass QC, or
  • Use the unvetted_dependencies field (separate column) for exploratory
    work that's clearly marked "not for production"

Demotion: if new evidence contradicts a passed artifact, the recurring
"Stale artifact monitor" (every-12h) re-opens QC by setting status
back to 'in_review'. Cascades to dependents:

WITH RECURSIVE downstream AS (
  SELECT id FROM artifacts WHERE id = $1
  UNION
  SELECT a.id FROM artifacts a
    JOIN downstream d ON a.parent_artifact_id = d.id
                       OR d.id = ANY(a.derived_from_ids)
)
UPDATE artifacts SET qc_status='in_review' WHERE id IN (SELECT id FROM downstream);

Notify all dependent artifact owners (via Senate event bus).
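The traversal performed by the recursive CTE can be sketched in Python as a breadth-first walk over the same two edge sources (parent_artifact_id and derived_from_ids). The function name and the in-memory `artifacts` mapping are illustrative assumptions, useful mainly for unit-testing the cascade logic without a database. Unlike the CTE, this sketch includes the root id even if it is missing from the mapping.

```python
from collections import deque


def downstream_ids(root_id, artifacts):
    """Return root_id plus every artifact that transitively derives from it.

    artifacts maps id -> {"parent_artifact_id": str or None,
                          "derived_from_ids": list of str}
    """
    # Invert the edges once: input id -> ids of artifacts that consume it.
    children = {}
    for art_id, row in artifacts.items():
        inputs = set(row.get("derived_from_ids") or [])
        if row.get("parent_artifact_id"):
            inputs.add(row["parent_artifact_id"])
        for input_id in inputs:
            children.setdefault(input_id, []).append(art_id)

    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # plays the role of UNION's de-duplication
                seen.add(child)
                queue.append(child)
    return seen


artifacts = {
    "dataset-1": {"parent_artifact_id": None, "derived_from_ids": []},
    "dataset-7": {"parent_artifact_id": None, "derived_from_ids": []},
    "fig-2": {"parent_artifact_id": "dataset-1", "derived_from_ids": []},
    "analysis-9": {"parent_artifact_id": None,
                   "derived_from_ids": ["fig-2", "dataset-7"]},
    "unrelated": {"parent_artifact_id": None, "derived_from_ids": []},
}
print(sorted(downstream_ids("dataset-1", artifacts)))
# ['analysis-9', 'dataset-1', 'fig-2']
```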

Reproducibility binding

Artifacts produced inside a reproducibility capsule
(quest_reproducible_analysis_capsules_spec.md) auto-populate
reproducibility_score from the capsule's verification_result.
Capsule verify score ≥ 0.9 → reproducibility_score = capsule_score.
Non-capsule artifacts default to NULL; QC debate considers the absence.

Recurring drivers (4)

  • [Senate] CI: QC debate driver (every-2h, pri 91)
    - Predicate: qc_status='pending' AND quality_score >= 0.5
    - Batch: 20/cycle
    - Action: enroll in debate, wait, judge, update status

  • [Senate] CI: QC promotion settler (every-1h, pri 91)
    - Predicate: qc_status='in_review' AND debate_completed_at < NOW() - interval '2h'
    - Batch: 50/cycle
    - Action: aggregate debate outcomes, set status, flip can_be_dependency

  • [Senate] CI: Stale artifact monitor (every-12h, pri 90)
    - Predicate: qc_status='passed' AND last_reused_at IS NOT NULL
      AND <new contradicting evidence since qc_completed_at>
    - Batch: 30/cycle
    - Action: flip status to in_review, cascade to dependents

  • [Senate] CI: Reuse counter audit (daily, pri 89)
    - Predicate: count actual links vs dependent_count
    - Action: reconcile mismatches; alert if drift > 5%
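The reuse counter audit can be sketched as a recount followed by a comparison. The spec does not define "drift", so this sketch assumes it means the share of artifacts whose stored dependent_count disagrees with the recount; `audit_reuse_counters` and its argument shapes are hypothetical. Because the counters are append-only, the recount would need to run against whatever link history is treated as ground truth, which is also left open here.

```python
from collections import Counter

# Same link types the trigger counts.
COUNTED_LINK_TYPES = {"derives_from", "cites", "extends"}


def audit_reuse_counters(links, stored_counts, drift_alert=0.05):
    """Compare recounted links against stored dependent_count values.

    links: iterable of (target_artifact_id, link_type)
    stored_counts: dict of artifact id -> stored dependent_count
    Returns (mismatches, drift, alert), where mismatches maps
    artifact id -> (stored, recounted).
    """
    recounted = Counter(target for target, link_type in links
                        if link_type in COUNTED_LINK_TYPES)
    all_ids = set(recounted) | set(stored_counts)
    mismatches = {}
    for art_id in all_ids:
        stored = stored_counts.get(art_id, 0)
        actual = recounted.get(art_id, 0)
        if stored != actual:
            mismatches[art_id] = (stored, actual)
    # Assumed drift definition: fraction of artifacts with a wrong counter.
    drift = len(mismatches) / len(all_ids) if all_ids else 0.0
    return mismatches, drift, drift > drift_alert


links = [
    ("dataset-1", "derives_from"),
    ("dataset-1", "cites"),
    ("fig-2", "extends"),
    ("fig-2", "contradicts"),   # not a counted link type
]
stored = {"dataset-1": 2, "fig-2": 3}
mismatches, drift, alert = audit_reuse_counters(links, stored)
print(mismatches)    # {'fig-2': (3, 1)}
print(drift, alert)  # 0.5 True
```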

Surfaces

Artifact detail page additions:

  • "Dependents (N)" expandable section listing artifacts that derive from this
  • "Provenance graph" mini-viz (parent → this → children)
  • "QC report" tab with debate transcript + LLM judge output
  • "Quality timeline" showing qc_status changes over time

API additions:

  • GET /api/artifacts/<id>/dependents: list artifacts that derive from this one
  • GET /api/artifacts/<id>/qc-report: latest QC outcome
  • POST /api/artifacts/<id>/dispute: submit QC dispute (reviewer auth)
  • GET /api/artifacts/most-reused?layer=Atlas&since=7d: leaderboard

Acceptance criteria

☐ Schema applied; reuse triggers active
☐ All 4 recurring drivers running at scheduled cadence
☐ 80% of new artifacts reach a final QC status (passed/failed) within 48h
☐ No artifact with qc_status='passed' has uncited factual errors
  flagged by spot-check (sample 50 per quarter)
☐ Reuse counters within 5% of ground truth (audit driver verifies)
☐ Demotion cascade works: contradicting evidence flips dependent
  artifacts' status within 12h
☐ Surfaces: dependents list, provenance graph, QC report tab live
☐ Operator dashboard: QC throughput, pending count, dispute queue
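The 48h criterion above could be checked with a small metric function along these lines. It is a sketch under assumptions: `qc_settled_fraction` is a hypothetical name, `created_at` is assumed to exist on artifacts (it is not among the columns added in this spec), and the clock is assumed to start at creation and stop at qc_completed_at.

```python
from datetime import datetime, timedelta

FINAL_STATUSES = {"passed", "failed"}


def qc_settled_fraction(rows, window=timedelta(hours=48)):
    """Fraction of artifacts that reached a final QC status within the window.

    rows: list of dicts with created_at, qc_status, qc_completed_at.
    Returns None when there are no rows to measure.
    """
    if not rows:
        return None
    settled = sum(
        1 for r in rows
        if r["qc_status"] in FINAL_STATUSES
        and r["qc_completed_at"] is not None
        and r["qc_completed_at"] - r["created_at"] <= window
    )
    return settled / len(rows)


t0 = datetime(2026, 4, 28)
rows = [
    {"created_at": t0, "qc_status": "passed", "qc_completed_at": t0 + timedelta(hours=10)},
    {"created_at": t0, "qc_status": "failed", "qc_completed_at": t0 + timedelta(hours=60)},
    {"created_at": t0, "qc_status": "pending", "qc_completed_at": None},
    {"created_at": t0, "qc_status": "passed", "qc_completed_at": t0 + timedelta(hours=47)},
    {"created_at": t0, "qc_status": "failed", "qc_completed_at": t0 + timedelta(hours=2)},
]
fraction = qc_settled_fraction(rows)
print(fraction)         # 0.6
print(fraction >= 0.8)  # False: this sample would miss the 80% target
```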

Dependencies

  • quest_artifact_uuid_migration_spec.md (uses artifacts.id as canonical handle)
  • quest_artifact_metadata_semantic_spec.md (summary used in QC prompts)
  • quest_artifact_debates_spec.md (debate engine)
  • quest_reproducible_analysis_capsules_spec.md (capsule verify scores)
  • Existing actors, actor_reputation tables (for vetted_by_actor_ids)

Dependents

  • quest_experiment_execution_participant_spec.md (results submitted as
    artifacts go through this pipeline)
  • quest_paper_replication_starter_spec.md (replications are artifacts
    needing QC before they unlock further extensions)

Work Log

2026-04-28: Spec authored

Designed reuse counters (trigger-based, append-only), QC pipeline
(3-round debate + LLM judge → status), promotion gate
(can_be_dependency), and demotion cascade. Four recurring drivers
identified. The schema is additive to the existing artifacts table and
orthogonal to the artifact folder migration, but keys on artifacts.id.

Open: how to handle the cold start for ~11k existing artifacts. A
one-shot QC backfill is a huge LLM cost. Per user direction (drained
fleet window): run QC during the same cycle as the folder backfill,
not lazily. Stop deferring; ship through it now while the system is
quiet.

Old proposal kept for reference: only run QC on artifacts that are
reused at least once OR that an analysis explicitly requests as a
dependency (lazy QC). Document this in Phase 2 work.

Tasks using this spec (1)
[Senate] CI: Artifact QC debate driver
Senate open P91
File: quest_artifact_reuse_provenance_qc_spec.md
Modified: 2026-04-28 04:17
Size: 10.5 KB