Make every artifact semantically discoverable. Each artifact carries a
short LLM-generated summary, structured metadata (key findings,
methods, data sources, applicable domains), and a vector embedding of
the summary. A semantic-search endpoint and "find similar" surfaces
make artifacts genuinely reusable rather than orphaned per-task outputs.
The current state has metadata as a free-form JSONB column with
type-specific schemas (figures have caption + source_notebook_id,
models have model_family + framework, etc.). What's missing is a
uniform, semantic, queryable layer on top, where every artifact,
regardless of type, carries a summary you can search against.
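For concreteness, a hypothetical pair of metadata payloads under the
current scheme. Only caption/source_notebook_id and
model_family/framework come from this spec; the concrete values are
made up.

```python
# Illustrative only: two artifacts' metadata under the current
# free-form JSONB scheme.
figure_metadata = {
    "caption": "Volcano plot of differentially expressed microglial genes",
    "source_notebook_id": "nb-0042",
}
model_metadata = {
    "model_family": "gradient-boosted-trees",
    "framework": "xgboost",
}
# No field is shared across the two shapes, so nothing can be searched
# uniformly -- hence the summary + embedding layer below.
```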
> ## Continuous-process anchor
>
> Steady-state: a recurring driver finds artifacts with
> summary_rubric_version below the current rubric version, runs the
> rubric, upserts the summary + embedding, and stamps the version. Every
> rubric improvement triggers a re-summary pass. See
> docs/design/retired_scripts_patterns.md § "1. LLMs for semantic
> judgment; rules for syntactic validation" — this spec is a textbook
> case.
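A minimal sketch of that driver, assuming psycopg 3 and two hypothetical
helpers (run_rubric, embed) wrapping the LLM and embedding calls; the gap
predicate mirrors the SQL given later in this spec.

```python
import psycopg

GAP_SQL = """
    SELECT id FROM artifacts
    WHERE summary IS NULL
       OR summary_rubric_version <
          (SELECT MAX(version) FROM artifact_summary_rubric WHERE NOT retired)
    ORDER BY quality_score DESC NULLS LAST, created_at DESC
    LIMIT 50;
"""

def drive_one_cycle(conn: psycopg.Connection, rubric: dict, run_rubric, embed) -> int:
    """One bounded pass: summarize up to 50 stale artifacts, stamp the version."""
    with conn.cursor() as cur:
        cur.execute(GAP_SQL)
        stale_ids = [row[0] for row in cur.fetchall()]
    for artifact_id in stale_ids:
        result = run_rubric(rubric, artifact_id)  # LLM call (hypothetical helper)
        vector = embed(result["summary"])         # embedding call (hypothetical helper)
        with conn.cursor() as cur:
            # Assumes pgvector's adapter is registered so a Python list
            # binds to the vector column (pgvector.psycopg.register_vector).
            cur.execute(
                "UPDATE artifacts SET summary = %s, summary_embedding = %s, "
                "summary_generated_at = NOW(), summary_rubric_version = %s "
                "WHERE id = %s",
                (result["summary"], vector, rubric["version"], artifact_id),
            )
    conn.commit()
    return len(stale_ids)
```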
New columns on artifacts (keyed by artifact.id):

```sql
ALTER TABLE artifacts
ADD COLUMN summary TEXT,
ADD COLUMN summary_embedding vector(1536), -- pgvector
ADD COLUMN key_findings JSONB, -- ["finding 1", "finding 2", ...]
ADD COLUMN methods_used TEXT[], -- ['RNA-seq', 'CRISPR-screen']
ADD COLUMN data_sources TEXT[], -- ['Allen-SEA-AD', 'GTEx-v10']
ADD COLUMN applicable_domains TEXT[], -- ['alzheimers', 'microglia']
ADD COLUMN semantic_keywords TEXT[], -- ['heatmap', 'differential-expression']
ADD COLUMN summary_generated_at TIMESTAMPTZ,
ADD COLUMN summary_model TEXT, -- 'claude-opus-4-7' / 'codex-...'
ADD COLUMN summary_rubric_version INT;
CREATE INDEX idx_artifacts_summary_embedding
ON artifacts USING ivfflat (summary_embedding vector_cosine_ops);
CREATE INDEX idx_artifacts_methods_used ON artifacts USING gin(methods_used);
CREATE INDEX idx_artifacts_data_sources ON artifacts USING gin(data_sources);
CREATE INDEX idx_artifacts_applicable_domains ON artifacts USING gin(applicable_domains);
```

pgvector dependency: the migration begins with CREATE EXTENSION IF NOT
EXISTS vector, which installs the extension if absent; until the vector
type is available, the recurring driver is a no-op.
The rubric is stored as a versioned Postgres row in a new artifact_summary_rubric table:

```sql
CREATE TABLE artifact_summary_rubric (
version INT PRIMARY KEY,
prompt_template TEXT NOT NULL,
output_schema_json JSONB NOT NULL,
embedding_model TEXT NOT NULL, -- e.g. 'claude-opus-4-7-embed' (or third-party)
retired BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ DEFAULT NOW(),
notes TEXT
);
```

Initial rubric (v1):
> Read the artifact's title, type, and metadata. If it's a figure or
> notebook, also read up to 200 lines of associated text/captions/cells.
> Produce:
> 1. A 1-3 sentence summary capturing what the artifact contains and
> why it might matter to a researcher.
> 2. 3-5 key findings (bullet phrases, each ≤15 words).
> 3. Methods used (controlled vocabulary; see methods_taxonomy).
> 4. Data sources (controlled vocabulary; see data_sources_taxonomy).
> 5. Applicable disease/biology domains.
> 6. 5-10 semantic keywords for retrieval (free-form, lowercase).
> 7. Confidence (0-1) in your summary's faithfulness to the artifact.
>
> Be honest about what's actually in the artifact. If it's a stub,
> say so. If you can't tell what it does, say so. Do not embellish.
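Since the rubric row carries output_schema_json, the driver can reject
malformed LLM output mechanically, the "rules for syntactic validation"
half of the pattern cited above. A sketch, assuming the jsonschema
package; the confidence field follows rubric item 7.

```python
import json
import jsonschema

def validate_rubric_output(raw_llm_text: str, output_schema_json: dict) -> dict:
    """Parse and schema-check one rubric response; raise on any violation."""
    parsed = json.loads(raw_llm_text)            # malformed JSON fails here
    jsonschema.validate(parsed, output_schema_json)
    if not 0.0 <= parsed["confidence"] <= 1.0:   # rubric item 7
        raise ValueError("confidence out of range")
    return parsed
```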
The rubric self-improves: a meta-task ("audit 50 random rubric_v1
outputs against ground-truth artifacts; propose rubric_v2 changes")
runs weekly. Operators approve rubric upgrades.
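A possible shape for the sampling half of that meta-task; the audit
prompt and the rubric-change proposal are left to the agent, and ORDER
BY random() is acceptable at this table size.

```python
AUDIT_SAMPLE_SQL = """
    SELECT id, summary, key_findings, summary_rubric_version
    FROM artifacts
    WHERE summary IS NOT NULL
      AND summary_rubric_version = %(rubric_version)s
    ORDER BY random()
    LIMIT 50;
"""
```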
Two new tables, populated by LLM-driven discovery (NOT hardcoded):

```sql
CREATE TABLE methods_taxonomy (
term TEXT PRIMARY KEY,
category TEXT, -- 'wet-lab', 'computational', 'imaging', ...
parent_term TEXT REFERENCES methods_taxonomy(term),
synonyms TEXT[],
first_seen_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE data_sources_taxonomy (
source_id TEXT PRIMARY KEY, -- 'Allen-SEA-AD', 'GTEx-v10'
display_name TEXT NOT NULL,
category TEXT, -- 'transcriptomics', 'imaging', 'clinical'
url TEXT,
license TEXT
);
```

Bootstrap: a one-shot job seeds each table with 30-50 obvious entries
(LLM-generated from sampled artifacts, operator-approved). Steady
state: the rubric driver proposes new terms when it can't fit an
artifact's method to existing entries; a weekly meta-job consolidates
them (see the sketch below).
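A sketch of the steady-state proposal path: the driver inserts unseen
terms and leaves synonym merging and parent assignment to the weekly
meta-job. Table and column names follow the DDL above; the direct-insert
behavior is an assumption.

```python
PROPOSE_METHOD_SQL = """
    INSERT INTO methods_taxonomy (term, category, synonyms)
    VALUES (%s, %s, %s)
    ON CONFLICT (term) DO NOTHING;  -- first_seen_at defaults to NOW()
"""

def propose_method_term(conn, term: str, category: str, synonyms: list[str]) -> None:
    """Record a candidate term the rubric couldn't map to an existing entry."""
    with conn.cursor() as cur:
        cur.execute(PROPOSE_METHOD_SQL, (term, category, synonyms))
    conn.commit()
```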
Driver: scripts/artifact_summary_backfill.py. Per the continuous-process
principles, this is wired as a recurring CI task with a Codex agent
reading the spec, not a standalone script.
Gap predicate:

```sql
SELECT id, artifact_type, title, metadata
FROM artifacts
WHERE summary IS NULL
OR summary_rubric_version < (SELECT MAX(version) FROM artifact_summary_rubric WHERE NOT retired)
ORDER BY quality_score DESC NULLS LAST,
created_at DESC
LIMIT 50;
```

Priority order: high-quality artifacts first (already vetted), newest
first within a tier.
Per-artifact algorithm:
1. Route the summarization call to a provider (per quest_llm_routing_spec.md).
2. Validate the response against the rubric's output_schema_json.
3. Embed the summary and persist it with the UPDATE below.
4. Log the cycle to artifact_summary_runs (count per cycle, cost, errors).

```sql
UPDATE artifacts SET
    summary = ?, key_findings = ?, methods_used = ?,
    data_sources = ?, applicable_domains = ?, semantic_keywords = ?,
    summary_embedding = ?, summary_generated_at = NOW(),
    summary_model = ?, summary_rubric_version = ?
WHERE id = ?;
```

Bounded batch: 50 artifacts/cycle, every 2h.
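artifact_summary_runs is named here without DDL; the shape below is an
illustrative guess, not the canonical definition.

```python
import json

# Illustrative guess at the run-log table referenced above.
RUNS_DDL = """
    CREATE TABLE IF NOT EXISTS artifact_summary_runs (
        run_id BIGSERIAL PRIMARY KEY,
        started_at TIMESTAMPTZ DEFAULT NOW(),
        artifacts_processed INT NOT NULL,
        total_cost_usd NUMERIC(10, 4),
        error_count INT DEFAULT 0,
        errors JSONB
    );
"""

def log_cycle(conn, processed: int, cost_usd: float, errors: list[dict]) -> None:
    """Record one driver cycle: volume, cost, and any per-artifact errors."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO artifact_summary_runs "
            "(artifacts_processed, total_cost_usd, error_count, errors) "
            "VALUES (%s, %s, %s, %s)",
            (processed, cost_usd, len(errors), json.dumps(errors)),
        )
    conn.commit()
```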
Failure modes:
Search endpoint: POST /api/artifacts/search

```json
{
  "query": "heatmap microglial gene expression Alzheimer",
  "filters": {
    "artifact_type": ["figure", "notebook"],
    "methods_used": ["differential-expression"],
    "min_quality_score": 0.6,
    "applicable_domains": ["alzheimers"]
  },
  "limit": 25
}
```
Response:

```json
{
  "results": [
    {
      "id": "figure-abc123",
      "title": "...",
      "summary": "...",
      "similarity": 0.847,
      "key_findings": [...],
      "url": "/artifact/<id>"
    },
    ...
  ]
}
```
Implementation: cosine similarity over summary_embedding, filtered by
the structured filters; returns the top N ranked by similarity (see the
query sketch below).
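A sketch of the underlying query, assuming pgvector's cosine-distance
operator <=> (1 - distance = similarity, matching the vector_cosine_ops
index in the migration above). query_vec is the embedded search string;
parameter names are illustrative.

```python
SEARCH_SQL = """
    SELECT id, title, summary,
           1 - (summary_embedding <=> %(query_vec)s) AS similarity
    FROM artifacts
    WHERE summary_embedding IS NOT NULL
      AND artifact_type = ANY(%(types)s)
      AND methods_used && %(methods)s            -- array overlap; hits the GIN index
      AND applicable_domains && %(domains)s
      AND quality_score >= %(min_quality)s
    ORDER BY summary_embedding <=> %(query_vec)s  -- ascending distance = best first
    LIMIT %(limit)s;
"""
```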
Companion endpoint: GET /api/artifacts/<id>/similar?limit=10
returns artifacts with embedding similarity > 0.75 to this one.
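A sketch of the companion query: a self-join on embeddings, thresholded
at similarity > 0.75 as specified above.

```python
SIMILAR_SQL = """
    SELECT b.id, b.title,
           1 - (b.summary_embedding <=> a.summary_embedding) AS similarity
    FROM artifacts a
    JOIN artifacts b
      ON b.id <> a.id AND b.summary_embedding IS NOT NULL
    WHERE a.id = %(artifact_id)s
      AND 1 - (b.summary_embedding <=> a.summary_embedding) > 0.75
    ORDER BY b.summary_embedding <=> a.summary_embedding
    LIMIT %(limit)s;
"""
```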
On every artifact detail page (Phase 1 of folder migration adds this):
a "Similar artifacts" sidebar. Useful for reuse — a researcher viewing
an AD heatmap immediately sees related heatmaps they could derive from.
Some metadata fields are extractable without an LLM:

- .ipynb cells: extract import statements → methods_used candidates
  (e.g. import scanpy → 'single-cell-analysis'); see the sketch at the
  end of this section
- .schema.json files: column names + types as features
- iTXt (PNG text chunks)

Acceptance criteria:

- artifact_summary_rubric v1 row inserted
- methods_taxonomy and data_sources_taxonomy seeded with ≥30 entries each
- artifacts with quality_score ≥ 0.6 have a summary within 14 days

Related specs:

- quest_artifact_uuid_migration_spec.md Phase 0 (uses artifacts.id; id
  values are UUIDs for new artifacts)
- quest_llm_routing_spec.md for provider selection
- quest_artifact_reuse_provenance_qc_spec.md (uses summary in reuse signals)
- quest_paper_replication_starter_spec.md (semantic dedup of replication attempts)

Status: Bootstrap design. Versioned LLM rubric for summaries; pgvector
for embeddings; structured taxonomies for filterable facets. Recurring
driver pattern (50/cycle, every 2h). Search + similar APIs designed but
not implemented.
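A sketch of the no-LLM extraction path for notebooks, referenced in the
list above. Only the scanpy → 'single-cell-analysis' mapping comes from
this spec; in practice the dict would be derived from methods_taxonomy.

```python
import json

IMPORT_TO_METHOD = {
    "scanpy": "single-cell-analysis",  # the one mapping given in this spec
}

def methods_from_notebook(ipynb_path: str) -> set[str]:
    """Map import statements in code cells to methods_used candidates."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    candidates = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        lines = source if isinstance(source, list) else source.splitlines()
        for line in lines:
            parts = line.strip().split()
            if len(parts) > 1 and parts[0] in ("import", "from"):
                package = parts[1].split(".")[0]
                if package in IMPORT_TO_METHOD:
                    candidates.add(IMPORT_TO_METHOD[package])
    return candidates
```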
Open question: should embeddings be 1536-dim (OpenAI/Anthropic style)
or smaller? Storage at 11K artifacts is trivial either way; pick based
on a retrieval-quality benchmark.
```json
{
  "requirements": {
    "reasoning": 8,
    "analysis": 8
  }
}
```