[Atlas/landscape] Synthetic biology + lineage tracing — Seattle Hub domain open

Build a landscape analysis artifact for the **synthetic-biology-lineage-tracing** domain per `docs/planning/specs/quest_landscape_analyses_spec.md`.

**Primary personas to involve in the debate** (already registered + endowed with 10k tokens): `jay-shendure`, `jesse-gray`, `andy-hickl`.

**Rationale:** Shendure + Gray drive the Seattle Hub for Synthetic Biology; Hickl evaluates the tooling/ML stack feasibility. This domain spans CRISPR base-editing safety, in-vivo barcoding, and lineage atlases. Distinct from the brain-cell-types landscape — separate gap source.

**Pipeline (per spec §1):**

1. Round 1 — Surveyor: pull 5–20k papers from the Atlas literature index for this domain; produce an initial clustering with proposed labels.
2. Round 2 — Cartographer: clean partition, per-cell metrics (paper_count, recency_score, controversy_score, saturation, gap_hint), boundary edges to neighbors.
3. Round 3 — Critic: validate vs. the world-model graph, calibration check, label readability.

**Acceptance criteria:**

- coverage_completeness ≥ 0.7 (≥70% of high-connectivity world-model entities for this domain land in some cell)
- cell_cohesion ≥ 0.6 (silhouette / Davies-Bouldin pass)
- freshness_date within 30 days
- ≥1 supporting persona casts a "looks right" opinion
- emits ≥10 candidate gaps tagged for `quest_gaps` consumption (cells with saturation < 0.3)

**Output:** a `landscape_analysis` artifact with `domain`, `cells`, `boundaries`, `freshness_date`, `coverage_completeness`, `open_gaps`, `top_papers_by_cell`, `frontier_commentary`, plus the seed gaps emitted into the gap queue.

**Use the K-Dense skills** when grounding cells: `pubmed-search`, `semantic-scholar-search`, `openalex-works`, `paper-corpus-search`, `research-topic` for literature; domain-specific skills (`allen-brain-expression`, `gtex-tissue-expression`, `disgenet-gene-diseases`, etc.) where relevant.
**BEFORE YOU START**: confirm the worktree is current and that no sibling landscape task has already produced this domain's analysis (search `orchestra task list --project SciDEX | grep landscape`).
Spec File

Quest: Landscape Analyses

> Goal. Maintain living maps of scientific fields — where research clusters, where the white space is, what the frontiers are. These maps drive quest_gaps (by surfacing empty cells) and quest_inventions (by tagging cells as novel or saturated). Generalizes the existing AI-tools-landscape pattern to every scientific domain SciDEX cares about.
>
> Distinct from ad-hoc review articles: a landscape here is a structured artifact — domain partitioned into cells, each cell with density/recency/controversy metrics, each cell linked to the literature and the world model. It's queried programmatically by other quests.

Parent: [scidex_economy_design_spec.md](scidex_economy_design_spec.md).
Existing AI-tools case: [q-ai-tools-landscape_spec.md](q-ai-tools-landscape_spec.md), [4be33a8a_4095_forge_curate_ai_science_tool_landscape_spec.md](4be33a8a_4095_forge_curate_ai_science_tool_landscape_spec.md).

---

What a landscape analysis looks like

An instance of this artifact class covers one domain (e.g. "CRISPR base editing", "RNA therapeutics for CNS", "small-molecule PROTACs"). It has:

  • domain (canonical string; pinned to a world-model subgraph)
  • cells: list of {cell_id, label, paper_count, recency_score, controversy_score, saturation, gap_hint}
  • boundaries: adjacency edges to neighboring landscapes (so a gap in the boundary region can route to either)
  • freshness_date: when the corpus was last ingested
  • coverage_completeness (0-1): how much of the named domain is actually mapped
  • open_gaps: list of cell_ids with saturation < 0.3 (the white-space frontier)
  • top_papers_by_cell: 3-5 representative papers per cell
  • frontier_commentary: 2-3 paragraphs of Synthesizer-written narrative on where the field is going

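The field list above can be sketched as typed structures. This is a minimal, illustrative layout (the field names follow the spec; the dataclass shapes and the `open_gaps` derivation are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Cell:
    cell_id: str
    label: str
    paper_count: int
    recency_score: float      # 0-1; higher = more recent activity
    controversy_score: float  # 0-1; cited-by dispersion
    saturation: float         # 0-1; paper density per unit time
    gap_hint: str = ""

@dataclass
class LandscapeAnalysis:
    domain: str                   # canonical string, pinned to a world-model subgraph
    cells: list                   # list[Cell]
    boundaries: list              # (cell_id, neighboring-landscape domain) pairs
    freshness_date: str           # ISO date of last corpus ingest
    coverage_completeness: float  # 0-1
    top_papers_by_cell: dict      # cell_id -> 3-5 representative paper ids
    frontier_commentary: str

    @property
    def open_gaps(self) -> list:
        # White-space frontier: cells with saturation < 0.3.
        return [c.cell_id for c in self.cells if c.saturation < 0.3]
```

Deriving `open_gaps` from `cells` (rather than storing it) keeps the artifact internally consistent when cells are re-scored.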
Landscape artifacts are first-class citizens in the economy — they get composite-valued, they participate in meta-arena (which landscape analysis best predicts the inventions that came from it?), and they can be showcased.

---

Inputs

  • Atlas literature index (papers, abstracts, cited-by graph)
  • The world-model framework's 7 representations per entity (world_model_framework_spec.md)
  • Existing gap rows (a gap in domain X tells us X needs more mapping coverage)
  • Previous landscape analysis for the same domain (for longitudinal tracking)

Outputs

Per run, one or more landscape_analysis artifacts. Each admitted artifact feeds:

  • quest_gaps — each cell with saturation < 0.3 is emitted as a candidate gap (downstream quest decides if it's actionable)
  • quest_inventions and quest_experiments — novelty(cell) lookup
  • /showcase/economy dashboard — landscape heatmaps

---

Task shape

task_type = multi_iter:

  • artifact_class = "landscape_analysis"
  • required_roles = ["surveyor", "cartographer", "critic"]
  • debate_rounds = 3
  • max_iterations = 2 (landscapes are expensive to build; don't thrash)
  • target_cell = domain
  • acceptance_criteria:
- coverage_completeness ≥ 0.7
- cell_cohesion ≥ 0.6 (cells are semantically coherent per embedding clustering)
- freshness_date within 30 days
- cross-reference consistency (cells consistent with the world-model subgraph)

1. Generation

Round 1 — Survey. Surveyor agents pull a sized corpus (5k-20k papers depending on domain) from the Atlas literature index and produce an initial clustering. Clusters come with proposed labels (LLM-summarized) and per-cluster paper lists.
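The survey step above can be sketched as embedding-space clustering. A minimal NumPy-only k-means is shown here for self-containedness; the actual clustering algorithm, embedding model, and cluster count are not fixed by the spec and are assumptions:

```python
import numpy as np

def survey_clusters(emb: np.ndarray, paper_ids: list, k: int = 25, iters: int = 20) -> dict:
    """Cluster paper embeddings (n_papers x dim) into k proposed cells.

    Plain Lloyd's k-means; returns {cluster_index: [paper_id, ...]}.
    Proposed labels would then be LLM-summarized per cluster.
    """
    rng = np.random.default_rng(0)
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):
        # Assign each paper to its nearest center, then recompute centers.
        dists = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = emb[labels == j].mean(axis=0)
    clusters = {j: [] for j in range(k)}
    for pid, lab in zip(paper_ids, labels):
        clusters[int(lab)].append(pid)
    return clusters
```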

Round 2 — Cartography. Cartographer agent takes the clusters and produces:

  • A clean partition (no two cells with >20% paper overlap)
  • Per-cell metrics (paper_count, median publication date → recency_score, cited-by dispersion → controversy_score, paper_density_per_unit_time → saturation)
  • Boundary edges to neighboring landscapes (looked up via domain_adjacency in Atlas)
  • Initial gap_hint per under-saturated cell
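The partition check and per-cell metrics can be sketched as below. The spec names the signals (median publication date, cited-by dispersion, density per unit time) but not the exact normalizations, so the formulas, thresholds, and scaling constants here are illustrative assumptions:

```python
from datetime import date

def violates_overlap(cell_a: set, cell_b: set, max_frac: float = 0.20) -> bool:
    # Clean-partition check: shared papers as a fraction of the smaller cell.
    shared = len(cell_a & cell_b)
    return shared / min(len(cell_a), len(cell_b)) > max_frac

def cell_metrics(pub_dates: list, citation_counts: list,
                 window_years: float, today: date) -> dict:
    n = len(pub_dates)
    med = sorted(pub_dates)[n // 2]                    # median publication date
    age_years = (today - med).days / 365.25
    recency_score = max(0.0, 1.0 - age_years / 5.0)    # fades over ~5 quiet years
    mean_c = sum(citation_counts) / n
    var_c = sum((c - mean_c) ** 2 for c in citation_counts) / n
    cv = (var_c ** 0.5) / mean_c if mean_c else 0.0    # coefficient of variation
    controversy_score = min(1.0, cv / 2.0)             # cited-by dispersion
    saturation = min(1.0, (n / window_years) / 200.0)  # vs. a 200 papers/yr cap
    return {"paper_count": n,
            "recency_score": round(recency_score, 3),
            "controversy_score": round(controversy_score, 3),
            "saturation": round(saturation, 3)}
```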

Round 3 — Critique. Critic agent validates:

  • Are any important keywords/entities missing? (Cross-ref against the world-model graph for this domain — any high-connectivity entity with no cell assignment is a miss.)
  • Is saturation well-calibrated? (Compare to a held-out subsample of papers.)
  • Are the labels understandable to a non-expert? (LLM readability check.)

Flags get addressed by re-running a partial Cartographer step on just the flagged cells.
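The first Critic check (high-connectivity entities with no cell assignment) can be sketched directly as the coverage metric. The degree threshold for "high-connectivity" is an illustrative assumption:

```python
def coverage_completeness(entity_degree: dict, entity_to_cell: dict,
                          degree_threshold: int = 10) -> float:
    """Fraction of high-connectivity world-model entities assigned to some cell.

    entity_degree: entity id -> connectivity in the world-model subgraph.
    entity_to_cell: entity id -> cell_id (only for entities that landed in a cell).
    """
    hubs = [e for e, deg in entity_degree.items() if deg >= degree_threshold]
    if not hubs:
        return 1.0  # nothing to cover
    return sum(e in entity_to_cell for e in hubs) / len(hubs)
```

Any hub entity missing from `entity_to_cell` is a flagged miss that routes back to a partial Cartographer re-run.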

2. Admission

  • coverage_completeness ≥ 0.7: ≥70% of the world-model subgraph's high-connectivity entities land in some cell.
  • cell_cohesion ≥ 0.6: measured via within-cluster vs between-cluster embedding distance (standard silhouette or Davies-Bouldin threshold).
  • freshness_date within 30 days of admission.
  • Cross-reference consistency: Sanity-check against the world model; no cell contradicts a high-confidence world-model edge.
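The four admission criteria above can be combined into one gate. A minimal sketch, assuming `cell_cohesion` and a boolean world-model consistency flag were computed upstream (both field names are illustrative):

```python
from datetime import date

def admission_check(artifact: dict, today: date):
    """Return (admitted, failure_reasons); a failed run is archived, not admitted."""
    failures = []
    if artifact["coverage_completeness"] < 0.7:
        failures.append("coverage_completeness < 0.7")
    if artifact["cell_cohesion"] < 0.6:
        failures.append("cell_cohesion < 0.6")
    if (today - artifact["freshness_date"]).days > 30:
        failures.append("freshness_date older than 30 days")
    if not artifact["world_model_consistent"]:
        failures.append("contradicts a high-confidence world-model edge")
    return (len(failures) == 0, failures)
```

Returning the reasons (not just a boolean) lets the archive record why a subpar run missed, preserving the longitudinal trail.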

Below-threshold landscapes don't get admitted but DO get archived so longitudinal tracking has continuity even when a run was subpar.

3. Refresh cadence

  • saturation > 0.5 cells: refresh every 8 weeks (slow-moving fields)
  • saturation 0.2-0.5 cells: refresh every 4 weeks (active fields)
  • saturation < 0.2 cells: refresh every 2 weeks (frontier fields — highest novelty value)

The quest scheduler prioritizes refreshes by how much the cell's saturation or controversy_score has drifted since the last snapshot. Stable landscapes don't need re-mapping; volatile ones do.
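The cadence table and the drift-based prioritization can be sketched together; the drift measure (max absolute change across the two tracked scores) is an assumption, since the spec says only that drift drives priority:

```python
def refresh_interval_weeks(saturation: float) -> int:
    # Cadence table from the spec: frontier cells refresh fastest.
    if saturation > 0.5:
        return 8   # slow-moving field
    if saturation >= 0.2:
        return 4   # active field
    return 2       # frontier field

def refresh_priority(prev: dict, curr: dict) -> float:
    # Drift since the last snapshot; larger drift -> earlier refresh.
    return max(abs(curr["saturation"] - prev["saturation"]),
               abs(curr["controversy_score"] - prev["controversy_score"]))
```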

4. Interactions

  • quest_gaps — reads open_gaps from each landscape; the gap factory's scanner component (f456463e9e67_atlas_build_automated_gap_scanner_analy_spec.md) ingests landscape cells as input context.
  • quest_inventions — novelty(cell) lookup drives seeding priority.
  • quest_experiments — the no_redundant_prior_art admission check consults this landscape's top_papers_by_cell.
  • Atlas world model — bidirectional: world model entities get mapped into cells; landscape cells become a view/aggregation over the world-model graph.

5. Showcase

Showcase landscape artifacts demonstrate the full mapping pipeline for a domain of current strategic interest. UI treatment: interactive 2D cell map (UMAP or similar); click a cell to see its papers + saturation + any inventions/experiments rooted in it.

6. Capacity

  • Default: 2 concurrent landscape tasks (expensive).
  • One landscape build is ~6-10 agent-hours for the first three rounds, plus ~2-3 hours if iteration kicks in.
  • The quest maintains a schedule — fields get queued by refresh-due-date.

7. Open questions

  • How do we pick which domains to map first? (Proposed: seed with ~12 high-strategic-value SciDEX domains; user-configurable; add domains as they become relevant to admitted inventions.)
  • Should the AI-tools-landscape spec fold into this one? (Proposed: it becomes a specialized sub-case with custom cell labels; shares the refresh and admission machinery.)
  • How do we handle cross-domain landscapes ("all of ML-for-biology")? (Proposed: compose multiple landscapes via the boundaries edges; the UI renders a federated view.)