[Atlas] Wiki mermaid LLM regeneration — fix the 1,070-diagram backlog


Task

  • Type: recurring every-6h until backlog cleared, then weekly
  • Layer: Atlas
  • Depends on: scripts/validate_mermaid.py (deployed)

Goal

Fix the ~1,070 wiki_pages.content_md mermaid blocks whose node IDs are mangled (e.g. ATREM2 L["igands"] should render as TREM2["Ligands"]). These patterns cannot be safely fixed with regex — the semantic content has been corrupted in ways that require LLM regeneration from the surrounding page text.

Why now

After the regex sweep fixed hypotheses + wiki_entities (0 failures across 3,683 blocks), wiki_pages remains the last large source of visible mermaid errors on the site. Every page with a bad diagram either renders "syntax error in text" or hides the diagram block, visibly degrading the wiki.

What it does

  • Runs python3 scripts/validate_mermaid.py --table wiki_pages --json to enumerate failing pages
  • Sorts failures by page importance (citation count, incoming wiki links, recent edit traffic)
  • Spawns 5 parallel agents, each taking a disjoint slice of the top-50 failures per run
  • For each failing page, the sub-agent:
      1. Reads the page's prose around the broken diagram
      2. Generates a new mermaid diagram that reflects the same biological/conceptual flow using correct syntax
      3. Validates the new diagram parses via validate_mermaid.py before committing
      4. Updates wiki_pages.content_md in a worktree branch
  • Worktree branches merged back via orchestra sync push — one commit per 5–10 repaired pages
  • Tracks progress in docs/bio_competitive/wiki_mermaid_repair_progress.md (rolling counter + date)
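
The ranking and slicing steps above can be sketched as follows. This is a minimal illustration, not the task's actual code: the JSON shape emitted by validate_mermaid.py is not documented in this spec, so the field names (`slug`, `citation_count`, `incoming_links`, `recent_edits`) and the lexicographic ordering of the three importance signals are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def importance(page: Dict) -> Tuple[int, int, int]:
    """Rank key for a failing page.

    Field names are hypothetical; the spec only names the three signals,
    not how they are stored or weighted. Here they are compared in order.
    """
    return (
        page.get("citation_count", 0),
        page.get("incoming_links", 0),
        page.get("recent_edits", 0),
    )


def plan_run(failures: List[Dict], top_n: int = 50, n_agents: int = 5) -> List[List[Dict]]:
    """Pick the top_n most important failures and split them into
    n_agents disjoint slices (round-robin, so each agent gets a mix
    of higher- and lower-ranked pages)."""
    ranked = sorted(failures, key=importance, reverse=True)[:top_n]
    return [ranked[i::n_agents] for i in range(n_agents)]


if __name__ == "__main__":
    demo = [{"slug": f"page-{i}", "citation_count": i} for i in range(120)]
    for agent_id, work in enumerate(plan_run(demo)):
        print(agent_id, [p["slug"] for p in work][:3], "...")
```

Round-robin slicing is one choice among several; contiguous chunks would also be disjoint, but would give the first agent all of the highest-ranked pages.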

Success criteria

  • Phase 1 (first run): regenerate 50 failing pages, 0 parse failures among them
  • Phase 2 (2-week target): wiki_pages parse failures <100 across full DB
  • Phase 3 (steady-state): <10 active failures; new pages auto-validated via pre-push hook
  • Each regenerated diagram must cite the page's existing content — no fabricated relationships
  • No diagram regressions in previously-clean pages (re-run validator on touched pages)

Quality requirements

  • Reference quest_quality_standards_spec.md
  • No stub diagrams: min 4 nodes, min 1 labeled edge, nodes must name real entities from page prose
  • Parallel-agent execution mandatory (≥10 items per run); sequential fallback only with --single debug flag
  • Log processed / retry / regression counts per run so the governance dashboard can detect busywork

Risks + mitigations

  • Fabrication: agent invents relationships not in the page text → require ≥2 citations; reject otherwise
  • Regression: breaking previously-good diagrams → re-validate full wiki_pages table after every 10-page batch
  • Cost: 1,070 pages × parallel LLM calls → cap at 50 pages/6h run (~420K tokens/run)
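
The cost cap above implies a per-page budget and a minimum time to clear the backlog. The arithmetic is worked below using only the figures stated in this spec; it assumes every page is repaired on the first attempt, with no retries.

```python
import math

BACKLOG_PAGES = 1070       # from the spec title
PAGES_PER_RUN = 50         # cap per run
TOKENS_PER_RUN = 420_000   # approximate figure from the spec
RUN_INTERVAL_HOURS = 6

tokens_per_page = TOKENS_PER_RUN // PAGES_PER_RUN            # 8400
runs_needed = math.ceil(BACKLOG_PAGES / PAGES_PER_RUN)       # 22
days_to_clear = runs_needed * RUN_INTERVAL_HOURS / 24        # 5.5
total_tokens = BACKLOG_PAGES * tokens_per_page               # 8988000

print(f"{tokens_per_page} tokens/page, {runs_needed} runs, "
      f"{days_to_clear} days, ~{total_tokens / 1e6:.1f}M tokens total")
```

At this cap, the full backlog fits comfortably inside the 2-week Phase 2 window even if a share of pages need a second attempt.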

Related

  • scripts/validate_mermaid.py (validator)
  • tools/git-hooks/pre-push (prevents new regressions)
  • b399cd3e_wiki_quality_pipeline_spec.md (parent quality pipeline)
  • quest_quality_standards_spec.md (meta-quest)

Work Log

2026-04-28 — Run by task:33a9825b

Finding: All 67 reported failures were false positives. The extract_blocks() function's raw-block extractor matched flowchart/graph text inside ```mermaid fenced blocks, producing partial/truncated copies that failed parse validation. Every one of the 150 fenced mermaid blocks in the 57 "failing" pages was actually valid when tested directly.

Fix: Added fenced-region exclusion to the raw block extractor in scripts/validate_mermaid.py. The fix precomputes fenced block byte ranges and skips any raw match whose start position falls within a fenced region.
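
The idea behind the fix can be sketched as below. This is not the code in scripts/validate_mermaid.py: the two regexes and the function names are hypothetical stand-ins, and offsets are Python string indices rather than byte positions.

```python
import re
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical patterns; the real extractor's patterns are not shown in this spec.
FENCE_RE = re.compile(r"```mermaid\s*\n(.*?)```", re.DOTALL)
RAW_RE = re.compile(r"^(?:flowchart|graph)\s+(?:TD|TB|BT|LR|RL)\b", re.MULTILINE)


def fenced_ranges(text: str) -> List[Tuple[int, int]]:
    """Precompute (start, end) offsets of every fenced mermaid block.
    finditer yields non-overlapping matches in order, so the list is sorted."""
    return [m.span() for m in FENCE_RE.finditer(text)]


def raw_block_starts(text: str) -> List[int]:
    """Offsets of raw (unfenced) diagram headers, skipping any header
    whose start position falls inside a fenced region."""
    ranges = fenced_ranges(text)
    starts = [start for start, _ in ranges]
    result = []
    for m in RAW_RE.finditer(text):
        pos = m.start()
        # Find the last fenced region that begins at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ranges[i][1]:
            continue  # inside a fence: already covered by the fenced extractor
        result.append(pos)
    return result
```

Precomputing the ranges once and using a binary search keeps the check cheap even on pages with many diagrams; a linear scan over the ranges would also work at this scale.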

Result: wiki_pages failures 67→0; total extracted blocks 37,211→18,645 (duplicates eliminated). Committed c5e989dfb.

Status: Phase 3 (steady-state) reached — 0 active failures. The recurring task should remain to catch any regressions introduced by future wiki page writes.

File: wiki_mermaid_llm_repair_spec.md
Modified: 2026-04-28 02:48
Size: 3.9 KB