> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
> atlas — every principle is load-bearing. In particular:
> - LLMs for semantic judgment; rules for syntactic validation.
> - Gap-predicate driven, not calendar-driven.
> - Idempotent + version-stamped + observable.
> - No hardcoded entity lists, keyword lists, or canonical-name tables.
> - Three surfaces: FastAPI + orchestra + MCP.
> - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
> A5 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
> Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
> docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
> BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.
Task ID: 466d13f2-ec31-44b6-8230-20eb239b6b3a Layer: Atlas Status: Done
536+ wiki entities had thin infobox_data (only auto-generated fields: name, type,
knowledge_graph_edges, top_relations, associated_diseases). Enrich the 20 most-connected
of these with structured data from the knowledge graph, canonical_entities, wiki_pages,
and the targets table.
knowledge_edges (700K+ edges) to compute per-entity connection countswiki_entities to find entities with thin infoboxes (<=5 filled fields)canonical_entities — aliases, external IDs (UniProt, NCBI, Ensembl), descriptionwiki_pages — definition paragraph extractionknowledge_edges — associated genes, diseases, drugs, pathways, cell types, interactions, brain regionstargets — protein name, mechanism, target class (for gene/protein entities)
db_writes.save_wiki_entity for journaled updatesenrichment/enrich_top20_connected_infoboxes.py — enrichment scriptknowledge_edges table has 700K+ edges for computing connection counts.
{
"_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
"_reset_at": "2026-04-18T06:29:22.046013+00:00",
"_reset_from_status": "done"
}