[Atlas] Cross-link hypotheses to wiki pages via target genes


ID: 353da173-093 Priority: 88 Type: one_shot Status: open

Goal

Create artifact_links between hypotheses and wiki pages for their target genes. This enables the 'Wiki Pages' section on hypothesis detail pages. Currently hypothesis pages show 'No linked wiki pages' even when relevant wiki content exists.

Acceptance Criteria

☐ Concrete deliverables created
☐ Work log updated with timestamped entry

Work Log

2026-04-20 17:47 UTC — PostgreSQL compatibility fix + migration run

Problem: Migration 098 on origin/main used SQLite-specific code (PRAGMA journal_mode=WAL, sqlite3.Row row factory, ? placeholders) that fails on PostgreSQL.

Fix: Updated migrations/098_crosslink_hypotheses_target_genes.py:

  • Removed PRAGMA journal_mode=WAL (PostgreSQL always uses write-ahead logging, so there is nothing to enable)
  • Removed sqlite3.Row row factory (PGShimConnection already sets _pg_row_factory)
  • Replaced ? placeholders with %s
  • Replaced INSERT OR IGNORE with INSERT ... ON CONFLICT DO NOTHING
  • Removed the cursor = conn.cursor() pattern; statements now run on conn directly
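
The statement-level changes in the list above can be sketched as a small translation helper. This is only an illustration and not the code in migration 098: the function name `sqlite_to_postgres` is hypothetical, the column list in the usage note is assumed from the verification query, and the placeholder swap naively assumes the SQL contains no literal `?` characters.

```python
import re

# Hypothetical helper, for illustration only: applies the SQLite -> PostgreSQL
# rewrites listed above to a single SQL statement.
_INSERT_OR_IGNORE = re.compile(r"^\s*INSERT\s+OR\s+IGNORE\b", re.IGNORECASE)


def sqlite_to_postgres(sql: str) -> str:
    statement = sql.strip().rstrip(";")

    # PRAGMA statements are SQLite-only; drop them entirely.
    if statement.upper().startswith("PRAGMA"):
        return ""

    # INSERT OR IGNORE becomes INSERT ... ON CONFLICT DO NOTHING.
    was_insert_or_ignore = _INSERT_OR_IGNORE.match(statement) is not None
    statement = _INSERT_OR_IGNORE.sub("INSERT", statement)

    # qmark placeholders (?) become format placeholders (%s).
    # Naive: assumes no literal '?' appears inside string constants.
    statement = statement.replace("?", "%s")

    if was_insert_or_ignore:
        statement += " ON CONFLICT DO NOTHING"
    return statement
```

For example, `sqlite_to_postgres("INSERT OR IGNORE INTO artifact_links (source_artifact_id, target_artifact_id) VALUES (?, ?)")` yields the same statement with `%s` placeholders and a trailing `ON CONFLICT DO NOTHING`, while a `PRAGMA journal_mode=WAL` statement is dropped.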

Results: Created 1,328 new cross-links (805 wiki→hypothesis + 523 wiki→analysis):
  • Before: 199/581 hypotheses with wiki links
  • After: 524/581 hypotheses with wiki links
  • Remaining 57: hypotheses whose parsed genes have no wiki pages (e.g., lncRNA-9969, non-gene descriptors)

Verification: the following query returns 524:
  SELECT COUNT(DISTINCT h.id) FROM hypotheses h JOIN artifact_links al ON al.target_artifact_id = 'hypothesis-' || h.id WHERE al.source_artifact_id LIKE 'wiki-' || '%'

2026-04-16 — Enhanced cross-linking migration

Problem: Migration 033 (crosslink_wiki_hypotheses) required an exact match on LOWER(target_gene) and handled only single-gene targets. As a result, 137 of the 521 hypotheses with target genes were missing wiki links. The target formats it could not match were:

  • Multi-gene targets (comma-separated: IL10, CSF1R, CD40)
  • Slash-separated complexes (PIKFYVE/MCOLN1/PPP3CB/TFEB)
  • Plus-separated (ABCA7 + TREM2)
  • Dash-separated gene pairs (CSF1R-TREM2)
  • Parenthetical aliases (MFSD2A (SLC59A1))
  • Descriptive qualifiers (CLU-APOE-TREM2 axis (LXR/RXR pathway))

Solution: Created migrations/098_crosslink_hypotheses_target_genes.py with an enhanced gene parser that:
  • Splits on comma, slash, plus, and dash separators
  • Strips parenthetical descriptions and trailing qualifiers
  • Matches wiki pages whose slugs start with genes-, proteins-, or entities-
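
A minimal sketch of a parser following the three rules above, checked against the example targets from this entry. It is not the code in migration 098. The names `parse_target_genes` and `candidate_slugs` are hypothetical, the uppercase-symbol pattern is an assumed way to drop qualifiers such as "axis", and the lowercase `prefix-gene` slug shape is an assumption based on migration 033 matching on LOWER(target_gene).

```python
import re

_PARENTHETICAL = re.compile(r"\([^)]*\)")   # "(SLC59A1)", "(LXR/RXR pathway)"
_SEPARATORS = re.compile(r"[,/+\-]")        # comma, slash, plus, dash
_SYMBOL = re.compile(r"^[A-Z][A-Z0-9]+$")   # assumed shape of a gene symbol


def parse_target_genes(target):
    """Split a free-text target_gene value into individual gene symbols."""
    # Strip parenthetical descriptions before splitting, so separators
    # inside them (e.g. "LXR/RXR") do not produce spurious genes.
    cleaned = _PARENTHETICAL.sub(" ", target)
    genes = []
    for piece in _SEPARATORS.split(cleaned):
        words = piece.split()
        if not words:
            continue
        # Keep only the first word, dropping trailing qualifiers like "axis".
        symbol = words[0]
        if _SYMBOL.match(symbol) and symbol not in genes:
            genes.append(symbol)
    return genes


def candidate_slugs(gene):
    """Wiki slugs to try for one gene (slug shape is an assumption)."""
    return [f"{prefix}-{gene.lower()}" for prefix in ("genes", "proteins", "entities")]
```

With these rules, "CLU-APOE-TREM2 axis (LXR/RXR pathway)" parses to CLU, APOE, and TREM2, and "MFSD2A (SLC59A1)" parses to MFSD2A alone. One trade-off of splitting on dashes is that hyphenated symbols would be broken apart, and a descriptor such as lncRNA-9969 yields no symbols at all under this sketch.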

Results: Created 292 new artifact_links (282 wiki→hypothesis + 10 wiki→analysis):
  • Before: 384/521 hypotheses with wiki links (74%)
  • After: 504/521 hypotheses with wiki links (97%)
  • Remaining 17: genes without wiki pages (HCAR2, GZMB, PADI4, etc.) or non-gene targets ("MULTIPLE", "COMPOSITE_BIOMARKER")

Tasks using this spec (1)
[Atlas] Cross-link hypotheses to wiki pages via target genes
Atlas done P88
File: 353da173_093_spec.md
Modified: 2026-04-25 23:40
Size: 2.8 KB