Quest: Wiki Inline Citations & Rich References

Layer: Atlas
Priority: P85
Related quests: external_refs_quest_spec.md. This quest covers PubMed/DOI citations via refs_json and [@key] markers; external references to Reactome pathways, UniProt entries, WikiPathways, Wikipedia, ClinicalTrials.gov, PDB, AlphaFold and arXiv/bioRxiv now flow through the new external_refs table. The two systems live side by side: paper citations keep using [@key], and non-paper references use [ext:...].

Status: active

Problem Statement

SciDEX wiki pages have strong structural foundations — infoboxes, mermaid diagrams, knowledge graph cross-links, papers in the sidebar — but the core text body has no inline citations. A reader looking at FOXP1's speech disorder association, or FOXP2's language-gene role, cannot tell which paper supports which claim. This makes pages feel like summaries rather than scientific resources.

The citation infrastructure already exists ([@key] marker syntax, refs_json field, hover tooltip JS) but is almost entirely unused: most pages have refs_json = NULL or only sparse data, and no [@key] markers in their content. The tooltip itself shows only "authors (year)", which is too thin for scientific use.

From user feedback: "Although there are some helpful citations at the bottom of the page, there are no in-line references on the page to connect various statements about the gene to the references supporting those findings... Having the in-line reference linkouts would make this resource immediately useful for getting a good idea from the summary and then being able to dive directly into the literature supporting it."

Vision

Every factual claim on every SciDEX wiki page is:

  • Marked with an inline citation [@key] that renders as a hoverable [N] superscript
  • Hovering shows a rich card: title, authors, year, journal, key claim being supported, and optionally a figure reference or quoted excerpt
  • Clicking jumps to the numbered reference at the bottom which links to PubMed/DOI
  • refs_json is the authoritative source — it stores all citation metadata including claim context
  • Background agents continuously scan pages, match papers to claims, and add citations

    Phase 1: Enrich the Citation System (Infrastructure)

    1a. Richer refs_json Schema

    Current schema per citation key:

    {
      "bacon2020": {
        "authors": "Bacon C, et al",
        "title": "...",
        "journal": "...",
        "year": 2020,
        "pmid": "32028028"
      }
    }

    Target schema — add claim context that makes hover popovers scientifically useful:

    {
      "bacon2020": {
        "authors": "Bacon C, et al",
        "title": "The autism and schizophrenia associated gene FOXP1 is required for perinatal breathing and survival",
        "journal": "Respiratory Physiology & Neurobiology",
        "year": 2020,
        "pmid": "32028028",
        "doi": "10.1016/j.resp.2020.103400",
        "claim": "FOXP1 is required for perinatal respiratory control; loss causes breathing failure",
        "excerpt": "Foxp1 conditional knockout mice died within hours of birth due to respiratory failure, establishing a critical role in brainstem breathing circuits",
        "figure_ref": "Fig. 3",
        "strength": 0.92
      }
    }

    New fields:

    • claim: one sentence stating what this paper specifically supports on this page
    • excerpt: key quote or result from the paper (≤150 chars)
    • figure_ref: the most relevant figure or table, e.g. "Fig. 3" or "Table 2"
    • strength: confidence (0 to 1) that this citation supports the claim
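
    A minimal Python sketch of a validator for the target schema follows, assuming refs_json is stored as a JSON object keyed by citation key as in the examples above. Which fields count as required, and the rule that every entry needs a pmid or doi for its outbound link, are assumptions of this sketch rather than part of the spec.

```python
import json
from typing import Any

# Assumed rules for this sketch; the spec itself only defines the fields.
REQUIRED_FIELDS = ("authors", "title", "year")
MAX_EXCERPT_CHARS = 150


def validate_ref_entry(key: str, entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems with one refs_json entry (empty if OK)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not entry.get(name):
            problems.append(f"{key}: missing {name}")
    if not (entry.get("pmid") or entry.get("doi")):
        problems.append(f"{key}: needs a pmid or doi for the outbound link")
    excerpt = entry.get("excerpt")
    if excerpt is not None and len(excerpt) > MAX_EXCERPT_CHARS:
        problems.append(f"{key}: excerpt is {len(excerpt)} chars (max {MAX_EXCERPT_CHARS})")
    strength = entry.get("strength")
    if strength is not None and not 0.0 <= float(strength) <= 1.0:
        problems.append(f"{key}: strength {strength} outside 0-1")
    return problems


def validate_refs_json(raw: str) -> list[str]:
    """Validate a whole refs_json column value."""
    problems = []
    for key, entry in json.loads(raw).items():
        problems.extend(validate_ref_entry(key, entry))
    return problems
```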

    1b. Rich Hover Popover (api.py)

    The current tooltip is a plain-text title attribute showing "authors (year)" (~40 chars). Replace it with a positioned <div> popover showing:

    ┌─────────────────────────────────────────────────────┐
    │ [1] Bacon et al. (2020)                             │
    │                                                     │
    │ FOXP1 is required for perinatal breathing           │
    │ and survival                                        │
    │                                                     │
    │ Claim: FOXP1 loss causes brainstem respiratory      │
    │ circuit failure → perinatal death in mice           │
    │                                                     │
    │ "Foxp1 cKO mice died within hours of birth          │
    │  due to respiratory failure" — Fig. 3               │
    │                                                     │
    │ Resp. Physiol. Neurobiol. 2020                 [→]  │
    └─────────────────────────────────────────────────────┘

    Implementation: inline JS in the wiki page renderer (api.py), replacing the current 10-line mouseover handler with a rich popover div that reads claim, excerpt and figure_ref from the refs object.

    The popover should:

    • Appear on hover of <a class="ref-link">
    • Show above or below depending on viewport position
    • Fade in (100ms transition)
    • Include a [→] link to PubMed/DOI in the bottom-right
    • Disappear on mouseout after a 200ms delay, so the user can move the pointer onto the popover
    • Be keyboard-accessible (focusing the link opens the popover)
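
    The sketch below illustrates the data flow the popover depends on, written in Python for consistency with api.py. It is hypothetical: the real renderer does this in inline JS, and the `ref-N` anchor ids, the `data-ref-key` attribute and the `ref-popover` markup are illustrative names, not existing ones. `render_citations` turns [@key] markers into numbered [N] links (first appearance sets the number), and `popover_html` assembles the card from the mock-up out of one refs_json entry, escaping every field.

```python
import html
import re
from typing import Optional

# Allowing word chars plus ':', '.' and '-' in keys is an assumption.
CITE_RE = re.compile(r"\[@([\w:.-]+)\]")


def render_citations(content: str, refs: dict) -> tuple[str, list[str]]:
    """Replace [@key] markers with numbered superscript links.

    A key cited twice reuses its number; keys missing from refs are left
    as literal text. Returns rendered text and keys in reference order.
    """
    order: list[str] = []

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in refs:
            return match.group(0)
        if key not in order:
            order.append(key)
        n = order.index(key) + 1
        return (f'<sup><a class="ref-link" href="#ref-{n}" '
                f'data-ref-key="{html.escape(key)}">[{n}]</a></sup>')

    return CITE_RE.sub(replace, content), order


def outbound_url(entry: dict) -> Optional[str]:
    """Prefer PubMed, fall back to the DOI resolver."""
    if entry.get("pmid"):
        return f"https://pubmed.ncbi.nlm.nih.gov/{entry['pmid']}/"
    if entry.get("doi"):
        return f"https://doi.org/{entry['doi']}"
    return None


def _esc(value) -> str:
    return html.escape(str(value))


def popover_html(entry: dict, n: int) -> str:
    """Build the hover card for citation [n] from one refs_json entry."""
    authors = entry.get("authors", "")
    surname = authors.split(",")[0].split(" ")[0]
    suffix = " et al." if "et al" in authors else ""
    parts = [f"<strong>[{n}] {_esc(surname)}{suffix} ({_esc(entry.get('year', ''))})</strong>"]
    if entry.get("title"):
        parts.append(f"<p>{_esc(entry['title'])}</p>")
    if entry.get("claim"):
        parts.append(f"<p>Claim: {_esc(entry['claim'])}</p>")
    if entry.get("excerpt"):
        fig = f" ({_esc(entry['figure_ref'])})" if entry.get("figure_ref") else ""
        parts.append(f"<blockquote>{_esc(entry['excerpt'])}{fig}</blockquote>")
    footer = _esc(f"{entry.get('journal', '')} {entry.get('year', '')}".strip())
    url = outbound_url(entry)
    if url:
        footer += f' <a href="{_esc(url)}" target="_blank" rel="noopener">[&rarr;]</a>'
    parts.append(f"<footer>{footer}</footer>")
    return '<div class="ref-popover" role="tooltip">' + "".join(parts) + "</div>"
```

    Keeping unknown keys as literal [@key] text, rather than dropping them, makes broken citations visible during review.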

    1c. wiki_citations Table (Optional Enhancement)

    For large-scale citation tracking, consider a normalized table alongside refs_json:

    CREATE TABLE wiki_citations (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        wiki_slug TEXT NOT NULL,
        ref_key TEXT NOT NULL,           -- matches refs_json key
        pmid TEXT,
        doi TEXT,
        claim TEXT,                      -- what this citation supports on this page
        excerpt TEXT,                    -- key quote from paper
        figure_ref TEXT,                 -- e.g. "Fig. 3"
        strength REAL DEFAULT 0.8,
        position_hint TEXT,              -- section where this citation appears
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        UNIQUE(wiki_slug, ref_key),
        FOREIGN KEY(wiki_slug) REFERENCES wiki_pages(slug)
    );
    CREATE INDEX idx_wiki_citations_slug ON wiki_citations(wiki_slug);
    CREATE INDEX idx_wiki_citations_pmid ON wiki_citations(pmid);

    This enables citation coverage queries, finding under-cited pages, and paper→wiki backlinks.
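
    As a sketch of those queries, assuming SQLite (which the AUTOINCREMENT syntax suggests) and a wiki_pages table keyed by slug, the Python below uses an abbreviated copy of the DDL; the function names are hypothetical.

```python
import sqlite3

# Abbreviated wiki_citations DDL plus a minimal wiki_pages table
# (assumed here to need only a slug column).
SCHEMA = """
CREATE TABLE wiki_pages (slug TEXT PRIMARY KEY);
CREATE TABLE wiki_citations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    wiki_slug TEXT NOT NULL,
    ref_key TEXT NOT NULL,
    pmid TEXT,
    UNIQUE(wiki_slug, ref_key),
    FOREIGN KEY(wiki_slug) REFERENCES wiki_pages(slug)
);
CREATE INDEX idx_wiki_citations_pmid ON wiki_citations(pmid);
"""


def under_cited_pages(conn: sqlite3.Connection, min_citations: int = 1) -> list[tuple[str, int]]:
    """Pages with fewer than `min_citations` citation rows, fewest first."""
    return conn.execute(
        """
        SELECT p.slug, COUNT(c.id) AS n_citations
        FROM wiki_pages AS p
        LEFT JOIN wiki_citations AS c ON c.wiki_slug = p.slug
        GROUP BY p.slug
        HAVING COUNT(c.id) < ?
        ORDER BY n_citations, p.slug
        """,
        (min_citations,),
    ).fetchall()


def pages_citing(conn: sqlite3.Connection, pmid: str) -> list[str]:
    """paper→wiki backlinks: every page that cites the given PMID."""
    rows = conn.execute(
        "SELECT DISTINCT wiki_slug FROM wiki_citations WHERE pmid = ? ORDER BY wiki_slug",
        (pmid,),
    )
    return [slug for (slug,) in rows]


def coverage_pct(conn: sqlite3.Connection) -> float:
    """Percentage of pages with at least one citation row."""
    total, cited = conn.execute(
        """
        SELECT COUNT(*),
               SUM(EXISTS (SELECT 1 FROM wiki_citations c WHERE c.wiki_slug = p.slug))
        FROM wiki_pages AS p
        """
    ).fetchone()
    return 100.0 * (cited or 0) / total if total else 0.0
```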

    Phase 2: FOXP1 and FOXP2 Pilot Pages

    These two genes were chosen for the pilot because:

    • FOXP1 is under-appreciated for its speech/language role compared with its better-known paralog FOXP2
    • Both have refs_json already populated but zero inline markers
    • They are paralogs, so demonstrating citation patterns on both shows how related pages interconnect
    • FOXP2 is the canonical "language gene" with a rich literature, making it ideal for showcasing the system

    FOXP1 Priority Content

    The page must prominently feature:

    • Speech/language disorder role: FOXP1 is causative for childhood apraxia of speech and language delay, comparable to FOXP2 but historically overlooked
    • FOXP1 syndrome: the neurodevelopmental syndrome (intellectual disability + speech delay + autistic features) needs a dedicated section
    • Paralog relationship with FOXP2: they heterodimerize; cover co-regulation of speech circuits
    • Every major claim inline-cited

    Suggested refs to add to refs_json (fetch from PubMed):
    • Sollis et al. 2023 (PMID 36349512) — FOXP1 mutations in NDD
    • Bacon & Bhatt 2022 — FOXP1 syndrome review (search "FOXP1 syndrome review 2022")
    • Lawton-Rauh et al. on FOXP1/FOXP2 heterodimerization
    • O'Roak et al. 2011 (PMID 21572417) — FOXP1 autism de novo mutations

    FOXP2 Priority Content

    • Lai et al. 2001 (PMID 11564484) — the original speech disorder mutation paper
    • Fisher & Scharff 2009 — FOXP2 evolution and language
    • Vernes et al. — FOXP2 targets CNTNAP2 (direct connection to autism)
    • All disease association claims cited

    Phase 3: Governance Tasks — Citation Coverage

    Recurring Task: wiki-citation-enrichment (every 6h)

    For each pass:

    • Query wiki pages where refs_json IS NOT NULL but content_md NOT LIKE '%[@%'
    • For each such page (up to 10 per pass):
        a. Read the content and refs_json
        b. For each statement in the content that corresponds to a paper in refs_json, add a [@key] marker
        c. Enrich the refs_json entries with claim and excerpt fields (using the PubMed abstract)
        d. Update the wiki page in the DB
    • Track progress: log pages processed and citations added per pass
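
    One way to wire up a pass, sketched in Python against SQLite: the candidate query mirrors the first step, while `annotate` is a hypothetical callback standing in for the agent's claim matching and refs enrichment (steps b and c), which this sketch does not implement. Citations added are counted as the change in [@...] markers.

```python
import json
import logging
import re
import sqlite3
from typing import Callable

log = logging.getLogger("wiki-citation-enrichment")

# Pages that have refs_json but no [@key] marker yet.
CANDIDATE_SQL = """
SELECT slug, content_md, refs_json
FROM wiki_pages
WHERE refs_json IS NOT NULL
  AND content_md NOT LIKE '%[@%'
LIMIT ?
"""
MARKER_RE = re.compile(r"\[@[^\]]+\]")

# Hypothetical agent hook: (content, refs) -> (new_content, new_refs).
Annotator = Callable[[str, dict], tuple[str, dict]]


def run_enrichment_pass(conn: sqlite3.Connection, annotate: Annotator, limit: int = 10) -> dict:
    rows = conn.execute(CANDIDATE_SQL, (limit,)).fetchall()  # materialize before updating
    stats = {"pages": 0, "citations_added": 0}
    for slug, content, refs_raw in rows:
        new_content, new_refs = annotate(content, json.loads(refs_raw))
        added = len(MARKER_RE.findall(new_content)) - len(MARKER_RE.findall(content))
        conn.execute(
            "UPDATE wiki_pages SET content_md = ?, refs_json = ? WHERE slug = ?",
            (new_content, json.dumps(new_refs), slug),
        )
        stats["pages"] += 1
        stats["citations_added"] += added
    conn.commit()
    log.info("processed %d pages, added %d citations", stats["pages"], stats["citations_added"])
    return stats
```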

    Recurring Task: paper-to-wiki-backlink (every 12h)

    For each paper in artifact_links that links to a wiki page:

  • Check if the wiki page's refs_json includes the paper
  • If not, add it to refs_json with minimal metadata (pmid, title, year)
  • Optionally: scan the wiki content for claims that match the paper's abstract, suggest [@key] placement
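
    A minimal sketch of the merge step and of the optional placement hint, assuming refs_json has been parsed into a dict. The `pmid<PMID>` key scheme for auto-added entries is hypothetical, and `suggest_placements` is a deliberately naive word-overlap heuristic (not a method from this spec) meant only to shortlist sentences for an agent or editor to review.

```python
import re

WORD_RE = re.compile(r"[a-z0-9]+")


def ensure_paper_in_refs(refs: dict, pmid: str, title: str, year: int) -> tuple[str, bool]:
    """Return (ref_key, added). Matches on PMID so hand-written keys
    such as 'bacon2020' are reused instead of duplicated."""
    for key, entry in refs.items():
        if str(entry.get("pmid", "")) == str(pmid):
            return key, False
    key = f"pmid{pmid}"  # hypothetical naming scheme for auto-added refs
    refs[key] = {"pmid": str(pmid), "title": title, "year": year}
    return key, True


def _words(text: str) -> set[str]:
    return set(WORD_RE.findall(text.lower()))


def suggest_placements(content: str, abstract: str, min_share: float = 0.5) -> list[str]:
    """Naive heuristic: sentences whose words mostly appear in the abstract."""
    abstract_words = _words(abstract)
    suggestions = []
    for sentence in re.split(r"(?<=[.!?])\s+", content.strip()):
        words = _words(sentence)
        if words and len(words & abstract_words) / len(words) >= min_share:
            suggestions.append(sentence)
    return suggestions
```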

    Recurring Task: wiki-quality-citation-report (daily)

    Produce a table:

    • Pages with 0 inline citations but refs_json populated
    • Pages with linked papers (artifact_links) but no refs_json
    • Citation coverage %: (pages with ≥1 inline citation) / (all pages)
    • Top 20 most-linked papers not yet cited inline

    Target: 80% of pages with linked papers should have at least one inline citation.
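
    The report metrics can be computed from per-page records, as in the Python sketch below. The `Page` shape (parsed refs_json plus the PMIDs linked via artifact_links) is an assumption; a paper counts as cited inline when some [@key] on any page resolves to an entry carrying its PMID.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

MARKER_RE = re.compile(r"\[@([^\]]+)\]")


@dataclass
class Page:
    slug: str
    content_md: str
    refs: Optional[dict]  # parsed refs_json, None if NULL
    linked_pmids: list[str] = field(default_factory=list)  # assumed from artifact_links


def inline_keys(page: Page) -> set[str]:
    return set(MARKER_RE.findall(page.content_md))


def citation_report(pages: list[Page], top_n: int = 20) -> dict:
    cited_pmids = set()
    for p in pages:
        for key in inline_keys(p):
            pmid = (p.refs or {}).get(key, {}).get("pmid")
            if pmid:
                cited_pmids.add(str(pmid))

    linked = [p for p in pages if p.linked_pmids]
    with_inline = [p for p in pages if inline_keys(p)]
    linked_with_inline = [p for p in linked if inline_keys(p)]
    link_counts = Counter(pmid for p in pages for pmid in p.linked_pmids)

    return {
        "refs_but_no_inline": [p.slug for p in pages if p.refs and not inline_keys(p)],
        "linked_but_no_refs": [p.slug for p in linked if not p.refs],
        "coverage_pct": 100 * len(with_inline) / len(pages) if pages else 0.0,
        "linked_coverage_pct": 100 * len(linked_with_inline) / len(linked) if linked else 0.0,
        "top_uncited": [pmid for pmid, _ in link_counts.most_common()
                        if pmid not in cited_pmids][:top_n],
    }
```

    Under these assumptions the 80% target is met when `linked_coverage_pct` is at least 80.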

    Background Task: evidence-to-wiki-backfeed

    When a hypothesis or analysis gains new evidence (a paper linked via hypothesis_papers):

  • Find wiki pages related to the hypothesis topic (via artifact_links + KG)
  • Check if the new paper is already cited on those wiki pages
  • If the evidence is strong (strength > 0.7), add to refs_json and flag for inline citation
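
    A sketch of the threshold-and-flag step, assuming the related pages have already been found (the artifact_links and KG lookup is out of scope here) and their refs_json parsed. The `pmid<PMID>` key and the `needs_inline` flag field are hypothetical names.

```python
STRENGTH_THRESHOLD = 0.7  # "strong" evidence cut-off from this spec


def backfeed(paper: dict, strength: float, related_refs: dict[str, dict]) -> list[str]:
    """Add a strongly supporting paper to related pages' refs_json.

    related_refs maps wiki slug -> parsed refs_json (mutated in place).
    Returns the slugs flagged for an inline [@key] citation.
    """
    if strength <= STRENGTH_THRESHOLD:
        return []
    flagged = []
    for slug, refs in related_refs.items():
        if any(str(e.get("pmid")) == str(paper["pmid"]) for e in refs.values()):
            continue  # already cited on this page
        refs[f"pmid{paper['pmid']}"] = {**paper, "strength": strength, "needs_inline": True}
        flagged.append(slug)
    return flagged
```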

    Phase 4: "neurowiki" → "scidex" Rename

    All specs and code referencing "NeuroWiki" or "neurowiki" should be updated to "SciDEX Wiki" or "scidex". Key locations:

    • quest_wiki_spec.md — update vision statement
    • docs/planning/specs/6391cf49_neurowiki_import_spec.md — historical, note as legacy
    • docs/planning/specs/22c4c017_8b8_neurowiki_entity_lin_spec.md — update
    • docs/planning/specs/48c44e67_graph_neurowiki_url_fix_spec.md — update
    • docs/planning/specs/a7e9909f-de7_atlas_link_kg_to_neurowiki_spec.md — update
    • api.py: search for "NeuroWiki" in HTML/JS comments and CSS class names
    • wiki_pages table: neurowiki_commit and source_repo columns (keep for provenance, just relabel in UI)
    • The rendered wiki page footer: "Imported from NeuroWiki" → "SciDEX knowledge base"
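
    To find the remaining mentions, a small scanner such as the hypothetical one below can be run over the repository. The file suffixes and the pattern are assumptions; expected hits such as the neurowiki_commit provenance column or the legacy import spec still need a human decision, and `skip` can exclude paths already reviewed.

```python
import re
from pathlib import Path

# Matches NeuroWiki, neurowiki, neuro_wiki, neuro-wiki (case-insensitive).
PATTERN = re.compile(r"neuro[_-]?wiki", re.IGNORECASE)


def find_neurowiki_refs(root, suffixes=(".py", ".md", ".html", ".js", ".css"), skip=()):
    """List (relative path, line number, stripped line) for each mention.

    `skip` holds path substrings to ignore, e.g. a legacy spec file.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = str(path.relative_to(root))
        if not path.is_file() or path.suffix not in suffixes or any(s in rel for s in skip):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits
```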

    Implementation Order

    Week 1 (Sprint 1):
      [x] Phase 1a: Enrich refs_json schema design (done: this spec)
      [ ] Phase 1b: Rich hover popover in api.py — 2h implementation
      [ ] Phase 2: FOXP1 page rewrite with inline citations
      [ ] Phase 2: FOXP2 page rewrite with inline citations
    
    Week 2 (Sprint 2):
      [ ] Phase 3: wiki-citation-enrichment recurring task (agent)
      [ ] Phase 3: paper-to-wiki-backlink recurring task (agent)
      [ ] Phase 3: wiki-quality-citation-report daily task
      [ ] Phase 1c: wiki_citations table migration (if needed at scale)
    
    Week 3+:
      [ ] Phase 4: neurowiki → scidex rename pass
      [ ] Phase 3: evidence-to-wiki-backfeed background task
      [ ] Extend citation enrichment to all 9000+ wiki pages

    Success Criteria

    ☐ Rich hover popover shows claim + excerpt + figure_ref (not just authors/year)
    ☐ FOXP1 page: ≥8 inline citations, speech/language role prominently featured
    ☐ FOXP2 page: ≥8 inline citations, all major disease associations cited
    ☐ Citation coverage metric tracked daily (target: 80% of pages with linked papers)
    ☐ Recurring citation enrichment task runs every 6h, adding ≥5 citations per pass
    ☐ No more "NeuroWiki" references in active UI-facing code

    Work Log

    2026-04-09 — Initial Spec

    • Surveyed citation infrastructure: [@key] syntax exists, refs_json defined, JS renderer in place
    • FOXP1 has 5 refs in refs_json, 0 inline markers; FOXP2 has 5 refs (poor quality), 0 inline markers
    • Current tooltip shows only authors (year) — need full popover with claim/excerpt
    • Identified 4 recurring governance task patterns needed
    • Defined richer refs_json schema with claim, excerpt, figure_ref, strength

    File: quest_wiki_citations_spec.md
    Modified: 2026-04-25 22:00
    Size: 12.3 KB