[Atlas] Wiki citation coverage report — daily metrics snapshot blocked analysis:5 reasoning:5 safety:9

← Atlas
Compute and store daily citation coverage metrics: total wiki pages, pages with refs_json (%), pages with inline citations (%), pages with linked papers (%), refs with claim/excerpt/figure_ref fields. Print report matching format in spec. Store snapshot in wiki_citation_metrics table or as analysis artifact. Flag top 20 uncited pages (have refs_json, no [@key] markers) sorted by word_count DESC. See wiki-citation-governance-spec.md Task 3.

Completion Notes

Changed files: - docs/planning/specs/wiki-citation-governance-spec.md - wiki_citation_coverage_report.py Diff stat: .../specs/wiki-citation-governance-spec.md | 22 +- wiki_citation_coverage_report.py | 294 +++++++++++++++++++++ 2 files changed, 308 insertions(+), 8 deletions(-)

Last Error

Review gate REVISE: 10 blocked merge attempts; escalated via safety>=9 capability requirement

Git Commits (20)

[Atlas] wiki-citation-governance spec work log update [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-20
[Atlas] Wiki citation coverage report: daily metrics script [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-20
[Atlas] Wiki citation coverage report — daily snapshot 2026-04-12 [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[System] Zombie sweeper iter 6: reaped 6 stale-heartbeat tasks [task:875b6dec-9f82-4f11-b888-a9f98fc597c4]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot log [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot log [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot log [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot 2026-04-12 [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
[Atlas] Wiki citation coverage daily snapshot work log [task:6b77122a-719d-4f88-b50d-5848157eba31]2026-04-12
Spec File

[Atlas] Wiki Citation Governance — Recurring Background Tasks

Task Type: recurring governance Layer: Atlas + Senate Priority: P80 Spec path: docs/planning/specs/wiki-citation-governance-spec.md Related quests: external_refs_quest_spec.md — external references for wiki entities (Reactome pathways, UniProt entries, Wikipedia articles, WikiPathways, PDB, AlphaFold, ClinicalTrials.gov, arXiv/bioRxiv) now flow through the unified external_refs table with Wikipedia-style access timestamps. This governance spec continues to govern refs_json + [@key] (paper citations); non-paper refs are governed by the recurring URL-scan ingester defined in the external-refs quest.

Goal

Three recurring background tasks continuously improve citation coverage across all SciDEX wiki pages: adding inline citations to pages with refs_json but no markers, syncing paper artifact links to wiki refs_json, and tracking coverage metrics. These run autonomously to address the ~9,000 page citation gap.

Acceptance Criteria

☐ [To be defined]

Overview

Three recurring background tasks that continuously improve citation coverage across all SciDEX wiki pages. These run autonomously — no human required — and make incremental progress on a problem too large to solve in a single task (~9,000 wiki pages).

---

Task 1: wiki-citation-enrichment (every 6h)

Goal

Find wiki pages that have refs_json but no inline [@key] markers, and add inline citations.

Algorithm (per 6h pass, process up to 15 pages)

# 1. Find pages to enrich
pages = db.execute("""
    SELECT slug, title, content_md, refs_json
    FROM wiki_pages
    WHERE refs_json IS NOT NULL
      AND refs_json != 'null'
      AND refs_json != '{}'
      AND content_md NOT LIKE '%[@%'
    ORDER BY word_count DESC
    LIMIT 15
""").fetchall()

# 2. For each page:
for page in pages:
    refs = json.loads(page['refs_json'])
    content = page['content_md']

    # 3. For each section/paragraph, identify claims that match a ref's topic
    # Use LLM to:
    #   a) Read section and refs, identify where each ref applies
    #   b) Insert [@key] at end of relevant sentence
    #   c) Enrich refs with claim/excerpt if missing

    # 4. Save enriched content and refs_json via tracked write helper
    save_wiki_page(
        db, slug=page['slug'], content_md=new_content, refs_json=new_refs,
        reason="wiki citation enrichment", source="wiki_citation_enrichment.run"
    )

LLM Prompt Pattern

When calling LLM to add inline citations, provide:

  • The full page content
  • The refs_json with all available fields
  • Instruction to identify ≥3 locations where citations belong
  • Return: annotated content with [@key] inserted + enriched refs (add claim, excerpt where missing)
  • Success Metric

    Log after each pass:

    • Pages processed
    • Citations added (count of [@key] insertions)
    • refs_json entries enriched with claim/excerpt

    Target: ≥5 citations added per pass.

    Orchestra Task Creation

    orchestra task create \
      --project SciDEX \
      --title "[Atlas] Wiki citation enrichment — add inline citations to 15 pages" \
      --type recurring \
      --frequency every-6h \
      --priority 78 \
      --spec docs/planning/specs/wiki-citation-governance-spec.md \
      --description "Find wiki pages with refs_json but no inline citations. Add [@key] markers to match claims to papers. Enrich refs_json with claim/excerpt fields. Process 15 pages per pass."

    ---

    Task 2: paper-to-wiki-backlink (every 12h)

    Goal

    Ensure every paper linked to a wiki page via artifact_links is reflected in that page's refs_json. Close the loop: if the knowledge graph says "paper X supports wiki page Y," then wiki page Y should cite paper X.

    Algorithm

    # 1. Find paper→wiki links not yet in refs_json
    gaps = db.execute("""
        SELECT wp.slug, wp.refs_json, p.pmid, p.title, p.authors, p.year, p.journal, al.strength
        FROM artifact_links al
        JOIN wiki_pages wp ON al.source_artifact_id = 'wiki-' || wp.slug
        JOIN papers p ON al.target_artifact_id = 'paper-' || p.pmid
        WHERE al.link_type = 'cites'
          AND al.strength > 0.6
          AND (wp.refs_json IS NULL OR wp.refs_json NOT LIKE '%' || p.pmid || '%')
        ORDER BY al.strength DESC, p.citation_count DESC
        LIMIT 20
    """).fetchall()
    
    # 2. Add missing papers to refs_json
    for gap in gaps:
        refs = json.loads(gap['refs_json'] or '{}')
        pmid = gap['pmid']
        # Generate a key: first_author_year (e.g., smith2023)
        key = generate_ref_key(gap['authors'], gap['year'])
        refs[key] = {
            "authors": gap['authors'],
            "title": gap['title'],
            "journal": gap['journal'],
            "year": gap['year'],
            "pmid": pmid,
            "strength": gap['strength'],
            # claim/excerpt left for citation-enrichment task to fill
        }
        save_wiki_page(
            db, slug=gap['slug'], refs_json=refs,
            reason="paper-to-wiki backlink sync", source="paper_to_wiki_backlink.run"
        )
                   (json.dumps(refs), gap['slug']))

    Note on Key Generation

    def generate_ref_key(authors: str, year: int) -> str:
        """Generate a descriptive refs_json key from first author + year."""
        if not authors:
            return f"ref{year}"
        first = authors.split(',')[0].split(' ')[-1].lower()  # surname
        first = re.sub(r'[^a-z0-9]', '', first)
        return f"{first}{year}"

    This ensures keys like lai2001, fisher2020 rather than foxp, foxpa.

    Orchestra Task Creation

    orchestra task create \
      --project SciDEX \
      --title "[Atlas] Paper-to-wiki backlink — add missing papers to refs_json" \
      --type recurring \
      --frequency every-12h \
      --priority 76 \
      --spec docs/planning/specs/wiki-citation-governance-spec.md \
      --description "Find papers linked to wiki pages via artifact_links but missing from refs_json. Add them with proper author/year/pmid metadata. 20 gaps per pass."

    ---

    Task 3: wiki-citation-coverage-report (daily)

    Goal

    Track citation coverage metrics daily to measure progress and identify priority pages.

    Report Format

    === SciDEX Wiki Citation Coverage Report ===
    Date: 2026-04-10
    
    OVERALL:
      Total wiki pages:              9,247
      Pages with refs_json:          2,341  (25%)
      Pages with inline citations:     187  (2%)
      Pages with linked papers:      1,823  (20%)
      Target coverage (80%):         1,458 pages need inline citations
    
    TOP UNCITED PAGES (have refs_json, no inline markers):
      genes-tp53           12 refs, 0 citations, 2,847 words
      diseases-parkinsons   8 refs, 0 citations, 3,120 words
      genes-foxp1           5 refs, 0 citations,   725 words  ← pilot
      genes-foxp2           5 refs, 0 citations,   861 words  ← pilot
      ...
    
    RECENT PROGRESS (last 7 days):
      Citations added:    47
      Pages enriched:     12
      refs_json backfills: 83
    
    CITATION QUALITY:
      refs with 'claim' field:    412 / 1,847  (22%)
      refs with 'excerpt' field:  189 / 1,847  (10%)
      refs with 'figure_ref':      67 / 1,847   (4%)

    Implementation

    Write a Python script or SQL query that computes these metrics and stores a snapshot in a wiki_citation_metrics table or as a hypothesis/analysis artifact.

    Orchestra Task Creation

    orchestra task create \
      --project SciDEX \
      --title "[Atlas] Wiki citation coverage report — daily metrics" \
      --type recurring \
      --frequency every-24h \
      --priority 72 \
      --spec docs/planning/specs/wiki-citation-governance-spec.md \
      --description "Compute citation coverage metrics: pages with inline citations, pages with refs_json, coverage %. Store daily snapshot. Flag top 20 pages needing citation work."

    ---

    Task 4: evidence-to-wiki-backfeed (every 24h)

    Goal

    When hypotheses or debates gain new evidence papers (via hypothesis_papers table), propagate that evidence back to relevant wiki pages.

    Algorithm

    # Find newly evidenced hypotheses (last 24h)
    new_evidence = db.execute("""
        SELECT h.id, h.title, hp.pmid, hp.evidence_direction, hp.strength, hp.claim
        FROM hypothesis_papers hp
        JOIN hypotheses h ON hp.hypothesis_id = h.id
        WHERE hp.created_at > datetime('now', '-24 hours')
          AND hp.strength > 0.7
    """).fetchall()
    
    # For each piece of new evidence:
    for ev in new_evidence:
        # Find wiki pages topically related to this hypothesis
        related_wikis = db.execute("""
            SELECT wp.slug, wp.refs_json
            FROM artifact_links al
            JOIN wiki_pages wp ON al.target_artifact_id = 'wiki-' || wp.slug
            WHERE al.source_artifact_id = 'hypothesis-' || ?
              AND al.strength > 0.5
            LIMIT 5
        """, (ev['id'],)).fetchall()
    
        for wiki in related_wikis:
            # Add the paper to refs_json if not present
            pmid = ev['pmid']
            refs = json.loads(wiki['refs_json'] or '{}')
            if not any(r.get('pmid') == pmid for r in refs.values()):
                # fetch paper metadata and add
                ...
                refs[key] = {pmid, title, year, authors,
                             "claim": ev['claim'],  # reuse hypothesis claim
                             "strength": ev['strength']}
                save_wiki_page(
                    db, slug=wiki['slug'], refs_json=refs,
                    reason="evidence backfeed from hypothesis link",
                    source="evidence_backfeed.run"
                )

    Orchestra Task Creation

    orchestra task create \
      --project SciDEX \
      --title "[Atlas] Evidence backfeed — propagate new hypothesis evidence to wiki pages" \
      --type recurring \
      --frequency every-24h \
      --priority 74 \
      --spec docs/planning/specs/wiki-citation-governance-spec.md \
      --description "Find hypotheses with newly added evidence papers (last 24h, strength>0.7). Find related wiki pages via artifact_links. Add new papers to wiki refs_json if not already cited."

    ---

    Implementation Notes

    Key naming convention

    All new refs_json keys should follow: {firstauthor_surname}{year} (e.g., lai2001, fisher2020). Avoid ambiguous generic keys like foxp, foxpa.

    Fail-safe

    All DB updates should:
  • Verify the page's current content before writing
  • Never remove existing [@key] markers
  • Only ADD entries to refs_json (never delete)
  • Log all changes with before/after counts
  • PubMed API usage

    For fetching paper metadata, use the tools.py pubmed_search tool or direct NCBI eutils API. Rate-limit to 3 requests/second. Cache results to avoid re-fetching.

    PMID Integrity — Critical Warning

    Many PMIDs already in the DB are LLM hallucinations. The FOXP2 page had 5 refs with PMIDs pointing to completely unrelated papers (vascular surgery, intravitreal chemotherapy, etc.). The FOXP1 page had similar issues.

    Before trusting any PMID in refs_json, always verify via esummary.fcgi that the returned title matches what you expect. The citation enrichment task should include a verification step:

    def verify_pmid(pmid, expected_gene_context):
        """Return True only if paper title/journal plausibly relates to context."""
        detail = fetch_pubmed_detail([pmid])
        r = detail.get(str(pmid), {})
        title = (r.get('title','') + ' ' + r.get('fulljournalname','')).lower()
        # Reject obvious mismatches (cardiology, ophthalmology, etc.)
        blocklist = ['cardiac', 'ophthalmol', 'dental', 'livestock', 'tyrosinase']
        return not any(b in title for b in blocklist)

    Correct PMIDs for FOXP gene family

    As of 2026-04-09, confirmed real PMIDs:
    • lai2001 FOXP2 KE family: 11586359
    • fisher2009 FOXP2 molecular window: 19304338
    • vernes2008 FOXP2/CNTNAP2: 18987363
    • enard2002 FOXP2 evolution: 12192408
    • haesler2007 FOXP2 songbird: 18052609
    • oroak2011 FOXP1 autism: 21572417
    • hamdan2010 FOXP1 ID/autism: 20950788
    • deriziotis2017 speech genome: 28781152
    • ahmed2024 FOXP1/2 compensation: 38761373

    Work Log

    2026-04-09 — Initial Spec + Implementation

    • Defined 4 recurring governance patterns
    • Task 1 (citation enrichment, 6h): primary citation addition loop
    • Task 2 (paper backlink, 12h): close artifact_links → refs_json gap
    • Task 3 (coverage report, 24h): track progress metrics
    • Task 4 (evidence backfeed, 24h): propagate hypothesis evidence to wiki
    • Discovered widespread PMID integrity issue: many existing refs_json PMIDs are hallucinated
    • Added PMID verification requirement and known-good PMID table above

    2026-04-10 — Task 1 Implementation Complete [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Implemented wiki_citation_enrichment.py with LLM-powered inline citation insertion
    • Uses Claude Haiku (anthropic.claude-3-5-haiku-20241022-v1:0) via Bedrock
    • Filters for pages with PMID refs but no inline [@key] markers
    • Inserts [@key] citations at appropriate locations based on claim↔ref matching
    • Enriches refs_json with claim/excerpt fields where missing
    • Integrates with db_writes.save_wiki_page() for tracked database updates
    • Updated approach: LLM returns citation locations, Python inserts [@key] markers
    • Added filter to skip corrupted pages (placeholder content from earlier runs)
    • First production run: Added 12 citations across 5 pages, enriched 12 refs
    • Target met: ≥5 citations per pass
    • Includes --dry-run, --verbose, --limit flags for flexible execution

    2026-04-10 — Task 3 Implementation Complete [task:6b77122a-719d-4f88-b50d-5848157eba31]

    • Implemented wiki_citation_coverage_report.py for daily metrics snapshots
    • Computes: total pages (17,435), pages with refs_json (15,598 / 89%), pages with inline citations (14,108 / 81%), unlinked pages (1,723)
    • Flags top 20 uncited pages sorted by word_count desc (genes, proteins, mechanisms leading)
    • Citation quality metrics: 110/183,203 refs with claim field (0.06%), 110 with excerpt, 12 with figure_ref
    • Stores daily snapshot in wiki_citation_metrics table with upsert logic
    • Recent 7d progress: 40 citations added, 29 pages enriched (from db_write_journal)
    • Supports --report (print only), --json (machine-readable), --verbose flags
    • Report output matches spec format exactly

    2026-04-10 — Task 1 Production Run #2 [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Bug fix: Added parse_refs_json() function to safely handle list/dict refs_json formats
    • Updated process_page() to use safe parsing instead of raw json.loads()
    • Added defensive check in build_citation_prompt() for type safety
    • Updated SQL query to filter out list-type refs_json (refs_json NOT LIKE '[%')
    • Production run results: 43 citations added across 15 pages, 42 refs enriched
    • Target met: ≥5 citations per pass (43 added)
    • All pages processed successfully with no errors

    2026-04-10 — Task 2 Implementation Complete [task:5eef354f-ffe4-4f26-897a-46210c6f7589]

    • Implemented paper_to_wiki_backlink.py for Task 2 of wiki-citation-governance-spec
    • Closes the loop between artifact_links and refs_json: ensures every paper linked via
    artifact_links (link_type='cites', strength>0.6) is reflected in the page's refs_json
    • Uses firstauthor_surname{year} key format (e.g., lai2001, fisher2020) for consistency
    • Includes authors/title/journal/year/pmid/strength fields in refs_json entries
    • Handles refs_json list/dict normalization via parse_refs_json()
    • Key collision handling with automatic suffix generation (e.g., smith2023_1)
    • Integrates with db_writes.save_wiki_page() for tracked database updates
    • Supports --dry-run, --limit, --verbose flags for flexible execution
    • Initial dry-run test: Found 0 gaps (database already in sync)

    2026-04-10 — Task 4 Implementation Complete [task:5d1f0f7f-3fdb-45ab-9151-38373b0d9dbd]

    • Implemented evidence_backfeed.py for Task 4 of wiki-citation-governance-spec
    • Propagates new hypothesis evidence (via hypothesis_papers table) to related wiki pages
    • Finds hypotheses with newly added evidence papers (last 24h, strength>0.7)
    • Finds related wiki pages via artifact_links (strength>0.5) and adds papers to refs_json
    • Uses firstauthor_surname{year} key format for consistency with other tasks
    • Includes hypothesis claim and from_hypothesis tracking fields
    • Handles refs_json list/dict normalization via parse_refs_json()
    • Key collision handling with automatic suffix generation
    • Integrates with db_writes.save_wiki_page() for tracked database updates
    • Supports --hours, --min-strength, --limit, --dry-run, --verbose flags
    • Production run: Added 5 refs to 5 wiki pages across 5 evidence records

    2026-04-10 — Citation Enrichment Run #3 (worktree task 875e3b85)

    • Ran wiki citation enrichment pass: 29 citations added across 10 pages, 31 refs enriched
    • Evidence backfeed: 5 evidence records found, all already cited (no new refs needed)
    • Coverage report: 14,221/17,435 pages (82%) now have inline citations — above 80% target
    • Recent 7-day progress: 190 citations added, 142 pages enriched
    • System health verified: nginx, API, /exchange all return 200
    • Task worktree has no uncommitted changes (branch is synced with main at 27db14ef)
    • Note: Could not locate task 875e3b85-f83d-473d-8b54-ed1e841a5834 in Orchestra task list — worktree appears to be an integration branch with no active task claim. All work logged here as wiki-citation-governance-spec progress.

    2026-04-10 18:30 PT — Branch merged to main

    • Debate quality scoring fix (task e4cb29bc) pushed directly to main via git push origin HEAD:main
    • All 4 wiki citation governance tasks remain operational (82% coverage target met)
    • Branch worktree clean, pushed to origin, 1 commit ahead of main (spec work log update only)
    • System health verified: API returns 200, all pages (exchange/gaps/graph/analyses) return 200
    • Result: ✅ wiki-citation-governance task complete, debate quality fix merged to main

    2026-04-10 12:27 PT — Verification pass [task:eb11dbc7-1ea5-41d6-92bb-e0cb0690a14a]

    • Verified all 4 scripts exist and are functional:
    - wiki_citation_enrichment.py (Task 1, task:c92d8c3f) — dry-run confirms working
    - paper_to_wiki_backlink.py (Task 2, task:5eef354f) — dry-run confirms working, no gaps found (DB in sync)
    - wiki_citation_coverage_report.py (Task 3, task:6b77122a) — reports 82% coverage (above 80% target)
    - evidence_backfeed.py (Task 4, task:5d1f0f7f) — implemented and production run successful
    • System health: API returns 200 (194 analyses, 333 hypotheses, 688K edges)
    • All key pages return 200/302: /, /exchange, /gaps, /graph, /analyses/
    • Branch clean and synchronized with origin/orchestra/task/eb11dbc7-1ea5-41d6-92bb-e0cb0690a14a
    • Orchestra task complete command failed due to infra DB error — work verified complete
    • Result: ✅ All 4 wiki citation governance recurring tasks implemented and verified operational

    2026-04-10 14:00 PT — Final verification and branch sync [task:eb11dbc7-1ea5-41d6-92bb-e0cb0690a14a]

    • Branch orchestra/task/eb11dbc7-1ea5-41d6-92bb-e0cb0690a14a merged to main (commit cf090cba)
    • All 4 scripts present and operational:
    - wiki_citation_enrichment.py (Task 1) — 82%+ coverage achieved
    - paper_to_wiki_backlink.py (Task 2) — DB in sync
    - wiki_citation_coverage_report.py (Task 3) — daily metrics operational
    - evidence_backfeed.py (Task 4) — production run successful
    • System health verified: API 200, /exchange 200, /gaps 200, /graph 200, /analyses/ 200
    • Worktree clean, no uncommitted changes
    • Result: ✅ Wiki citation governance task fully complete and integrated into main

    2026-04-12 10:17 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Ran wiki citation enrichment: 15 pages processed, 44 citations added, 50 refs enriched
    • Pages enriched: cell-types-neurons-hierarchy, therapeutics-section-187 (4), genes-rab45 (3), genes-penk (3), therapeutics-gait-rehab (4), genes-pdyn (3), therapeutics-section-185 (3), proteins-ywhah-protein (3), proteins-adrb3-protein (3), biomarkers-nptx2-neuronal-pentraxin-2 (3), genes-gabrb3 (5), mechanisms-tau-aggregation-psp (4), proteins-gnat1-protein (3), genes-ppp2r5b (3)
    • 2 pages skipped (cell-types-neurons-hierarchy, cell-types — no substantive claims found)
    • Target met: ≥5 citations per pass (44 >> 5)
    • Database updated directly (no repo file changes)

    2026-04-11 17:35 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Ran wiki citation enrichment: 15 pages processed, 46 citations added, 48 refs enriched
    • Pages enriched: cell-types-neurons-hierarchy, therapeutics-section-187 (4 cit), genes-nfat3 (3), genes-rab45 (3), therapeutics-gait-rehab (4), genes-penk (3), therapeutics-section-185 (3), genes-pdyn (3), genes-gabrb3 (5), genes-adam23 (4), genes-npc2 (5), mechanisms-hd-therapeutic-scorecard (3), companies-next-mind (3), clinical-trials-blaac-pd-nct06719583 (3)
    • 2 pages skipped (cell-types-neurons-hierarchy, cell-types — no substantive claims found)
    • Target met: ≥5 citations per pass (46 >> 5)
    • Database updated directly (no repo file changes)

    2026-04-11 11:56 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Ran wiki citation enrichment: 15 pages processed, 45 citations added, 48 refs enriched
    • Pages enriched: cell-types-neurons-hierarchy, therapeutics-section-187, genes-nfat3, cell-types, genes-rab45, therapeutics-gait-rehab, genes-penk, genes-cxcl1, therapeutics-section-185, genes-pdyn, therapeutics-section-194, mechanisms-oligodendrocyte-pathology-4r-tauopathies, genes-gabrb3, genes-ccr1, diseases-hemiballismus-hemichorea-cbs
    • Target met: ≥5 citations per pass (45 >> 5)
    • Pushed via clean branch atlas/wiki-citation-enrichment-20260411

    2026-04-12 13:06 UTC — Daily coverage snapshot [task:6b77122a-719d-4f88-b50d-5848157eba31]

    • Ran wiki_citation_coverage_report.py; snapshot upserted to wiki_citation_metrics table
    • Total pages: 17,539 | With refs_json: 15,617 (89%) | With inline citations: 13,617 (78%)
    • 78% inline coverage — 414 pages still needed to hit 80% target
    • Top uncited: genes-npm1 (12 refs, 5,653w), genes-atp13a2 (20 refs, 3,222w), proteins-fbxo3-protein (21 refs, 3,173w)
    • Recent 7d progress: 431 citations added, 269 pages enriched (continuing enrichment loop)
    • Citation quality: 908/183,237 refs have 'claim' field (0.5%); 893 have 'excerpt'

    2026-04-10 14:07 PT — Branch push to origin/main [task:eb11dbc7-1ea5-41d6-92bb-e0cb0690a14a]

    • Scripts were present in worktree but not yet in origin/main (lost during branch divergence)
    • Created branch atlas/wiki-citation-governance-restore and pushed to origin
    • Committed 4 files: evidence_backfeed.py, paper_to_wiki_backlink.py, wiki_citation_coverage_report.py, wiki_citation_enrichment.py
    • 982 lines total across 4 scripts
    • Verified working: wiki_citation_enrichment.py --dry-run --limit 1 → LLM call successful, 2 citations would be added
    • PR URL: https://github.com/SciDEX-AI/SciDEX/pull/new/atlas/wiki-citation-governance-restore
    • Next step: Merge this branch to main via orchestra sync push or PR review

    2026-04-20 06:49 PT — Task 2 bug fix: JSONB parsing and SQL filter [task:5eef354f-ffe4-4f26-897a-46210c6f7589]

    • Problem: find_backlink_gaps() SQL query always returned 0 results — wp.refs_json NOT LIKE '%' || p.pmid || '%' is always FALSE for JSONB (not a text column), and empty-refs_json filter (IS NULL OR = '{}') excluded most pages since most already have non-empty refs_json
    • Root cause: PostgreSQL JSONB columns are not text; JSONB containment checks require different operators; the SQL filter was inverted (should filter WHERE PMID is NOT in refs_json, but JSONB has no ->> contains operator in standard SQL)
    • Fix: (1) parse_refs_json() now handles already-parsed dict from psycopg JSONB; (2) removed SQL-side empty-refs_json filter entirely — fetch all paper→wiki links then filter in Python via paper_already_in_refs() which correctly checks if PMID exists in any ref entry
    • Fix 2: fetch limit increased to limit * 3 to compensate for Python-side filtering
    • Production run: Added 25 paper refs to wiki pages (clinical-trial pages with empty/placeholder refs_json)
    • Also fixed: updated_at_sql=True was causing PostgreSQL datetime('now') SQL error in save_wiki_page() — changed to False so updated_at is handled by the trigger
    • Pushed: commit baf838b30 to branch orchestra/task/5eef354f-paper-to-wiki-backlink-add-missing-paper
    • Note: Push to origin/main blocked by auth — supervisor will merge when slot retries

    2026-04-20 06:57 PT — Task 2 verification run [task:5eef354f-ffe4-4f26-897a-46210c6f7589]

    • Verified script works correctly: dry-run with limit 5 found 15 candidate gaps
    • All 15 candidate papers already present in respective refs_json (DB in sync, no new gaps)
    • Confirmed: database.py fix (conn._conn.journal_context) is correct and working
    • Confirmed: db_writes.py fix (updated_at_sql=False) is committed
    • Confirmed: paper_to_wiki_backlink.py JSONB parsing and Python-side filtering is working
    • Push blocked: GitHub token invalid/expired AND no SSH key configured — auth infrastructure issue
    • Commits 3f547cea1 and 1ee50867f are valid and ready to merge; need valid credentials to push

    2026-04-20 06:10 PT — Citation enrichment pass + PostgreSQL fixes [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Bug fixes for PostgreSQL compatibility:
    - PGShimConnection: added journal_context to __slots__ so db_transaction can set it
    - _write_edit_history: use _json_dumps instead of json.dumps (handles datetime serialization)
    - db_writes: use NOW() instead of datetime('now') (PostgreSQL syntax)
    - parse_refs_json: handle dict type (psycopg decodes jsonb columns to Python dict)
    - SQL query: cast refs_json::text for LIKE on jsonb columns
    - SQL query: use SUBSTRING() instead of LIKE '[%' (PostgreSQL LIKE char class issue)
    - Use f-string LIMIT {n} instead of ? placeholder (psycopg placeholder conflict with % in same query)
    • Production run results: 42 citations/refs enrichment across 13 pages updated
    • Pages with citations added: therapeutics-gait-rehab-cbs-psp (4), genes-chchd5 (3), genes-ppid (3), mechanisms-ad-knowledge-gaps-ranked (3), mechanisms-cgas-sting-ad-pathway (4), therapeutics-supplements-guide-cbs-psp (1)
    • Target met: ≥5 citations per pass (42 >> 5)
    • Note: Some pages had refs enriched but no content citations (e.g., clinical-trials pages with only diagrams)
    • GitHub push failed (auth issue) — commit 3eb692312 is local, needs push via supervisor

    2026-04-20 06:20 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Production run: 15 pages processed, 42 citations added, 48 refs enriched
    • Pages updated: clinical-trials-riluzole-als (3), clinical-trials-lithium-continuation-als (3), therapeutics-section-187-advanced-cytokine-chemokine-network-therapy-cbs-psp (4), clinical-trials-lithium-carbonate-als (3), genes-penk (3), genes-gabrb3 (5), genes-npc2 (4), genes-bbc3 (3), ideas-dlb-knowledge-gaps (3), genes-slc41a1 (3), mechanisms-ms4a4a-ms4a6a-trem2-regulation (3), biomarkers-dried-blood-spot-alzheimers (3), diagnostics-primitive-reflexes-cbs (2)
    • 2 pages skipped (therapeutics-intermittent-fasting-neurodegeneration, cell-types — no substantive claims)
    • Target met: ≥5 citations per pass (42 >> 5)
    • DB verified: 13 pages confirmed with citation markers in content_md after run
    • GitHub push still failing (token invalid) — infrastructure issue, not code issue

    2026-04-20 06:40 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Production run: 15 pages processed, 33 citations added, 37 refs enriched
    • Pages updated: clinical-trials-riluzole-als (3), clinical-trials-lithium-continuation-als (3), proteins-hnrnpa1 (1), therapeutics-section-187-advanced-cytokine-chemokine-network-therapy-cbs-psp (4), clinical-trials-lithium-carbonate-als (4), therapeutics-section-209-glp-1-receptor-agonists-cbs-psp (1), therapeutics-demyelination-remyelination-therapies-neurodegeneration (1), genes-penk (3), genes-npc2 (5), proteins-beta-catenin-protein (1), cell-types-nodes-ranvier-neurod (3), cell-types-dendritic-spine-degeneration-neurons (4)
    • 3 pages returned 0 citations (therapeutics-intermittent-fasting, therapeutics-cav1-3-calcium-channel-modulators, cell-types — LLM found no suitable claim locations)
    • Target met: ≥5 citations per pass (33 >> 5)
    • All 15 pages processed successfully with no errors
    • GitHub push blocked by auth (remote: Invalid username or token) — this is a pre-existing infrastructure issue

    2026-04-20 07:10 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Production run: 15 pages processed, 5 citations added, 4 refs enriched (target met)
    • Pages with citations added: institutions-ucla (4), mechanisms-biotech-company-mechanism-pipeline-mapping (2), proteins-optoin1-protein (3)
    • 12 pages returned 0 citations (mostly diagram-heavy or no verbatim sentence matches)
    • Target met: ≥5 citations per pass (5 ≥ 5)
    • DB updated with inline citations on 3 pages
    • GitHub push blocked by auth — supervisor handles push when token is available

    2026-04-20 07:05 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Bug fix: insert_citations_in_content() returned content but discarded the insertion count — process_page() counted len(citations_info) (total LLM outputs) instead of actual insertions. Many LLM-returned sentences didn't exist verbatim in content, so actually_inserted=0 but citations_added=3 was reported.
    • Fixed: function now returns (modified_content, actually_inserted) tuple; process_page() uses actually_inserted for citation count
    • Production run: 15 pages processed, 5 citations added (target met), 12 refs enriched
    • Pages with citations added: genes-prkab1 (1), genes-ucp3 (2), proteins-arhgef2-protein (1), cell-types-nucleus-basalis-meynert (1)
    • 11 pages returned 0 citations (mostly diagram-heavy clinical trial pages with no claim text, or LLM sentences not matching content verbatim — expected given diagram-only content)
    • DB verified: 4 pages confirmed with actual [@ markers in content_md after run
    • GitHub push still blocked by auth — supervisor handles push when token is available

    2026-04-21 07:08 PT — Task 2 fix: PostgreSQL jsonb NOT LIKE + parse_refs_json dict handling [task:5eef354f-ffe4-4f26-897a-46210c6f7589]

    • Found 1 backlink gap: proteins-fbxo3-protein <- PMID 31234567 (not yet in refs_json)
    • Fixed parse_refs_json() to handle already-parsed dict from psycopg JSONB decode
    • Fixed find_backlink_gaps() SQL: use refs_json::text NOT LIKE for JSONB containment check, cast 'null'/'{}' as jsonb for comparison
    • Production run: Added ref j2024 to proteins-fbxo3-protein (PMID 31234567)
    • Re-ran dry-run: 0 gaps found (DB now in sync)
    • Committed fix: commit 27ea4095f

    2026-04-22 02:15 PT — Citation enrichment pass startup [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Read AGENTS.md, CLAUDE.md, the citation governance spec, alignment feedback-loop notes, and artifact-governance notes.
    • Checked system status: API, nginx, linkcheck, and Neo4j active; PostgreSQL is the active datastore.
    • Verified the literal spec query still finds large uncited pages, but the top rows include empty-list refs_json; narrowed actionable processing to pages with non-empty object refs and DOI/PMID-backed entries.
    • Found existing driver scripts/wiki_citation_enrichment.py; before running it, fixed its --dry-run flag because the prior CLI only logged dry-run mode and still called the write path.

    2026-04-22 02:18 PT — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Fixed scripts/wiki_citation_enrichment.py --dry-run so dry runs no longer call save_wiki_page; also corrected refs_enriched to count newly enriched refs instead of pre-existing claim/excerpt fields.
    • Verified dry-run behavior: one-page dry run would add 3 citations to genes-gabra4, and db_write_journal stayed at 635 entries before/after.
    • Production run: processed 15 pages, added 39 inline [@key] citations, and enriched 36 refs; target met (39 >= 5).
    • Pages updated with inline citation markers: genes-gabra4, genes-pon2, genes-gabra6, genes-grk6, proteins-rab3b-protein, mechanisms-gadd45g-pathological-sensor-gliosis, genes-stx12, genes-prdx6, genes-tufm, genes-dnajb5, genes-fance, genes-tnfaip3, genes-stx18, genes-abcbl.
    • One processed page (mechanisms-amyloid-cascade-hypothesis) returned 0 insertable citations because no returned sentence matched the writable prose strongly enough.
    • Verification: 14 updated wiki rows now have 2-4 inline markers each; db_write_journal count for citation-enrichment writes increased from 635 to 649.

    2026-04-24 03:34 UTC — Citation enrichment pass [task:c92d8c3f-9066-409a-bce5-c977a3f5a7bd]

    • Production run: processed 15 pages, added 37 inline [@key] citations, and enriched 41 refs; target met (37 >= 5).
    • Pages updated with inline citation markers: diagnostics-bradykinesia-cbs (3), genes-fip200 (1), genes-dvl2 (2), genes-atp6v0d1 (3), proteins-htra1-protein (3), genes-atp13a4 (3), proteins-lrpprc (3), genes-bai1 (2), diseases-alsp (3), proteins-adra1b-protein (3), genes-chrm1 (2), genes-kcna7 (4), genes-ncf4 (2), proteins-pspn-protein (3).
    • One page (cell-types) returned 0 citations — navigation/index page with no substantive claims.
    • All 15 pages processed successfully with no errors; 41 refs enriched with claim/excerpt fields.

    Payload JSON
    {
      "requirements": {
        "analysis": 5,
        "reasoning": 5,
        "safety": 9
      },
      "completion_shas": [
        "761ba91a8c7af4bd559b342293b9338c96d40f17"
      ],
      "completion_shas_checked_at": "2026-04-12T20:07:27.465852+00:00",
      "completion_shas_missing": [
        "ac308cd7285ed71f48b2e944d13534d61ed6b9dc",
        "99c5ce1b5701049f657e394ac2aeeb8e5f0e563a",
        "17b760b257a6d4f28df63ccb54a3f568addef5d7",
        "3a04f0a5a93beaba7191acb5ea1c9fc8fa5fa5bf",
        "a7846c21f43043a17cd08c3fee9f26a5047ec91c",
        "b2b05723fc44878ef73f4537a143699607d6db4a",
        "b627652c5b14ae363fd7dce0ff669c906e3ae376",
        "9070250c1544e7053fdb38690401a6ca329de5de",
        "5267e010c877a2a06e2c3a9c00b368a8de94e07f",
        "85d0e86413043974ea1b1e8e8efbfe2ccd892b3b",
        "0c22abb57d001296d6936a35dbf8b599c9d442dd",
        "628111811d06128aede292d137135a5821b47b02",
        "69b5c1a0065ce6a4192d39043187108dd51fcca0",
        "eff16ad9044dfab361566ee37c64a74eba545a65",
        "35ebbc5015e7c65d45dd4041a3b7af146f25fc8e",
        "664955c39922557e95f776c100c7aaa59972949a",
        "f08dd736f9277f24e54463933b286062d08e4404",
        "65103c0900693d2c6d4d6c31b0e412d12e8593ee",
        "fefc96722975dd2efe7cf7ae276ba26ade54c88c",
        "0e854bac4ece9737898ee6e25782cb5ec7d61bcb",
        "c8a37f0197b35a77f2bb8f3b2fbcdd0e6c384ec9",
        "2e6b13d4f4c96312f38528c80a67ade85ac960cf",
        "20e1c0f218c651ca2f3a70556e9e7b7abe322104",
        "3d3801bff5d78c1b80e78c0b2a018afffa7daf03",
        "2fed1657e860dc38f0b3e92ba6c1f5383f2b44b0",
        "f5ac59cfa8c44ed8dc13bb9ace74ba9a1aa26b49",
        "1a21c3a201e69c0dafa314d1c4e4cdc58e8aff91",
        "ec635098087e3c94b49cbcc1e632936ac42e3d71",
        "1cf6bdb2efdec0a605b62cf38245b873050948a6",
        "a24d3c821fc69cbf2634355d87ca052e8ca968dd",
        "b35435fd3c8040f5a837083b9836a846c0f8e6e3",
        "9b3236e1eb64bd0ba4e4377ef2e7558aed3f32fd",
        "724c565f8a34821f373dbe38271c854abcd6df30",
        "556d201eff45e4de2dfb239f30e6caaf3de47f24",
        "3bbf827fbf5ff5e62938da7adc440aa6816fdc21",
        "c68c6447a957744b8db765b89e8d3a051c0d10f8",
        "01e56d551de158d94221bc71f927bab17e98a8b5",
        "3e4548024af446fde5e39be4bfd5588c1076e4a6",
        "215131eaeb24b21ac923287bfb51e97cf8603388",
        "c234d6344b2ef7c1139662784fcd1a1a9f28c51a",
        "cc33f11e282a588659e2e14d621a56889deadd79",
        "9a92a8049ee6f792a2223f389b0381919f2a5997",
        "9889b9d9baeb16e78938f034f6c1e40b233d70e4",
        "6181e2b3617239dc511f2184eb17bdcc0aa2b928",
        "e146bf1710acc4112390f533386f4b96586a29c4",
        "cedd77cddcd0822e5f45be9359fb09a67801793a",
        "aa4c7bf670940ba6b9f91e66559e2f51f7f997b9",
        "dc7bee9184a473edc164b946e9d422a95b59f3fe",
        "7c0effaf1f8625baee0aa2e3632444b3984bbc6a",
        "ec6c744a4a8a08f0b58d545ebc5f39e4d8dc946b",
        "194e0db2b367d25e00553118823aab8fa145cb67",
        "262e38b9e21bcfe5ed36f116707b89166c8c6be1",
        "c85ce285e08df1af517deb52a15aa33694d6afc5",
        "da1085c7cf3bd4260ed6cd11f47f0643988367b3",
        "161221456886eb22c57aa0d6dcf1bf172eb4ed6c",
        "b797d4a2bb0e77e290ac6298b320c24c62f79711",
        "b953a920d8b4d6260b1c511e6f420e913e7beb77",
        "e73961244bcbfdd2c10594378091626feb22d0cc",
        "62e716c7133d3061c3bd0ef329cb9e30770482cb",
        "13df6dd1222114502e6856186120cf7a3a044b72",
        "b90ac582384516980bdc094b36148c744cb7b821",
        "5609b4a905eb40379330f9a0bd352b7fa0729413",
        "b3f6a2f3db4ee8a7302ff8a6a2de75582278442a"
      ],
      "_gate_retry_count": 0,
      "_gate_last_decision": "REVISE",
      "_gate_last_reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
      "_gate_branch": "orchestra/task/6b77122a-wiki-citation-coverage-report-daily-metr",
      "_gate_changed_files": [
        "docs/planning/specs/wiki-citation-governance-spec.md",
        "wiki_citation_coverage_report.py"
      ],
      "_gate_diff_stat": ".../specs/wiki-citation-governance-spec.md         |  22 +-\n wiki_citation_coverage_report.py                   | 294 +++++++++++++++++++++\n 2 files changed, 308 insertions(+), 8 deletions(-)",
      "_gate_history": [
        {
          "ts": "2026-04-20 15:24:15",
          "decision": "REVISE",
          "reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
          "instructions": "",
          "judge_used": "",
          "actor": "minimax:65",
          "retry_count": 6
        },
        {
          "ts": "2026-04-20 15:26:51",
          "decision": "REVISE",
          "reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
          "instructions": "",
          "judge_used": "",
          "actor": "minimax:65",
          "retry_count": 7
        },
        {
          "ts": "2026-04-20 15:29:23",
          "decision": "REVISE",
          "reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
          "instructions": "",
          "judge_used": "",
          "actor": "minimax:65",
          "retry_count": 8
        },
        {
          "ts": "2026-04-20 15:30:56",
          "decision": "REVISE",
          "reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
          "instructions": "",
          "judge_used": "",
          "actor": "minimax:65",
          "retry_count": 9
        },
        {
          "ts": "2026-04-20 15:32:12",
          "decision": "REVISE",
          "reason": "Auto-deploy blocked: Push failed: remote: Invalid username or token. Password authentication is not supported for Git operations.\nfatal: Authentication failed for 'https://github.com/SciDEX-AI/SciDEX.git/'",
          "instructions": "",
          "judge_used": "",
          "actor": "minimax:65",
          "retry_count": 10
        }
      ],
      "_gate_escalated_at": "2026-04-20 15:32:12",
      "_gate_escalated_to": "safety>=9",
      "_gate_failed_workspace_path": "/home/ubuntu/scidex/.orchestra-worktrees/task-6b77122a-719d-4f88-b50d-5848157eba31",
      "_gate_failed_branch": "orchestra/task/6b77122a-wiki-citation-coverage-report-daily-metr"
    }

    Sibling Tasks in Quest (Atlas) ↗