[Senate] Link 50 isolated artifacts into the governance graph
done  analysis:6  reasoning:6

20370 artifacts have no artifact_links edges. Isolated artifacts cannot participate in provenance, lifecycle governance, or discovery-dividend backprop.

Acceptance criteria (recommended; see 'Broader latitude' below)

  • 50 isolated artifacts gain artifact_links edges or documented no-link rationale
  • Each link is derived from entity_ids, parent_version_id, dependencies, provenance_chain, or related DB rows
  • Remaining isolated artifact count is <= 20320

Before starting

  1. Read this task's spec file and check for duplicate recent work.
  2. Evaluate whether the gap and acceptance criteria target the right problem. If you see a better framing, document it before executing.
  3. Check adjacent SciDEX layers for cross-links or structural follow-up opportunities.

Broader latitude

You are invited to question the framing, propose structural or algorithmic improvements, and strengthen artifacts beyond the minimum where the evidence supports it. Document any such contribution in the work log and commit messages.

Suggested approach

  1. Select isolated artifacts ordered by usage_score, citation_count, and recency.
  2. Infer relationships from entity_ids, provenance_chain, dependencies, versions, analyses, or hypotheses.
  3. Insert only high-confidence artifact_links and verify governance graph connectivity counts.

Completion Notes

Auto-release: non-recurring task produced no commits this iteration; requeuing for next cycle

Git Commits (1)

[Senate] Link 50 isolated artifacts into governance graph [task:513a580d-7710-4502-9693-ed1cf99324ee] 2026-04-26

Spec File

Goal

Link isolated artifacts into the artifact governance graph. Artifacts without artifact_links cannot participate in provenance, lifecycle review, or discovery-dividend backpropagation.

Acceptance Criteria

☐ A concrete batch of isolated artifacts gains artifact_links edges or documented no-link rationale
☐ Each link is derived from entity_ids, parent_version_id, dependencies, provenance_chain, or related DB rows
☐ Low-confidence name-only guesses are not inserted
☐ Before/after isolated artifact counts are recorded

Approach

  • Select artifacts with no incoming or outgoing links, ordered by usage_score, citation_count, and recency.
  • Infer relationships from entity_ids, provenance_chain, dependencies, versions, analyses, or hypotheses.
  • Insert only high-confidence artifact_links through the standard DB path.
  • Verify governance graph connectivity counts and inspect a sample.
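The selection step above can be sketched as follows. This is a minimal illustration, assuming an `artifacts` table with `id`, `usage_score`, `citation_count`, and `created_at` columns and an `artifact_links` table with `source_artifact_id` and `target_artifact_id`. It runs against an in-memory SQLite database purely for demonstration (the live system is PostgreSQL), and the function names and sample rows are hypothetical.

```python
import sqlite3

# Isolated = no artifact_links row references the artifact as source or target.
ISOLATED_SQL = """
SELECT a.id
FROM artifacts a
WHERE NOT EXISTS (
    SELECT 1 FROM artifact_links l
    WHERE l.source_artifact_id = a.id OR l.target_artifact_id = a.id
)
ORDER BY a.usage_score DESC, a.citation_count DESC, a.created_at DESC
LIMIT ?
"""


def select_isolated(conn, limit):
    """Return IDs of isolated artifacts, highest-priority first."""
    return [row[0] for row in conn.execute(ISOLATED_SQL, (limit,))]


def demo_connection():
    """Build a tiny in-memory database with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE artifacts (
            id TEXT PRIMARY KEY,
            usage_score REAL,
            citation_count INTEGER,
            created_at TEXT
        );
        CREATE TABLE artifact_links (
            source_artifact_id TEXT,
            target_artifact_id TEXT,
            link_type TEXT
        );
        INSERT INTO artifacts VALUES
            ('figure-a', 0.9, 2, '2026-04-01'),
            ('figure-b', 0.5, 0, '2026-04-02'),
            ('analysis-x', 0.7, 1, '2026-04-01'),
            ('wiki-y', 0.8, 5, '2026-03-30');
        INSERT INTO artifact_links VALUES ('figure-a', 'analysis-x', 'derives_from');
        """
    )
    return conn


if __name__ == "__main__":
    print(select_isolated(demo_connection(), 50))
```

In the demo data, `figure-a` and `analysis-x` share a link, so only `wiki-y` and `figure-b` are returned.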

    Dependencies

    • 58079891-7a5 - Senate quest

    Dependents

    • Artifact lifecycle governance, provenance, and discovery dividends

    Work Log

    2026-04-21 - Quest engine template

    • Created reusable spec for quest-engine generated artifact link backfill tasks.

    2026-04-21 20:30 UTC — Task ebdcb998 (slot 40)

    Infrastructure blocker: The Bash tool is completely non-functional in this agent session.
    Every shell command fails immediately with EROFS: read-only file system, mkdir
    '/home/ubuntu/Orchestra/data/claude_creds/max_outlook/session-env/9c56830e-4629-43e4-ab78-e0bffcf06cb4'.
    The pre-exec harness hook cannot create the session-env directory because that path
    lives on a read-only filesystem. Sub-agents spawned via the Agent tool have the same
    issue. Python scripts, git commands, and orchestra CLI are all inaccessible.

    Work completed despite blocker:

  • Read AGENTS.md, this spec, and artifact-governance.md to understand the system.
  • Analysed the artifacts, artifact_links, hypotheses, analyses, notebooks, and
    knowledge_edges table schemas.
  • Read scidex/atlas/artifact_registry.py, backfill/backfill_artifacts.py,
    scidex/core/database.py, and quest_engine.py (lines 1490–1544) to understand
    the query for counting isolated artifacts and how links should be inserted.
  • Wrote a complete, production-quality backfill script at
    scripts/backfill_isolated_artifact_links.py. The script:
    - Counts isolated artifacts BEFORE (query matches quest_engine.py's isolation query)
    - Processes the top-50 isolated artifacts ordered by quality_score DESC, created_at DESC
    - Uses 12 high-confidence inference strategies (no name-only guesses):
    a. parent_version_id → derives_from (strength 1.0)
    b. provenance_chain JSON entries → typed links (strength 0.9)
    c. metadata.analysis_id / metadata.source_analysis_id → derives_from (1.0)
    d. metadata.hypothesis_id → mentions (0.9)
    e. metadata.gap_id → mentions (0.85)
    f. metadata.source_notebook_id → derives_from (0.9)
    g. Cross-table: hypotheses.analysis_id → derives_from (1.0)
    h. Cross-table: analyses.gap_id → extends (0.9)
    i. Cross-table: notebooks.associated_analysis_id → derives_from (1.0)
    j. Cross-table: knowledge_edges.analysis_id → derives_from (1.0)
    k. Cross-table: hypothesis evidence for/against PMID → cites analysis (0.85)
    l. entity_ids → wiki artifact mentions (0.8, only when wiki artifact confirmed)
    - Uses PostgreSQL ON CONFLICT DO NOTHING for safe upserts
    - Uses scidex.core.database.JournalContext for provenance tracking
    - Counts isolated artifacts AFTER and prints a summary report
    - Supports --dry-run and --limit N flags
  • Could not execute the script (Bash blocked) or commit it (git blocked).
  • Script to run when bash is restored:

    cd /home/ubuntu/scidex
    python3 scripts/backfill_isolated_artifact_links.py --dry-run --limit 50
    python3 scripts/backfill_isolated_artifact_links.py --limit 50
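
The two flags used in these commands could be parsed along the following lines. This is only a sketch of the command-line surface, not the script's actual code: the helper names, help strings, and the default of 50 are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical CLI for the backfill script; only the two flags named in the log."""
    parser = argparse.ArgumentParser(
        description="Backfill artifact_links for isolated artifacts (illustrative sketch)."
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Report candidate links without writing any rows.",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=50,  # assumed default, chosen to match the task's batch size
        help="Stop after this many artifacts have gained links.",
    )
    return parser


def describe_run(argv):
    """Summarise what a run with the given arguments would do."""
    args = build_parser().parse_args(argv)
    mode = "dry run (no rows written)" if args.dry_run else "live run"
    return f"{mode}: stop after {args.limit} artifacts gain links"


if __name__ == "__main__":
    print(describe_run(["--dry-run", "--limit", "50"]))
```

Running the dry run first, as the commands above do, lets the candidate links be inspected before anything is written.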

    Before count: Unknown (could not query DB). Quest engine spawned this task because
    count was > 0 at task creation time (2026-04-21T19:54:00Z).

    Next steps for follow-on agent:

  • Verify session-env directory issue is resolved (try echo test in Bash)
  • cd to worktree or main repo
  • Run the backfill script (dry-run first, then real run)
  • Confirm before/after counts in summary output
  • Commit: [Senate] Backfill artifact_links for 50 isolated artifacts [task:ebdcb998-cfec-4280-ba56-12f0ff280bea]
  • Push branch and let supervisor auto-complete

    2026-04-21 - Task ebdcb998 retry (slot 40, second attempt)

    Infrastructure blocker persists — root cause identified:

    Orchestra sets CLAUDE_CONFIG_DIR=/home/ubuntu/Orchestra/data/claude_creds/max_outlook/
    in the subprocess environment (see orchestra/auth.py lines 918–926). Claude Code's
    bridge REPL v2 (tengu_bridge_repl_v2: true) then attempts to create {CLAUDE_CONFIG_DIR}/session-env/<UUID>/ for shell-state persistence. This fails with EROFS: read-only file system because that path lives on a read-only mount.

    Fix required (by human operator or supervisor):

    • Option A (preferred): mkdir -p /home/ubuntu/Orchestra/data/claude_creds/max_outlook/session-env
    and ensure the mount is writable, OR
    • Option B: Change config_dir for the max_outlook account in Orchestra's DB to a
    writable directory (e.g. /tmp/claude-max-outlook), OR
    • Option C: Disable tengu_bridge_repl_v2 feature for automated workers by setting
    CLAUDE_DISABLE_BRIDGE_REPL=1 (if that env var is respected).

    Work completed this session:

    • Confirmed scripts/backfill_isolated_artifact_links.py exists and is complete (written
    by prior agent slot 40, first attempt)
    • Confirmed Bash and Write tools both fail with EROFS on the session-env path
    • Traced root cause to CLAUDE_CONFIG_DIR pointing at read-only filesystem
    • Could not commit or run the script — waiting on infrastructure fix

    2026-04-21 20:43 UTC — Task ebdcb998 completion

    • Reviewed prior related branch orchestra/task/98628b02-link-50-isolated-artifacts-into-the-gove; it had a separate verification-only spec update and was not present on origin/main for this task.
    • Confirmed live PostgreSQL schema uses artifacts.id and artifact_links.source_artifact_id / target_artifact_id; artifact_links has no natural unique constraint, so the backfill script checks duplicates explicitly before insert.
    • Added scripts/backfill_isolated_artifact_links.py to scan isolated artifacts by usage_score, citation_count, and created_at, infer only high-confidence links from metadata, entity IDs, provenance/dependencies, and related rows, and stop after 50 artifacts gain links.
    • Dry run: python3 scripts/backfill_isolated_artifact_links.py --dry-run --limit 50 --scan-limit 1000 scanned 443 isolated artifacts and found 50 linkable figure artifacts, with 61 candidate links.
    • Executed: python3 scripts/backfill_isolated_artifact_links.py --limit 50 --scan-limit 1000.
    • Before isolated count: 17,088
    • After isolated count: 17,035
    • Reduction: 53 isolated artifacts, because 50 source figures gained links and three previously isolated target wiki artifacts also became connected by inbound links.
    • Rows inserted: 61 artifact_links rows: 50 derives_from links from figure metadata analysis_id to existing analysis artifacts, plus 11 mentions links from direct entity_ids to existing wiki artifacts.
    • Sample verification queries confirmed:
    - figure-7f7b14e2f8bc -> analysis-sda-2026-04-01-gap-008, derives_from, evidence metadata.analysis_id = sess_sda-2026-04-01-gap-008
    - figure-31940a5cb4cd -> analysis-sda-2026-04-01-gap-20260401231108, derives_from, and wiki-neurodegeneration, mentions
    - figure-2ef32bff5b51 -> analysis-sda-2026-04-01-gap-v2-68d9c9c1, wiki-TAU, and wiki-TFEB

    Acceptance Criteria Status — Task ebdcb998

    ☑ A concrete batch of isolated artifacts gains artifact_links edges — 50 artifacts gained 61 links.
    ☑ Each link is derived from entity_ids, parent_version_id, dependencies, provenance_chain, or related DB rows — this batch used metadata.analysis_id and direct entity_ids only.
    ☑ Low-confidence name-only guesses are not inserted — the script only inserts links when the target artifact ID exists exactly.
    ☑ Before/after isolated artifact counts are recorded — 17,088 before, 17,035 after.

    2026-04-21 20:49 UTC — Task ebdcb998 live run (slot 71)

    Script executed against live PostgreSQL. No infrastructure blockers this run.

    Execution results:

    BEFORE: 17035 isolated artifacts
    Scanned: 444 isolated artifacts (scan-limit=500)
    Artifacts that gained links: 50
    Total links inserted: 68
    AFTER: 16985 isolated artifacts
    Reduction: 50 (artifacts now connected to governance graph)

    Links by type:

    • derives_from (metadata.analysis_id, provenance_chain, parent_version_id): majority
    • mentions (entity_ids → wiki artifacts): 18 links (TREM2, TYROBP, microglia, neurodegeneration)

    Sample inserted links:

    • figure-2eeef7deaf70 → analysis-SDA-2026-04-01-gap-001 (derives_from, strength 1.0, metadata.analysis_id)
    • figure-62c5cb7b0edc → wiki-TREM2, wiki-TYROBP (mentions, strength 0.8, entity_ids)

    Verification query:

    SELECT COUNT(*) FROM artifacts a
    WHERE NOT EXISTS (SELECT 1 FROM artifact_links l
      WHERE l.source_artifact_id = a.id OR l.target_artifact_id = a.id)
    -- Result: 16985 (was 17035 before this run)

    Acceptance criteria status:

    ☑ 50 isolated artifacts gain artifact_links edges — 50 gained links, 68 total edges
    ☑ Each link derived from entity_ids, parent_version_id, dependencies, provenance_chain, or related DB rows
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 17035 → 16985

    2026-04-21 14:50 UTC — Task fde80239 (slot 73)

    Bug fixed: _infer_from_paper_citations used LIKE on JSONB columns evidence_for and evidence_against, which fails silently with operator does not exist: jsonb ~~ unknown. Fixed by casting to text: evidence_for::text LIKE %s.
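
A minimal sketch of the corrected query shape, assuming a psycopg-style `%s` placeholder and an `id` column on hypotheses. The helper name is hypothetical and is not the script's `_infer_from_paper_citations`.

```python
def build_pmid_citation_query(pmid):
    """Build a parameterised query that searches JSONB evidence columns for a PMID.

    LIKE is not defined for jsonb operands, so each column is cast to text first.
    The pattern is a plain substring match and can over-match (a short PMID may
    appear inside a longer number), so callers should verify hits afterwards.
    """
    pattern = f"%{pmid}%"
    sql = (
        "SELECT id FROM hypotheses "
        "WHERE evidence_for::text LIKE %s "
        "OR evidence_against::text LIKE %s"
    )
    return sql, (pattern, pattern)


if __name__ == "__main__":
    query, params = build_pmid_citation_query("12345678")
    print(query)
    print(params)
```

The returned tuple would be passed to a cursor's `execute(sql, params)`, so the PMID is bound as a parameter rather than interpolated into the SQL text.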

    Execution results:

    BEFORE: 17101 isolated artifacts
    Scanned: 561 isolated artifacts (scan-limit=2000)
    Artifacts that gained links: 50
    Total links inserted: 70
    AFTER: 17050 isolated artifacts
    Reduction: 51

    Note: 116 new isolated artifacts were added since ebdcb998 ran (which ended at 16985). This run's 50-artifact batch recovers most of that drift and advances the graph connectivity.

    Links by type:

    • derives_from (metadata.analysis_id): majority
    • mentions (entity_ids → wiki artifacts): TREM2, TYROBP, neurodegeneration, PI3K, TFEB, APOE

    Sample inserted links:

    • figure-c29e2fec5b3e → analysis-SDA-2026-04-01-gap-001 (derives_from, strength 1.0, metadata.analysis_id)
    • figure-c29e2fec5b3e → wiki-neurodegeneration, wiki-TREM2 (mentions, strength 0.8, entity_ids)

    Acceptance criteria status:

    ☑ 50 isolated artifacts gain artifact_links edges — 50 gained links, 70 total edges
    ☑ Each link derived from metadata.analysis_id, entity_ids, provenance_chain, parent_version_id
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 17101 → 17050

    2026-04-22 14:15 UTC — Task e6e84211 (slot 73)

    Task: Link 40 isolated artifacts into provenance governance graph.

    Key findings:

    • Top isolated artifacts by created_at DESC are rigor_score_cards and paper_figures with UUID-style PMIDs — most have no linkable targets
    • rigor_score_cards: scored_entity_id (hypothesis ID) exists in hypotheses table, but hypothesis artifact often doesn't exist; can link via hypothesis→analysis chain
    • paper_figures: UUID PMIDs don't map to paper artifacts; only 6% of figures with numeric PMIDs have paper artifact targets
    • Figures (12K+ isolated) are linkable via metadata.analysis_id but appear after 3K+ paper_figures in created_at ordering

    New script: scripts/backfill_task_e6e84211.py

    • Uses created_at DESC ordering (matching task query)
    • Adds new strategies:
    - rigor_score_card: scored_entity_id → hypothesis table → analysis_id → derives_from to analysis artifact (strength 1.0)
    - paper_figure: pmid → paper artifact lookup (cites, strength 0.9)
    • Fixes case-sensitivity bug in _artifact_candidates for sda-/SDA- analysis IDs
    • Scans up to 10,000 isolated artifacts to find 40 linkable ones
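
The case-sensitivity fix for sda-/SDA- analysis IDs can be illustrated with a small candidate generator. This is a sketch under assumptions: the function name is hypothetical, and the `sess_` prefix stripping and `analysis-` artifact prefix are inferred from the sample links recorded in this log rather than taken from the script.

```python
def analysis_artifact_candidates(analysis_id):
    """Return candidate analysis artifact IDs for a metadata.analysis_id value.

    Only the leading 'sda-' / 'SDA-' prefix is case-varied; the rest of the
    identifier is left untouched.
    """
    raw = analysis_id.strip()
    if raw.startswith("sess_"):
        raw = raw[len("sess_"):]

    variants = [raw]
    if raw[:4].lower() == "sda-":
        rest = raw[4:]
        variants += ["sda-" + rest, "SDA-" + rest]

    # De-duplicate while preserving order, then add the artifact prefix.
    seen = set()
    candidates = []
    for variant in variants:
        if variant not in seen:
            seen.add(variant)
            candidates.append("analysis-" + variant)
    return candidates


if __name__ == "__main__":
    print(analysis_artifact_candidates("sess_SDA-2026-04-01-gap-001"))
```

Each candidate still has to be checked against the artifacts table; a link is inserted only when one of them exists exactly.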
    Execution results:

    BEFORE: 19538 isolated artifacts
    Scanned: 3005 isolated artifacts
    Artifacts that gained links: 40
    Total links inserted: 41
    AFTER: 19497 isolated artifacts
    Reduction: 41

    Links by type:

    • rigor_score_card → analysis: 6 links (via scored_entity→hypothesis→analysis chain)
    • paper_figure → paper: 28 links (via numeric PMID matching paper artifact)
    • figure → analysis/wiki: 7 links (via metadata.analysis_id and entity_ids)
    Verification:

    -- Isolated count after run: 19497 (was 19538)
    SELECT COUNT(*) FROM artifacts a
    WHERE NOT EXISTS (
        SELECT 1 FROM artifact_links l
        WHERE l.source_artifact_id = a.id OR l.target_artifact_id = a.id
    )
    -- Result: 19497

    Acceptance criteria status:

    ☑ 40 isolated artifacts gain artifact_links edges — 40 gained links, 41 total edges
    ☑ Each link derived from scored_entity_id, pmid, metadata.analysis_id, entity_ids
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 19538 → 19497

    2026-04-22 18:07 UTC — Task 3a5b980b (slot 70)

    Problem identified: The backfill script's ordering caused it to scan thousands of isolated paper_figures (top of usage_score DESC NULLS LAST ordering with all having 0.5 usage_score) before finding linkable figure artifacts. Paper_figures with UUID PMIDs can't be linked to paper artifacts.

    Fixes applied to scripts/backfill_isolated_artifact_links.py:

  • Replaced global ordering with per-type processing: process artifacts by type in priority order (figure, notebook, analysis, hypothesis first — then paper_figure/wiki_page last).
  • Added case-insensitivity fix in _artifact_candidates for sess_SDA- prefix (stripped sess_ leaves SDA- uppercase; lowercased to sda- and added upper variant).
  • Each type is scanned in its own query with usage_score DESC NULLS LAST, citation_count DESC NULLS LAST, created_at DESC NULLS LAST ordering, limiting to --scan-limit per type.
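
The per-type processing described above can be sketched in plain Python. The priority list follows the order given in the first fix; the `artifact_type` field name, the function names, and the `find_links` callback are assumptions for illustration.

```python
# Types most likely to be linkable are scanned first; paper_figure/wiki_page last.
TYPE_PRIORITY = ["figure", "notebook", "analysis", "hypothesis", "paper_figure", "wiki_page"]


def order_within_type(artifacts):
    """Sort by usage_score, citation_count, created_at, all DESC with NULLS LAST."""
    ordered = list(artifacts)
    # Stable sorts applied from the least to the most significant field.
    for field in ("created_at", "citation_count", "usage_score"):
        ordered.sort(
            key=lambda a, f=field: (
                a.get(f) is not None,
                a.get(f) if a.get(f) is not None else 0,
            ),
            reverse=True,
        )
    return ordered


def scan_by_type(isolated, find_links, limit, scan_limit):
    """Return IDs of artifacts that gained links, scanning one type at a time."""
    linked = []
    for artifact_type in TYPE_PRIORITY:
        batch = [a for a in isolated if a["artifact_type"] == artifact_type]
        for artifact in order_within_type(batch)[:scan_limit]:
            if len(linked) >= limit:
                return linked
            if find_links(artifact):
                linked.append(artifact["id"])
    return linked


if __name__ == "__main__":
    demo = [
        {"id": "pf-1", "artifact_type": "paper_figure", "usage_score": 0.5,
         "citation_count": 0, "created_at": "2026-04-20"},
        {"id": "fig-1", "artifact_type": "figure", "usage_score": 0.2,
         "citation_count": None, "created_at": "2026-04-01"},
        {"id": "fig-2", "artifact_type": "figure", "usage_score": 0.4,
         "citation_count": 1, "created_at": "2026-04-02"},
    ]
    print(scan_by_type(demo, lambda a: a["artifact_type"] == "figure", limit=2, scan_limit=100))
```

With this ordering, a large block of unlinkable paper_figures no longer consumes the scan budget before any figure is examined.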

    Execution results:

    BEFORE: 19631 isolated artifacts
    Scanned: 96 isolated figure artifacts
    Artifacts that gained links: 50
    Total links inserted: 67
    AFTER: 19581 isolated artifacts
    Reduction: 50

    Links by type:

    • figure → analysis: derives_from via metadata.analysis_id (strength 1.0 via general metadata handling + 0.95 via figure-specific handling)
    • figure → wiki: mentions via entity_ids (strength 0.8)
    • Sample: figure-d8b07236d415 → analysis-analysis-SEAAD-20260402
    • Sample: figure-27be44fcaf91 → analysis-SDA-2026-04-16-gap-pubmed-20260411-082446-2c1c9e2d + wiki-neurodegeneration + wiki-TREM2
    Verification:

    SELECT COUNT(*) FROM artifacts a
    WHERE NOT EXISTS (
        SELECT 1 FROM artifact_links l
        WHERE l.source_artifact_id = a.id OR l.target_artifact_id = a.id
    )
    -- Result: 19581 (was 19631 before this run)

    Acceptance criteria status:

    ☑ 50 isolated artifacts gain artifact_links edges — 50 gained links, 67 total edges
    ☑ Each link derived from metadata.analysis_id, entity_ids
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 19631 → 19581

    2026-04-22 15:52 UTC — Task 0fd31858 (slot 76)

    Task: Link 50 isolated artifacts into the governance graph.

    Problem identified: The backfill script's _infer_from_metadata only looked for paper_id in metadata, but paper_figure artifacts store the PMID under the key pmid (not paper_id). Additionally, figure artifacts (15K+ isolated) were not being linked despite having metadata.analysis_id that maps to existing analysis artifacts.

    Fixes applied to scripts/backfill_isolated_artifact_links.py:

  • Added pmid handling in _infer_from_metadata: when pmid is a numeric string (not a UUID), link paper_figure to paper-{pmid} artifact with cites link type (strength 0.9).
  • Added figure artifact type handling in _infer_from_metadata: link figure artifacts to analysis via metadata.analysis_id with derives_from link type (strength 0.95).
  • Increased default --scan-limit from 500 to 5000 because top-scored isolated artifacts are mostly unlinkable paper_figures with UUID PMIDs; need to scan deeper to find linkable figures and notebooks.
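
The numeric-versus-UUID PMID rule from the first fix could be expressed as below. This is a sketch with a hypothetical function name and return shape; the `paper-{pmid}` target, `cites` link type, and strength 0.9 come from the description above.

```python
def paper_link_for_pmid(pmid):
    """Return a candidate cites link for a numeric PMID, or None otherwise.

    UUID-style values (or anything that is not purely ASCII digits) yield no
    candidate, since they do not map to paper artifacts.
    """
    text = str(pmid).strip() if pmid is not None else ""
    if not (text.isascii() and text.isdigit()):
        return None
    return {
        "target_artifact_id": f"paper-{text}",
        "link_type": "cites",
        "strength": 0.9,
    }


if __name__ == "__main__":
    print(paper_link_for_pmid("12345678"))
    print(paper_link_for_pmid("9c56830e-4629-43e4-ab78-e0bffcf06cb4"))
```

The returned target is only a candidate: the link should be inserted only after confirming that the paper artifact exists.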

    Execution results:

    BEFORE: 19513 isolated artifacts
    Scanned: 3072 isolated artifacts (scan-limit=5000)
    Artifacts that gained links: 50
    Total links inserted: 61
    AFTER: 19462 isolated artifacts
    Reduction: 51

    Links by type:

    • figure → analysis: derives_from via metadata.analysis_id (strength 0.95) — 50 figures linked
    • figure → wiki: mentions via entity_ids (strength 0.8) — some figures also gained wiki mentions

    Sample inserted links:

    • figure-311d9d1facc8 → analysis-sda-2026-04-01-gap-008 (derives_from, metadata.analysis_id)
    • figure-cbaac6950f55 → analysis-sda-2026-04-01-gap-008, wiki-neurodegeneration (derives_from + mentions)
    • figure-e86a28c571e5 → analysis-sda-2026-04-01-002, wiki-GBA (derives_from + mentions)

    Verification:

    SELECT COUNT(*) FROM artifacts a
    WHERE NOT EXISTS (
        SELECT 1 FROM artifact_links l
        WHERE l.source_artifact_id = a.id OR l.target_artifact_id = a.id
    )
    -- Result: 19462 (was 19513 before this run)

    Acceptance criteria status:

    ☑ 50 isolated artifacts gain artifact_links edges — 50 gained links, 61 total edges
    ☑ Each link derived from metadata.analysis_id, entity_ids, or related DB rows
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 19513 → 19462

    2026-04-26 21:31 UTC — Task 991bbed9 (slot 42)

    Task: [Senate] Link 50 isolated artifacts into the governance graph.

    Approach: Ran scripts/backfill_isolated_artifact_links.py --limit 50 --scan-limit 500 with per-type priority ordering (figures first). Updated TASK_ID in script to 991bbed9.

    Links by type:

    • figure → analysis: derives_from via metadata.analysis_id (strength 0.95-1.0)
    • figure → wiki: mentions via entity_ids (strength 0.8)
    • Sample: figure-c5274ce8ccd9 → analysis artifact via metadata.analysis_id
    • Sample: figure-14e7a6600d2d → 3 links (analysis + wiki entities)

    Acceptance criteria status:

    ☑ 50 isolated artifacts gain artifact_links edges — 50 gained links, 70 total edges
    ☑ Each link derived from metadata.analysis_id, entity_ids (verified artifact existence before insert)
    ☑ No low-confidence name-only guesses — all targets verified to exist before insert
    ☑ Before/after counts recorded — 20469 → 20419

    2026-04-26 20:42 UTC — Task c70ab59e (slot 42)

    Task: [Senate] Link 30 isolated artifacts into the provenance graph.

    Approach: Directly populated artifacts.provenance_chain (JSON text field) for 30 artifacts
    with empty/null provenance_chain. Matched artifacts to their source entities via:

    • hypothesis artifacts → hypotheses.analysis_id join
    • analysis artifacts → analyses.gap_id join
    • notebook artifacts → analyses table join (notebook-{id} → analysis id)

    Script: scripts/link_artifact_provenance.py

    Sample provenance chains inserted:

    • hypothesis-h-06cb8e75 → [{"artifact_id": "analysis-SDA-2026-04-03-gap-aging-mouse-brain-v3-20260402", "relation": "derives_from", ...}]
    • analysis-sda-2026-04-01-001 → [{"step": "linked_to_gap", "analysis_id": "sda-2026-04-01-001", "gap_id": "gap-001", ...}]
    • notebook-SDA-2026-04-01-gap-001 → [{"step": "ci_notebook_stub", "analysis_id": "SDA-2026-04-01-gap-001", ...}]
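
A hedged sketch of how a hypothesis-style chain entry might be built for an artifact whose provenance_chain is empty or null. It emits only the two keys visible in the first sample (`artifact_id`, `relation`); the elided fields are not reproduced, and the function name is hypothetical rather than taken from scripts/link_artifact_provenance.py.

```python
import json


def provenance_chain_if_empty(existing, parent_artifact_id, relation="derives_from"):
    """Return JSON text for a one-entry chain, or None if a chain already exists.

    `existing` is the current provenance_chain value (JSON text or None).
    Existing non-empty chains are never overwritten.
    """
    if existing is not None and existing.strip() not in ("", "[]", "null"):
        return None
    entry = {"artifact_id": parent_artifact_id, "relation": relation}
    return json.dumps([entry])


if __name__ == "__main__":
    print(provenance_chain_if_empty(None, "analysis-example"))
    print(provenance_chain_if_empty('[{"artifact_id": "analysis-other"}]', "analysis-example"))
```

Returning None for populated chains keeps the backfill restricted to artifacts that had no provenance recorded.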

    Acceptance criteria status:

    ☑ 30 artifacts have non-empty provenance_chain entries
    ☑ Each link derived from hypotheses.analysis_id, analyses.gap_id (exact DB joins)
    ☑ Verified by query on artifacts table

    Payload JSON
    {
      "requirements": {
        "analysis": 6,
        "reasoning": 6
      }
    }
