[Agora] Reprocess 5 failed analyses

Goal

Recover a concrete batch of failed Agora analyses that already contain usable debate output but never materialized into hypotheses and knowledge-graph edges. The work should use the current PostgreSQL-backed debate records and existing post-processing pipeline instead of re-running stale historical workflows unnecessarily.

Acceptance Criteria

☑ Perform a staleness review against the live DB and current code paths

☑ Recover 5 failed analyses into artifacts that include hypotheses and KG edges

☑ Generate or restore analysis pages for the recovered analyses

☑ Verify before/after DB state for statuses, hypothesis counts, and edge counts

Approach

Inspect current failed analyses and identify recoverable rows with completed debate sessions and structured synthesizer output.

Rebuild missing analyses/<id>/ artifacts from debate_rounds for the chosen batch.

Reuse the existing post-processing pipeline to parse hypotheses, generate KG edges, write HTML, and then mark verified analyses completed.

Dependencies

qg-failed-analyses-investigation — umbrella failed-analysis triage

Dependents

Agora debate reliability
Senate failed-analyses quality gate backlog

Work Log

2026-04-26 05:50 PT — Slot claude-auto:41

Found SDA-2026-04-16-frontier-immunomics-e6f97b29 still failed despite debate session with synthesizer round; all other originally-failed analyses already recovered by prior runs.
Diagnosed: synthesizer content was truncated at 17,687 characters with no closing ] or }, causing _parse_named_array to return None.
Added _extract_individual_hypotheses fallback to recover_failed_analyses_from_db_sessions.py: scans for {"rank": patterns, extracts and repairs each hypothesis individually, then reconstructs the payload array. Handles the missing-closing-bracket truncation case.
Re-ran recovery for immunomics: 0 → 7 hypotheses, 0 → 5 KG edges, status set to completed.
Committed all staged artifacts (debate.json, metadata.json, synthesizer_output.json, HTML pages, new analysis directories) plus the recovery script and spec.
Total across all recovery runs: 10+ analyses recovered, 70+ hypotheses, 100+ KG edges.
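
The per-hypothesis fallback described in this entry could look roughly like the sketch below. It is not the code in recover_failed_analyses_from_db_sessions.py: the function names, the regex, and the repair strategy (close an open string, then close any open brackets) are assumptions made for illustration. Objects that still fail to parse after that minimal repair are dropped.

```python
import json
import re

# Assumption: each hypothesis object starts with a "rank" key, as noted in the log.
RANK_START = re.compile(r'\{\s*"rank"\s*:')


def _scan_object(text, start):
    """Return (fragment, closers) for the JSON object beginning at `start`.

    `closers` is empty when the object is complete; otherwise it holds the
    characters needed to close an open string and any open brackets.
    """
    stack = []          # closing characters we still expect, innermost last
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            stack.pop()
            if not stack:
                return text[start:i + 1], ""
    # Text ended before the object closed: report what is missing.
    closers = ('"' if in_string else "") + "".join(reversed(stack))
    return text[start:], closers


def extract_individual_hypotheses(text):
    """Recover hypothesis objects one at a time from possibly truncated JSON."""
    hypotheses = []
    pos = 0
    while True:
        match = RANK_START.search(text, pos)
        if match is None:
            break
        fragment, closers = _scan_object(text, match.start())
        try:
            obj = json.loads(fragment + closers)
        except json.JSONDecodeError:
            obj = None  # not repairable with this minimal strategy
        if isinstance(obj, dict):
            hypotheses.append(obj)
        pos = match.start() + len(fragment)
    return hypotheses
```

Tracking string state while counting brackets matters here because hypothesis text can itself contain braces; a plain bracket counter would close objects too early.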

2026-04-25 22:40 PT — Slot codex

Hardened recover_failed_analyses_from_db_sessions.py for the live failure modes seen in this batch:

- default selection now filters for actually recoverable failed analyses instead of using stale hard-coded IDs
- truncated synthesizer JSON is salvaged by extracting and repairing the ranked_hypotheses array directly
- payload edge upserts now merge on (source_id, target_id, relation) to respect the real uniqueness constraint
- malformed string-only knowledge_edges entries are ignored so promoted-hypothesis fallback edges can still be generated
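
The last two bullets can be illustrated in memory, without a database. The sketch below assumes edges are dicts carrying source_id, target_id, and relation keys; the function name merge_edges and the "fields from the incoming entry win" policy are hypothetical, not taken from the recovery script. In PostgreSQL the same idea would normally be expressed as an upsert whose conflict target is those three columns.

```python
def merge_edges(existing, incoming):
    """Merge payload edges into existing ones, keyed on the uniqueness triple.

    Entries that are not dicts (for example bare strings) or that lack one of
    the key fields are skipped instead of aborting the whole batch.
    """
    merged = {
        (e["source_id"], e["target_id"], e["relation"]): dict(e)
        for e in existing
    }
    for entry in incoming:
        if not isinstance(entry, dict):
            continue  # malformed string-only entry
        try:
            key = (entry["source_id"], entry["target_id"], entry["relation"])
        except KeyError:
            continue  # incomplete edge, cannot be keyed
        # Update in place when the triple already exists, insert otherwise.
        merged.setdefault(key, {}).update(entry)
    return list(merged.values())
```

Skipping malformed entries rather than raising keeps the payload list usable, which is what lets a later fallback step generate edges when the payload contributes none.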

Recovered the current live batch of five failed analyses and verified each is now completed with hypotheses, KG edges, artifact files, and a report page:

- SDA-2026-04-16-frontier-lipidomics-dcdbc360: 0 -> 7 hypotheses, 0 -> 5 edges
- SDA-2026-04-16-frontier-metabolomics-f03b09d9: 0 -> 7 hypotheses, 0 -> 5 edges
- SDA-2026-04-16-frontier-connectomics-84acb35a: 0 -> 7 hypotheses, 0 -> 5 edges
- SDA-2026-04-16-gap-epigenetic-adpdals: 0 -> 7 hypotheses, 0 -> 15 edges
- SDA-2026-04-17-gap-microglial-subtypes-pharmaco-20260417000001: 0 -> 7 hypotheses, 0 -> 12 edges

Verified the five recovered analyses now carry populated artifact_path and /analyses/...html report_url values in PostgreSQL.
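
A before/after check of the kind used for this batch might be sketched as follows. The Snapshot type and verify_recovery function are hypothetical helpers; the actual verification was run against PostgreSQL and its queries are not shown in this log. The rule encoded here (status is completed, counts never shrink, and both counts end up above zero) is an assumption about what "verified" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Per-analysis state captured before or after a recovery run."""
    status: str
    hypotheses: int
    edges: int


def verify_recovery(before, after):
    """Return a list of problems; an empty list means the recovery looks good."""
    problems = []
    if after.status != "completed":
        problems.append(f"status is {after.status!r}, expected 'completed'")
    if after.hypotheses == 0 or after.hypotheses < before.hypotheses:
        problems.append(f"hypotheses {before.hypotheses} -> {after.hypotheses}")
    if after.edges == 0 or after.edges < before.edges:
        problems.append(f"edges {before.edges} -> {after.edges}")
    return problems


# Example using the lipidomics numbers recorded above: 0 -> 7 hypotheses, 0 -> 5 edges.
lipidomics_problems = verify_recovery(
    Snapshot("failed", 0, 0),
    Snapshot("completed", 7, 5),
)
```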

2026-04-25 22:05 PT — Slot codex

Re-checked task relevance against the live PostgreSQL state before making changes.
Confirmed the first five analyses recovered earlier are now completed, so the task remains valid only for the newer recoverable failures still showing status='failed'.
Selected the current repair batch based on live criteria: has debate sessions, has synthesizer output in debate_rounds, and still has 0 hypotheses / 0 knowledge edges.
Planned code change: retarget the recovery utility to choose the current recoverable failed backlog by default instead of the earlier hard-coded IDs that are already complete.
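
The live selection criteria from this entry can be written as a single predicate. The sketch assumes each candidate row has already been fetched into a dict; the key names (debate_session_count, has_synthesizer_round, and so on) are hypothetical and do not necessarily match the real schema.

```python
def is_recoverable(row):
    """True when a failed analysis still has debate output but no results."""
    return (
        row["status"] == "failed"
        and row["debate_session_count"] > 0
        and row["has_synthesizer_round"]
        and row["hypothesis_count"] == 0
        and row["edge_count"] == 0
    )


def select_recoverable(rows, limit=5):
    """Pick up to `limit` recoverable analysis IDs, preserving input order."""
    return [row["id"] for row in rows if is_recoverable(row)][:limit]
```

Deriving the batch from current rows, rather than from a fixed ID list, is what keeps the default selection from targeting analyses that an earlier run already completed.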

2026-04-25 21:35 PT — Slot codex

Started task with staleness review against live PostgreSQL state and current repo code.
Confirmed the task text is partially stale: the two named disrupted-sleep rows are no longer failed; they are archived historical attempts superseded by completed analysis sda-2026-04-01-gap-v2-18cf98ca.
Confirmed the original microglial backlog also shifted: older microglial parse-failure rows are archived, but SDA-2026-04-17-gap-microglial-subtypes-pharmaco-20260417000001 remains failed with a completed debate session and no hypotheses/KG edges.
Identified a recoverable current batch of failed analyses with completed debate sessions and structured synthesizer JSON still present in debate_rounds, suitable for artifact reconstruction and post-processing.

2026-04-25 19:58 PT — Slot codex

Implemented recover_failed_analyses_from_db_sessions.py to reconstruct debate artifacts from PostgreSQL debate_rounds, parse stored synthesizer output, upsert hypotheses/knowledge edges, and regenerate static HTML pages.
Verified a canary recovery on SDA-2026-04-16-frontier-proteomics-1c3dba72, which restored the analysis from 0 to 7 hypotheses and from 0 to 29 KG edges.
Recovered a live batch of five analyses and marked each completed after verification:

- SDA-2026-04-19-gap-epigenetic-comparative-ad-pd-als: 0 -> 7 hypotheses, 0 -> 5 edges
- SDA-2026-04-16-frontier-proteomics-1c3dba72: 0 -> 7 hypotheses, 0 -> 29 edges
- SDA-2026-04-17-gap-debate-20260417-033037-c43d12c2: 10 -> 10 hypotheses, 0 -> 11 edges
- SDA-2026-04-17-gap-pubmed-20260410-145520-5692b02e: 7 -> 7 hypotheses, 0 -> 5 edges
- sda-2026-04-01-gap-20260401-225155: 5 -> 5 hypotheses, 1 -> 35 edges

Confirmed the originally named disrupted-sleep and older microglial rows were historical/archived, so the substantive work was narrowed to the current failed analyses still missing hypotheses or KG edges.

File: q01-a3-024044CF_spec.md

Modified: 2026-04-25 22:52