Goal
Recover a concrete batch of failed Agora analyses that already contain usable debate output but were never materialized into hypotheses and knowledge-graph edges. The work should use the current PostgreSQL-backed debate records and the existing post-processing pipeline rather than unnecessarily re-running stale historical workflows.
Acceptance Criteria
☑ Perform a staleness review against the live DB and current code paths
☑ Recover 5 failed analyses into artifacts that include hypotheses and KG edges
☑ Generate or restore analysis pages for the recovered analyses
☑ Verify before/after DB state for statuses, hypothesis counts, and edge counts
Approach
Inspect current failed analyses and identify recoverable rows with completed debate sessions and structured synthesizer output.
Rebuild missing analyses/<id>/ artifacts from debate_rounds for the chosen batch.
Reuse the existing post-processing pipeline to parse hypotheses, generate KG edges, write HTML, and then mark verified analyses completed.
Dependencies
qg-failed-analyses-investigation — umbrella failed-analysis triage
Dependents
- Agora debate reliability
- Senate failed-analyses quality gate backlog
Work Log
2026-04-26 05:50 PT — Slot claude-auto:41
- Found SDA-2026-04-16-frontier-immunomics-e6f97b29 still failed despite a debate session with a synthesizer round; all other originally failed analyses had already been recovered by prior runs.
- Diagnosed: synthesizer content truncated at 17,687 chars with no closing ] or }, causing _parse_named_array to return None.
- Added an _extract_individual_hypotheses fallback to recover_failed_analyses_from_db_sessions.py: it scans for {"rank": patterns, extracts and repairs each hypothesis individually, then reconstructs the payload array, handling the missing-closing-bracket truncation case.
- Re-ran recovery for immunomics: 0 → 7 hypotheses, 0 → 5 KG edges, status set to completed.
- Committed all staged artifacts (debate.json, metadata.json, synthesizer_output.json, HTML pages, new analysis directories) plus the recovery script and spec.
- Total across all recovery runs: 10+ analyses recovered, 70+ hypotheses, 100+ KG edges.
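The truncation repair described in this entry can be sketched roughly as follows. This is a minimal illustration of the idea, not the script's actual _extract_individual_hypotheses code; the marker regex and the repair heuristic (re-cutting at comma boundaries and re-closing the brace) are assumptions.

```python
import json
import re

def extract_individual_hypotheses(text: str) -> list[dict]:
    """Salvage hypothesis objects from a truncated JSON payload.

    Finds each '{"rank":' marker, takes the span up to the next marker
    (or end of text), then tries progressively shorter cuts at comma
    boundaries until the fragment parses after re-closing the brace.
    Note: a sketch only -- commas inside string values are not handled.
    """
    starts = [m.start() for m in re.finditer(r'\{\s*"rank"\s*:', text)]
    hypotheses = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        chunk = text[start:end]
        comma_cuts = [m.start() for m in re.finditer(",", chunk)]
        # Try the full fragment first, then back off comma by comma.
        for cut in [len(chunk)] + comma_cuts[::-1]:
            candidate = chunk[:cut].rstrip().rstrip(",")
            if not candidate.endswith("}"):
                candidate += "}"
            try:
                hypotheses.append(json.loads(candidate))
                break
            except json.JSONDecodeError:
                continue
    return hypotheses
```

On a payload cut off mid-object, the intact leading hypotheses parse whole and the truncated tail is recovered down to its last complete fields.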
2026-04-25 22:40 PT — Slot codex
- Hardened recover_failed_analyses_from_db_sessions.py for the live failure modes seen in this batch:
  - default selection now filters for actually recoverable failed analyses instead of using stale hard-coded IDs
  - truncated synthesizer JSON is salvaged by extracting and repairing the ranked_hypotheses array directly
  - payload edge upserts now merge on (source_id, target_id, relation) to respect the real uniqueness constraint
  - malformed string-only knowledge_edges entries are ignored so promoted-hypothesis fallback edges can still be generated
- Recovered the current live batch of five failed analyses and verified each is now completed with hypotheses, KG edges, artifact files, and a report page:
  - SDA-2026-04-16-frontier-lipidomics-dcdbc360: 0 → 7 hypotheses, 0 → 5 edges
  - SDA-2026-04-16-frontier-metabolomics-f03b09d9: 0 → 7 hypotheses, 0 → 5 edges
  - SDA-2026-04-16-frontier-connectomics-84acb35a: 0 → 7 hypotheses, 0 → 5 edges
  - SDA-2026-04-16-gap-epigenetic-adpdals: 0 → 7 hypotheses, 0 → 15 edges
  - SDA-2026-04-17-gap-microglial-subtypes-pharmaco-20260417000001: 0 → 7 hypotheses, 0 → 12 edges
- Verified the five recovered analyses now carry populated artifact_path and /analyses/...html report_url values in PostgreSQL.
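The edge-upsert hardening in this entry, merging on the real uniqueness key while skipping malformed string-only entries, can be expressed as pure merge logic. The field names mirror the key named above; the function itself is an illustrative assumption, not the script's actual code.

```python
def merge_knowledge_edges(existing: list[dict], incoming: list) -> list[dict]:
    """Upsert-style merge of knowledge edges.

    Edges are keyed on (source_id, target_id, relation), mirroring the
    DB uniqueness constraint: an incoming edge with the same key replaces
    the existing row. Non-dict (string-only) entries are skipped so
    promoted-hypothesis fallback edges can still be generated.
    """
    def key(edge: dict) -> tuple:
        return (edge["source_id"], edge["target_id"], edge["relation"])

    merged = {key(e): e for e in existing}
    for edge in incoming:
        if not isinstance(edge, dict):   # ignore string-only debris
            continue
        merged[key(edge)] = edge         # insert, or update on key conflict
    return list(merged.values())
```

In SQL terms this corresponds to an INSERT ... ON CONFLICT (source_id, target_id, relation) DO UPDATE upsert against the edges table.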
2026-04-25 22:05 PT — Slot codex
- Re-checked task relevance against the live PostgreSQL state before making changes.
- Confirmed the first five analyses recovered earlier are now completed, so the task remains valid only for the newer recoverable failures still showing status='failed'.
- Selected the current repair batch based on live criteria: has debate sessions, has synthesizer output in debate_rounds, and still has 0 hypotheses / 0 knowledge edges.
- Planned code change: retarget the recovery utility to choose the current recoverable failed backlog by default instead of the earlier hard-coded IDs that are already complete.
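The live selection criteria listed in this entry amount to a simple predicate over analysis rows. A hypothetical sketch follows; the field names are assumptions standing in for the real schema and joins.

```python
def is_recoverable(row: dict) -> bool:
    """True for failed analyses worth recovering: a debate session exists,
    synthesizer output is still stored in debate_rounds, and no hypotheses
    or knowledge edges have been materialized yet."""
    return (
        row["status"] == "failed"
        and row["has_debate_session"]
        and bool(row["synthesizer_output"])
        and row["hypothesis_count"] == 0
        and row["edge_count"] == 0
    )
```

Defaulting the recovery utility's selection to rows passing this predicate, instead of a hard-coded ID list, is what keeps the tool from re-targeting analyses that earlier runs already completed.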
2026-04-25 21:35 PT — Slot codex
- Started task with staleness review against live PostgreSQL state and current repo code.
- Confirmed the task text is partially stale: the two named disrupted-sleep rows are no longer failed; they are archived historical attempts superseded by completed analysis sda-2026-04-01-gap-v2-18cf98ca.
- Confirmed the original microglial backlog also shifted: older microglial parse-failure rows are archived, but SDA-2026-04-17-gap-microglial-subtypes-pharmaco-20260417000001 remains failed with a completed debate session and no hypotheses/KG edges.
- Identified a recoverable current batch of failed analyses with completed debate sessions and structured synthesizer JSON still present in debate_rounds, suitable for artifact reconstruction and post-processing.
2026-04-25 19:58 PT — Slot codex
- Implemented recover_failed_analyses_from_db_sessions.py to reconstruct debate artifacts from PostgreSQL debate_rounds, parse stored synthesizer output, upsert hypotheses/knowledge edges, and regenerate static HTML pages.
- Verified a canary recovery on SDA-2026-04-16-frontier-proteomics-1c3dba72, which restored the analysis from 0 to 7 hypotheses and from 0 to 29 KG edges.
- Recovered a live batch of five analyses and marked each completed after verification:
  - SDA-2026-04-19-gap-epigenetic-comparative-ad-pd-als: 0 → 7 hypotheses, 0 → 5 edges
  - SDA-2026-04-16-frontier-proteomics-1c3dba72: 0 → 7 hypotheses, 0 → 29 edges
  - SDA-2026-04-17-gap-debate-20260417-033037-c43d12c2: 10 → 10 hypotheses, 0 → 11 edges
  - SDA-2026-04-17-gap-pubmed-20260410-145520-5692b02e: 7 → 7 hypotheses, 0 → 5 edges
  - sda-2026-04-01-gap-20260401-225155: 5 → 5 hypotheses, 1 → 35 edges
- Confirmed the originally named disrupted-sleep and older microglial rows were historical/archived, so the substantive work was narrowed to the current failed analyses still missing hypotheses or KG edges.