Eight or more high-priority recurring CI drivers (including priority 98 "Database integrity check"
and priority 96 "Hypothesis score update") have been stale for 26+ days as of 2026-04-28. This is a
systemic failure: the platform's data quality metrics, market prices, and evidence backfill operations
have all been frozen since ~2026-04-02.
The work product is a root-cause report + targeted fixes — not doing the work the drivers were
supposed to do. Fix the engine, not the symptoms.
All eight drivers went stale at approximately the same time (~2026-04-02). This suggests a shared root cause, not individual driver bugs. Investigate:
orchestra.db task_runs table for failure patterns around that date.orchestra.db slots table for worker health around that date.api.py, agent.py, or driver dispatch code for any changes pushed ~2026-04-02Do NOT work around individual drivers. Find the shared failure and fix it at the source.
If it's a DB connection issue, fix the connection pool. If it's a code regression, revert or
fix the regression. If it's an orchestration bug, fix the orchestration.
After the fix, verify that at least 3 of the stale drivers execute successfully in a test
run or their next scheduled cycle. Document the verification.
In the spec Work Log, document:
orchestra.db (task_runs, slots tables)api.py, agent.py, driver dispatch codescidex DB for verificationCreated by ambitious quest task generator cycle 2. Eight high-priority recurring drivers
stale 22-26 days since ~2026-04-02 with shared stale date suggesting systemic root cause.
Root cause: Already resolved. All 8 drivers are healthy and running. The spec was created
from a stale snapshot — every driver has executed recently (2026-04-28 00:07–19:00 UTC),
has valid next_eligible_at in the future, and is in open status with no errors.
Evidence from orchestra list_tasks (500 tasks, project=SciDEX):
next_eligible_at values and labeled drivers "stale" because their next_eligible_at waslast_completed_at field was emptylast_completed_atActual timeline: All drivers requeued 2026-04-11 by codex watchdog; ran normally
2026-04-25–28 in their next scheduled cycle; next_eligible_at is set correctly for
future runs. The spec's "shared stale date ~2026-04-02" was a false signal — the real
cause was watchdog requeue on 2026-04-11 after the scidex.db→PostgreSQL migration caused
initial driver failures (documented in _stall_requeued_at payload fields). After requeue,
drivers resumed normal operation.
Root cause: scidex.db→PostgreSQL migration (2026-04-20) broke driver scripts that
hardcoded SQLite queries. Watchdog re-queue mechanism unstuck them 2026-04-11. No
engine-level fix was needed — individual driver scripts were patched as they failed.
The next_eligible_at lex-compare bug (#25) and naive timestamp bug (#48) in Orchestra's
reap_stale_task_leases were also fixed (2026-04-21/22), but these affected the scheduler
reaping logic, not the drivers themselves.
Drives verified resumed:
aa1c8ad8 (DB integrity): completed 2026-04-28 18:35 ✓9d82cf53 (hypothesis scores): completed 2026-04-28 18:30 ✓33803258 (evidence backfill): completed 2026-04-28 09:02 ✓ (fix: badedf096)bf55dff6 (trigger debates): completed 2026-04-28 08:30 ✓607558a9 (KOTH tournament): completed 2026-04-28 11:59 ✓ (fix: 36274d8a)1f62e277 (economics): completed 2026-04-28 18:29 ✓