[Senate] Task reaper in Orchestra watchdog: auto-recover stuck tasks

Added reap_stale_tasks() to the Orchestra watchdog (which runs every 5 min via cron) and a standalone orchestra task reap CLI command. Added SciDEX to the watchdog's PROJECT_DIR_MAP as a systemd-managed project: it skips the tmux supervisor but still gets the reaper and queue checks. Fixed PROJECT_PATH_MAP for case-sensitive directory resolution. Installed the watchdog cron job.

Completion Notes

Auto-release: non-recurring task produced no commits this iteration; requeuing for next cycle
Spec File

[Senate] Task reaper in Orchestra watchdog

Goal

Orchestra already had a stale task reaper (requeue_stale_running_tasks), but it ran only when an agent called orchestra task get-next. When no agent was running (as happened with SciDEX for 31+ hours), the reaper never ran and stuck tasks accumulated indefinitely. This task adds the reaper to the watchdog cycle (every 5 min via cron) and provides a standalone CLI command.

Changes Made

1. Orchestra CLI (scripts/orchestra_cli.py)

  • Added cmd_reap() function: orchestra task reap [--project P] [--timeout S] [--dry-run]
  • Reaps tasks that have been in 'running' status longer than the timeout (default 30 min)
  • Supports per-project filtering and dry-run mode
  • Registered in command dispatch table
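The reaper's selection rule and its three flags can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the code in scripts/orchestra_cli.py: the Task dataclass and its fields, the find_stale() helper, the cmd_reap() signature, and the in-memory task list are assumptions standing in for Orchestra's real task store, and --timeout is taken in seconds to match the S placeholder in the usage line.

```python
"""Hypothetical sketch of `orchestra task reap`; not the real orchestra_cli.py code."""
import argparse
import time
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_TIMEOUT_S = 30 * 60  # default from the spec: 30 minutes


@dataclass
class Task:
    # Assumed minimal task record; the real Orchestra schema may differ.
    task_id: str
    project: str
    status: str
    started_at: float  # Unix timestamp when the task entered 'running'


def find_stale(tasks: List[Task], now: float, timeout_s: float,
               project: Optional[str] = None) -> List[Task]:
    """Return tasks that have been 'running' for longer than timeout_s."""
    return [
        t for t in tasks
        if t.status == "running"
        and (project is None or t.project == project)
        and now - t.started_at > timeout_s
    ]


def cmd_reap(argv: List[str], tasks: List[Task],
             now: Optional[float] = None) -> List[Task]:
    """Parse reap flags, then requeue (or just report) stale tasks."""
    parser = argparse.ArgumentParser(prog="orchestra task reap")
    parser.add_argument("--project", default=None,
                        help="only reap tasks belonging to this project")
    parser.add_argument("--timeout", type=float, default=DEFAULT_TIMEOUT_S,
                        help="seconds a task may stay 'running' before reaping")
    parser.add_argument("--dry-run", action="store_true",
                        help="report stale tasks without changing them")
    args = parser.parse_args(argv)

    now = time.time() if now is None else now
    stale = find_stale(tasks, now, args.timeout, args.project)
    for t in stale:
        print(f"{'would requeue' if args.dry_run else 'requeued'} {t.task_id} ({t.project})")
        if not args.dry_run:
            t.status = "open"  # back in the queue for the next get-next
    return stale
```

For example, cmd_reap(["--project", "SciDEX", "--dry-run"], tasks) lists stale SciDEX tasks without touching their status; dropping --dry-run resets them to 'open'. Keeping the staleness test in a separate pure function is one way to let both the CLI and the watchdog share the same rule.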

2. Orchestra Watchdog (scripts/orchestra_watchdog.py)

  • Added reap_stale_tasks() method to OrchestraWatchdog class
  • Integrated into the run() cycle as step 3b (before the queue starvation check)
  • Runs for ALL active projects including SciDEX
  • Added SciDEX to PROJECT_DIR_MAP and PROJECT_PATH_MAP
  • Added SYSTEMD_MANAGED_PROJECTS set — projects using systemd services skip tmux supervisor management but still get reaper + queue checks
  • Installed watchdog cron (every 5 min)
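The ordering of the watchdog cycle and the systemd exception can be sketched as below. This is a hypothetical illustration: apart from OrchestraWatchdog, run(), reap_stale_tasks(), and SYSTEMD_MANAGED_PROJECTS, the method names are invented, each step only appends to a log instead of doing real work, and looping over projects inside run() is an assumption about how the real cycle is structured.

```python
"""Hypothetical sketch of the watchdog cycle; most names here are assumptions."""
from typing import List, Set

# Projects run as systemd services: no tmux supervisor, but reaper + queue checks.
SYSTEMD_MANAGED_PROJECTS: Set[str] = {"SciDEX"}


class OrchestraWatchdog:
    def __init__(self, projects: List[str]) -> None:
        self.projects = projects
        self.log: List[str] = []  # records each action so the ordering is visible

    def check_supervisor(self, project: str) -> None:
        self.log.append(f"supervisor:{project}")  # stands in for tmux supervisor management

    def reap_stale_tasks(self, project: str) -> None:
        self.log.append(f"reap:{project}")  # stands in for requeuing stuck 'running' tasks

    def check_queue_starvation(self, project: str) -> None:
        self.log.append(f"queue:{project}")

    def run(self) -> None:
        for project in self.projects:
            if project not in SYSTEMD_MANAGED_PROJECTS:
                self.check_supervisor(project)
            # Step 3b: the reaper runs before the queue starvation check.
            self.reap_stale_tasks(project)
            self.check_queue_starvation(project)
```

One plausible reason for reaping before the starvation check is that tasks the reaper requeues then count as available work when the queue is inspected, so a project is not flagged as starved while its tasks sit orphaned in 'running'.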

3. Immediate Fixes Applied

  • Reset 27 orphaned running tasks to 'open'
  • Re-opened 11 knowledge gaps (10 partially_filled + 1 investigating)
  • Enabled + started scidex-improve service
  • Restarted scidex-agent (now actively investigating gaps)

Acceptance Criteria

orchestra task reap CLI command works
orchestra task reap --dry-run shows what would be reaped
orchestra task reap --project SciDEX filters by project
☑ Watchdog reaps stale tasks every 5 minutes via cron
☑ SciDEX in watchdog project map with correct directory path
☑ SciDEX supervisor management skipped (systemd-managed)
☑ Watchdog cron installed and running

Dependencies

  • None

Work Log

2026-04-03 22:30 UTC

  • Diagnosed 7 root causes of the agent stall (31+ hours idle)
  • Key finding: the reaper existed but ran only on get-next, and nobody was calling get-next for SciDEX
  • Implemented standalone reaper CLI + watchdog integration
  • Fixed PROJECT_DIR_MAP for SciDEX (case-sensitive path)
  • Added SYSTEMD_MANAGED_PROJECTS to skip tmux supervisor for SciDEX
  • Installed watchdog cron
  • Verified agent is back online and investigating gaps
