[Senate] Task reaper in Orchestra watchdog
Goal
Orchestra already had a stale task reaper (requeue_stale_running_tasks), but it only ran when an agent called orchestra task get-next. If no agent was running (as happened with SciDEX for 31+ hours), stuck tasks accumulated indefinitely. This task adds the reaper to the watchdog cycle (every 5 min via cron) and provides a standalone CLI command.
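At its core, the reaper is a time-based sweep. Any task whose status is 'running' and whose start time is older than the timeout goes back to 'open'. The sketch below illustrates that idea and is not the actual requeue_stale_running_tasks code. It assumes a hypothetical SQLite table tasks(id, project, status, started_at) with started_at stored as Unix epoch seconds. The function name and its parameters are illustrative only.

```python
import sqlite3
import time
from typing import Optional


def reap_stale_running_tasks(conn: sqlite3.Connection, timeout_s: int = 1800,
                             project: Optional[str] = None,
                             dry_run: bool = False,
                             now: Optional[float] = None) -> list[int]:
    """Requeue tasks stuck in 'running' for longer than timeout_s seconds.

    Hypothetical schema: tasks(id, project, status, started_at), where
    started_at is Unix epoch seconds. Returns the ids of the stale tasks.
    """
    now = time.time() if now is None else now
    cutoff = now - timeout_s

    # Find running tasks that started before the cutoff, optionally per project.
    query = "SELECT id FROM tasks WHERE status = 'running' AND started_at < ?"
    params: list = [cutoff]
    if project is not None:
        query += " AND project = ?"
        params.append(project)
    stale_ids = [row[0] for row in conn.execute(query + " ORDER BY id", params)]

    # Dry-run only reports; otherwise flip the stale tasks back to 'open'.
    if not dry_run and stale_ids:
        placeholders = ",".join("?" * len(stale_ids))
        conn.execute(
            f"UPDATE tasks SET status = 'open', started_at = NULL "
            f"WHERE id IN ({placeholders})",
            stale_ids,
        )
        conn.commit()
    return stale_ids
```

The now parameter exists only so the sweep can be exercised deterministically. In normal use it defaults to the current time.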
Changes Made
1. Orchestra CLI (scripts/orchestra_cli.py)
- Added cmd_reap() function: orchestra task reap [--project P] [--timeout S] [--dry-run]
- Reaps tasks stuck in 'running' longer than the timeout (default 30 min)
- Supports per-project filtering and dry-run mode
- Registered in command dispatch table
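The following minimal sketch shows one way to wire the reap subcommand's flags and dispatch with the standard-library argparse module. The real scripts/orchestra_cli.py may parse arguments differently. The build_parser and main helpers, the COMMANDS dict, and the assumption that --timeout is given in seconds are all hypothetical. Here, cmd_reap only describes what it would do, standing in for the actual database sweep.

```python
import argparse
from typing import Optional

DEFAULT_TIMEOUT_S = 30 * 60  # 30 minutes; seconds as the unit is an assumption


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="orchestra")
    groups = parser.add_subparsers(dest="group", required=True)
    task = groups.add_parser("task")
    actions = task.add_subparsers(dest="action", required=True)

    reap = actions.add_parser("reap", help="Requeue tasks stuck in 'running'")
    reap.add_argument("--project", default=None,
                      help="Only reap tasks belonging to this project")
    reap.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT_S,
                      help="Seconds a task may stay 'running' before it is reaped")
    reap.add_argument("--dry-run", action="store_true",
                      help="List stale tasks without changing them")
    return parser


def cmd_reap(args: argparse.Namespace) -> str:
    # Stand-in: describe the sweep instead of touching a database.
    scope = args.project or "all projects"
    mode = "would reap" if args.dry_run else "reaping"
    return f"{mode} tasks running > {args.timeout}s in {scope}"


# Hypothetical dispatch table keyed by (group, action).
COMMANDS = {("task", "reap"): cmd_reap}


def main(argv: Optional[list[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return COMMANDS[(args.group, args.action)](args)
```

For example, main(["task", "reap", "--dry-run", "--project", "SciDEX"]) returns "would reap tasks running > 1800s in SciDEX".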
2. Orchestra Watchdog (scripts/orchestra_watchdog.py)
- Added reap_stale_tasks() method to the OrchestraWatchdog class
- Integrated it into the run() cycle as step 3b (before the queue starvation check)
- Runs for ALL active projects, including SciDEX
- Added SciDEX to PROJECT_DIR_MAP and PROJECT_PATH_MAP
- Added SYSTEMD_MANAGED_PROJECTS set: projects that use systemd services skip tmux supervisor management but still get the reaper and queue checks
- Installed the watchdog cron job (every 5 min)
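The watchdog change comes down to step ordering and per-project gating. The sketch below models one cycle. Projects in SYSTEMD_MANAGED_PROJECTS skip supervisor management, but every project still gets the reaper followed by the queue starvation check. WatchdogSketch, manage_supervisor, check_queue_starvation and the injected reap callable are hypothetical stand-ins for the real OrchestraWatchdog internals. Running all three steps inside a single per-project loop is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# Projects run by systemd services rather than a tmux supervisor.
SYSTEMD_MANAGED_PROJECTS = {"SciDEX"}


@dataclass
class WatchdogSketch:
    projects: list[str]
    reap: Callable[[str], int]  # hypothetical hook: returns tasks requeued
    log: list[str] = field(default_factory=list)

    def manage_supervisor(self, project: str) -> None:
        self.log.append(f"supervisor:{project}")

    def reap_stale_tasks(self, project: str) -> None:
        count = self.reap(project)
        self.log.append(f"reap:{project}:{count}")

    def check_queue_starvation(self, project: str) -> None:
        self.log.append(f"queue:{project}")

    def run(self) -> list[str]:
        for project in self.projects:
            # systemd-managed projects skip tmux supervisor handling only.
            if project not in SYSTEMD_MANAGED_PROJECTS:
                self.manage_supervisor(project)
            self.reap_stale_tasks(project)        # step 3b
            self.check_queue_starvation(project)  # runs after the reaper
        return self.log
```

Injecting the reaper as a callable keeps the cycle testable without a database. With cron invoking the watchdog every 5 minutes, each invocation would perform one such cycle.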
3. Immediate Fixes Applied
- Reset 27 orphaned running tasks to 'open'
- Re-opened 11 knowledge gaps (10 partially_filled + 1 investigating)
- Enabled and started the scidex-improve service
- Restarted scidex-agent (now actively investigating gaps)
Acceptance Criteria
☑ orchestra task reap CLI command works
☑ orchestra task reap --dry-run shows what would be reaped
☑ orchestra task reap --project SciDEX filters by project
☑ Watchdog reaps stale tasks every 5 minutes via cron
☑ SciDEX in watchdog project map with correct directory path
☑ SciDEX supervisor management skipped (systemd-managed)
☑ Watchdog cron installed and running
Dependencies
Work Log
2026-04-03 22:30 UTC
- Diagnosed 7 root causes for the agent stall (31+ hours idle)
- Key finding: the reaper existed but only ran on get-next, and nobody was calling get-next for SciDEX
- Implemented standalone reaper CLI + watchdog integration
- Fixed PROJECT_DIR_MAP for SciDEX (case-sensitive path)
- Added SYSTEMD_MANAGED_PROJECTS to skip tmux supervisor management for SciDEX
- Installed watchdog cron
- Verified agent is back online and investigating gaps