[Senate] Fix: abandonment_watchdog missing _stall_skip_at timestamps (quest engine GLM loop) done claude

## Root Cause

`orchestra/abandonment_watchdog.py` lines 535-541 add a provider to `_stall_skip_providers` WITHOUT adding a `_stall_skip_at` timestamp. `prune_expired_stall_skips()` (called with `treat_legacy_as_expired=True`) removes any entry that has no timestamp, so the GLM ban is pruned immediately and GLM reclaims the task on the next tick.

This caused task 80ffb77b-8391-493c-8644-37086c8e2e3c (quest engine CI) to be abandoned 29 times with rate_limit_retries_exhausted:glm.

## Fix

In `/home/ubuntu/Orchestra/orchestra/abandonment_watchdog.py`, change the `rebalance_stuck_tasks` function to also update `_stall_skip_at`:

```python
# BEFORE (lines 535-541):
skip_list = list(payload.get("_stall_skip_providers") or [])
added_providers = []
for p in providers:
    if p not in skip_list:
        skip_list.append(p)
        added_providers.append(p)
payload["_stall_skip_providers"] = skip_list

# AFTER:
skip_list = list(payload.get("_stall_skip_providers") or [])
skip_at = payload.get("_stall_skip_at") or {}
if not isinstance(skip_at, dict):
    skip_at = {}
added_providers = []
for p in providers:
    if p not in skip_list:
        skip_list.append(p)
        added_providers.append(p)
    # Refresh timestamp so prune_expired_stall_skips keeps this entry alive.
    skip_at[p] = now_iso
payload["_stall_skip_providers"] = skip_list
payload["_stall_skip_at"] = skip_at
```

Also update the `changed` check to include timestamp refreshes:

```python
changed = bool(added_providers) or bool(skip_at) or bool(row["slot_affinity"]) or bool(row["assigned_slot"])
```

## After the code fix

1. Commit to the Orchestra repo with a clear message
2. Restart the Orchestra supervisor so the fix takes effect
3. Reset the quest engine task: `orchestra reset 80ffb77b-8391-493c-8644-37086c8e2e3c --project SciDEX`
4. Update the quest engine task provider from `any` to exclude GLM: use Python Orchestra services to set `provider = 'minimax'` on task 80ffb77b, so GLM never routes there again

## Verification

After the fix:

- The next time the abandonment watchdog runs for any rate-limited task, check that `_stall_skip_at` is populated in the payload
- Confirm the quest engine task (80ffb77b) is picked up by a non-GLM provider and completes successfully
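The first verification check can be rehearsed offline. The sketch below is a simplified model, not the Orchestra code: `prune_model`, `add_skips`, `providers_missing_timestamp`, the one-hour `STALL_SKIP_TTL`, and the fixed timestamp are all hypothetical stand-ins chosen for illustration. Only the payload keys `_stall_skip_providers` and `_stall_skip_at` and the `treat_legacy_as_expired` flag come from the task description. It shows why a skip entry without a timestamp disappears on the first prune and how the patched update keeps it alive.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical TTL; the real expiry window is not stated in this task.
STALL_SKIP_TTL = timedelta(hours=1)


def prune_model(payload, now, treat_legacy_as_expired=True):
    """Simplified stand-in for prune_expired_stall_skips (not the real code)."""
    skip_at = payload.get("_stall_skip_at")
    if not isinstance(skip_at, dict):
        skip_at = {}
    kept, kept_at = [], {}
    for p in payload.get("_stall_skip_providers") or []:
        stamp = skip_at.get(p)
        if stamp is None:
            # Entry with no timestamp: dropped when treated as expired.
            if not treat_legacy_as_expired:
                kept.append(p)
            continue
        if now - datetime.fromisoformat(stamp) < STALL_SKIP_TTL:
            kept.append(p)
            kept_at[p] = stamp
    payload["_stall_skip_providers"] = kept
    payload["_stall_skip_at"] = kept_at
    return payload


def add_skips(payload, providers, now_iso, write_timestamps):
    """Model of the skip-list update before (False) and after (True) the fix."""
    skip_list = list(payload.get("_stall_skip_providers") or [])
    skip_at = payload.get("_stall_skip_at") or {}
    if not isinstance(skip_at, dict):
        skip_at = {}
    for p in providers:
        if p not in skip_list:
            skip_list.append(p)
        if write_timestamps:
            skip_at[p] = now_iso
    payload["_stall_skip_providers"] = skip_list
    payload["_stall_skip_at"] = skip_at
    return payload


def providers_missing_timestamp(payload):
    """Return skipped providers that have no _stall_skip_at entry."""
    skip_at = payload.get("_stall_skip_at")
    if not isinstance(skip_at, dict):
        skip_at = {}
    skipped = payload.get("_stall_skip_providers") or []
    return [p for p in skipped if p not in skip_at]


# Arbitrary fixed clock so the example is deterministic.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
now_iso = now.isoformat()

before = add_skips({}, ["glm"], now_iso, write_timestamps=False)
after = add_skips({}, ["glm"], now_iso, write_timestamps=True)
```

In this model, `providers_missing_timestamp(before)` reports `glm` and `prune_model` drops it at once, while the `after` payload keeps `glm` in the skip list until the assumed TTL elapses. A non-empty result from `providers_missing_timestamp` on a real task payload would suggest the fix is not active.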

Completion Notes

Fix pushed to branch orchestra/task/fbe73f92-abandonment-watchdog-missing-stall-skip (ec579d39b). The branch is ahead of origin/main, which still needs the fix merged. The post-fix operational steps (supervisor restart, quest engine task reset, provider update to 'minimax') could not be executed because sudo/DB access was unavailable; they remain operator responsibilities. The code fix was verified correct: the skip_at dict is now populated with now_iso for every provider in the loop, and the changed check includes skip_at refreshes.
