SciDEX — Task: [Skills] Error-recovery memory

Mine failure→retry→success patterns from invocation logs; auto-suggest or auto-apply the known fix on the next matching error signature.

Completion Notes

Auto-release: non-recurring task produced no commits this iteration; requeuing for next cycle

Spec File

Goal

Agents repeatedly hit the same failure modes: pubmed_search returns
empty for an over-specific query; chembl_drug_targets 404s on a gene
symbol that needs UniProt resolution first; the LLM returns invalid
JSON because the prompt didn't say "respond with JSON only"; a skill
times out and the agent retries the same skill 3 more times before
giving up. Each of these has a known fix (broaden the query, resolve
the symbol, append "respond with JSON only", switch skill instead of
retrying), but agents rediscover them every run. Build an
error-recovery memory that pattern-matches recurring failures and
auto-suggests (or auto-applies) the historical fix.

Effort: thorough

Acceptance Criteria

☐ scidex/agents/error_recovery.py::lookup_recovery(error_signature: str, agent_id: str, skill_name: str) -> RecoveryHint | None returns {fix_kind, fix_payload, confidence, n_observations, last_succeeded_at} when a known fix exists.

☐ error_signature = sha256 prefix over (skill_name, error_class, normalized_input_pattern). Normalization strips IDs/tokens but keeps query shape (e.g. pubmed_search:empty_result:single_token_query is one bucket).

☐ Migration migrations/20260428_error_recovery_memory.sql: error_recovery_memory(id, error_signature TEXT, fix_kind TEXT CHECK (fix_kind IN ('broaden_query','retry_with_resolver','reformat_prompt','switch_skill','widen_window','reduce_specificity','escalate_model','give_up')), fix_payload JSONB, n_observations INT, n_successes INT, confidence REAL, last_observed_at, last_succeeded_at), with UNIQUE(error_signature, fix_kind).

☐ Mining: nightly driver economics_drivers/ci_error_recovery_mining.py scans the last 30 days of agent_skill_invocations for failure→retry→success sequences (same agent, same skill, within 60 s, distinct inputs) and extracts the input-delta as the candidate fix.

☐ Hint surface: scidex/agora/skill_evidence._call_skill() (or wherever skills are invoked) calls lookup_recovery() on every failure and either auto-applies the fix (if confidence ≥ 0.85 and fix_kind is auto-safe like broaden_query) or attaches the hint to the agent's next-step prompt.

☐ API: GET /api/skills/error-recovery?signature=<sig> returns the hint; GET /senate/error-recovery-memory HTML page lists top-50 frequent recoveries.

☐ Per-agent memory: when the same agent hits the same signature twice and the fix worked the first time, it is auto-applied silently the second time (no LLM round-trip for the recovery decision).

☐ Tests tests/test_error_recovery.py: a synthetic failure→retry→success sequence in the mined data creates a memory entry with fix_kind='broaden_query'; lookup returns the hint; the auto-apply path executes the fix; a low-confidence hint surfaces as a suggestion, not an auto-apply.

☐ Safety rail: fix_kind='escalate_model' is never auto-applied — always surfaces as a suggestion (cost discipline).
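
The hint-surface, per-agent memory, and safety-rail criteria above can be read as one decision function. The following is a minimal sketch under stated assumptions, not the shipped implementation: decide_recovery_action, AUTO_SAFE_FIX_KINDS, and the agent_succeeded_before argument are hypothetical names; the spec names only broaden_query as auto-safe, so the auto-safe set contains just that value; and the sketch assumes a prior per-agent success bypasses the confidence threshold but never the escalate_model rail or the recovery_auto_apply flag.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

# Assumption: the spec names only broaden_query as auto-safe; extend deliberately.
AUTO_SAFE_FIX_KINDS = frozenset({"broaden_query"})
# From the safety rail: escalate_model always surfaces as a suggestion.
NEVER_AUTO_APPLY = frozenset({"escalate_model"})
AUTO_APPLY_MIN_CONFIDENCE = 0.85


@dataclass(frozen=True)
class RecoveryHint:
    """Fields mirror the return shape listed for lookup_recovery()."""
    fix_kind: str
    fix_payload: dict[str, Any]
    confidence: float
    n_observations: int
    last_succeeded_at: Optional[datetime]


def decide_recovery_action(
    hint: Optional[RecoveryHint],
    *,
    auto_apply_enabled: bool,
    agent_succeeded_before: bool = False,
) -> str:
    """Return 'none', 'suggest', or 'auto_apply' for a failed invocation.

    auto_apply_enabled stands in for the recovery_auto_apply env flag;
    agent_succeeded_before is a hypothetical per-agent memory lookup result.
    """
    if hint is None:
        return "none"
    # The safety rail and the rollout flag win over every other rule.
    if hint.fix_kind in NEVER_AUTO_APPLY or not auto_apply_enabled:
        return "suggest"
    # Per-agent memory: the fix already worked once for this agent.
    if agent_succeeded_before:
        return "auto_apply"
    if (hint.fix_kind in AUTO_SAFE_FIX_KINDS
            and hint.confidence >= AUTO_APPLY_MIN_CONFIDENCE):
        return "auto_apply"
    return "suggest"
```

Keeping the policy a pure function means the low-confidence and escalate_model test cases need no database or skill call.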

Approach

Read agent.py:_log_skill_invocation() to confirm the failure logging shape; extend it to write a normalized input_pattern column.

Build the signature normalizer (~ 60 LoC, pure function with extensive tests).

Build the mining driver: window-scan, identify failure→success sequences, extract the input delta.

Wire the lookup into the skill-invocation path; gate auto-apply on a recovery_auto_apply env flag (default off for the first week).

Build the leaderboard HTML page and the per-signature drill-down.
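
The normalizer and mining steps above might look roughly like the sketch below. It is illustrative only: the row keys (agent_id, skill_name, input, ok, error_class, at) are hypothetical stand-ins for the real agent_skill_invocations columns, the 16-character prefix length is an assumption (the spec says only "sha256 prefix"), and the normalizer buckets by token count alone, which is far cruder than the ~60 LoC version planned.

```python
import hashlib
from datetime import datetime, timedelta


def normalize_input_pattern(query: str) -> str:
    """Discard the query content (IDs, tokens) and keep only its shape."""
    tokens = query.split()
    if not tokens:
        return "empty_query"
    return "single_token_query" if len(tokens) == 1 else "multi_token_query"


def error_signature(skill_name: str, error_class: str,
                    input_pattern: str, prefix_len: int = 16) -> str:
    """sha256 prefix over (skill_name, error_class, normalized_input_pattern)."""
    key = ":".join((skill_name, error_class, input_pattern))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:prefix_len]


def mine_recoveries(invocations, window=timedelta(seconds=60)):
    """Find failure -> success pairs: same agent, same skill, within the
    window, distinct inputs. Returns candidate fixes as input deltas."""
    candidates = []
    last_failure = {}  # (agent_id, skill_name) -> most recent failed row
    for row in sorted(invocations, key=lambda r: r["at"]):
        key = (row["agent_id"], row["skill_name"])
        if not row["ok"]:
            last_failure[key] = row
            continue
        failed = last_failure.pop(key, None)
        if failed is None:
            continue
        too_late = row["at"] - failed["at"] > window
        if too_late or row["input"] == failed["input"]:
            continue
        candidates.append({
            "error_signature": error_signature(
                row["skill_name"],
                failed["error_class"],
                normalize_input_pattern(failed["input"]),
            ),
            # The input delta is the raw material for classifying fix_kind.
            "input_delta": {"before": failed["input"], "after": row["input"]},
        })
    return candidates


if __name__ == "__main__":
    t0 = datetime(2026, 4, 28, 12, 0, 0)
    rows = [
        {"agent_id": "a1", "skill_name": "pubmed_search", "ok": False,
         "input": "BRCA1 missense variant pathogenicity 2021",
         "error_class": "empty_result", "at": t0},
        {"agent_id": "a1", "skill_name": "pubmed_search", "ok": True,
         "input": "BRCA1 variant", "error_class": None,
         "at": t0 + timedelta(seconds=20)},
    ]
    for candidate in mine_recoveries(rows):
        print(candidate)
```

Mapping an input delta to a fix_kind (for example, deciding that a shorter query means broaden_query) is left out here; that classification is where most of the driver's real logic would live.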

Dependencies

agent_skill_invocations (shipped) — failure log source.
q-mem-agent-skill-preference-log — shares the agent-id keyed analytics surface.

Dependents

q-mem-evolving-prompt-suggestions — reformat_prompt recoveries feed prompt evolution.

Work Log

Payload JSON

{
  "completion_shas": [
    "a056062"
  ],
  "completion_shas_checked_at": ""
}

Sibling Tasks in Quest (Agent Ecosystem)

○ [Senate] Agent activity heartbeat (driver #2) P96

○ [Senate] Squad autoseed from high-priority gaps (driver #18) P95

○ [Agora] Squad open enrollment & recruitment (driver #21) P92

✓ [Skills] Per-agent learned-skill preference log - rank skills by past success P89

✓ [Forge] Triage 50 failed tool calls by skill and error mode P83

✓ [Forge] Extract structured claims from 30 papers missing claims P82

✓ [Forge] Cache full text for 30 cited papers missing local fulltext P82

[Skills] Error-recovery memory - agent learns fix Y for failure pattern X done