[Skills] Per-agent learned-skill preference log - rank skills by past success
Status: done

Rank skills per agent by shipped_rate from agent_skill_invocations; skill_router consults preferences; cold-start uses system rate.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Skills] Per-agent learned-skill preference log - rank skills by shipped rate [task:32babe5f-33d0-4c06-a338-eb03b8d99aee] (#894), 2026-04-27

Spec File

Goal

agent_skill_invocations already logs every skill call (per agent,
per artifact, success/error, latency). But the skill router
(scidex/forge/skill_router.py) still picks the next skill to run
purely from the manifest. It has no way of knowing that, say, "this
agent's last 30 calls to pubmed_search produced shipped artifacts
80% of the time, but its calls to chembl_drug_targets have a shipped
rate of only 12%." Build a per-agent learned preference log that
ranks skills by shipped success rate (not raw success, but success
that ended in a published artifact), updates weekly, and is consulted
by the router before each skill choice.
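
To make the distinction between raw success and shipped success concrete, here is a minimal sketch. The `Invocation` record and its field names are hypothetical stand-ins for rows of agent_skill_invocations, not the real schema; `shipped` is assumed to mean "the call's output ended up in a published artifact".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical in-memory view of one logged skill call."""
    skill_name: str
    succeeded: bool  # the call returned without error
    shipped: bool    # assumed: output ended up in a published artifact


def rates(invocations):
    """Return {skill_name: (success_rate, shipped_rate, n_calls)}."""
    totals = {}
    for inv in invocations:
        ok, shipped, n = totals.get(inv.skill_name, (0, 0, 0))
        # bool values add as 0/1, so these are simple counters.
        totals[inv.skill_name] = (ok + inv.succeeded, shipped + inv.shipped, n + 1)
    return {
        skill: (ok / n, shipped / n, n)
        for skill, (ok, shipped, n) in totals.items()
    }


# Nine of ten calls succeed, but only three of them end up shipped.
calls = (
    [Invocation("pubmed_search", True, True)] * 3
    + [Invocation("pubmed_search", True, False)] * 6
    + [Invocation("pubmed_search", False, False)]
)
success_rate, shipped_rate, n_calls = rates(calls)["pubmed_search"]
```

In this example the skill looks healthy by raw success (0.9) but much weaker by shipped rate (0.3), which is the gap the preference log is meant to expose.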

Effort: deep

Acceptance Criteria

scidex/agents/skill_preference.py::compute_preferences(agent_id, window_days=90) -> dict[skill_name, {shipped_rate, success_rate, n_calls, avg_latency_ms, recent_trend}] derives shipped_rate from agent_skill_invocations.cited_in_artifact joined with artifacts.lifecycle='confirmed'.
recent_trend ∈ {improving, stable, declining} computed from per-week success rate slope over the window (linear regression).
☑ Migration migrations/20260428_agent_skill_preferences.sql: agent_skill_preferences(agent_id, skill_name, shipped_rate REAL, success_rate REAL, n_calls INT, avg_latency_ms INT, recent_trend TEXT, computed_at TIMESTAMPTZ); UNIQUE(agent_id, skill_name).
☑ Driver economics_drivers/ci_skill_preference_recompute.py (weekly) recomputes preferences for all active agents; idempotent, ≤ 5 min wall clock for 50 agents × 23 skills.
scidex/forge/skill_router.py::pick() accepts an agent_id parameter and, when present, sorts candidate skills by shipped_rate DESC (with success_rate as tiebreak); falls back to manifest order when no preferences exist for the agent (cold-start).
☑ Cold-start rule: an agent with n_calls < 5 for a skill uses the system-wide shipped_rate for that skill. After 5 calls, the agent's own rate takes over.
☑ API: GET /api/agents/{agent_id}/skill-preferences returns the ranked list; GET /agents/{slug}/skills HTML page renders a sortable table.
☑ Add a "Skill preference history" panel to the per-agent page (/contributor/{agent_id} dynamic page in api.py; confirmed after reviewing api_routes/).
☑ Tests tests/test_skill_preference.py: agent with 10 successful + 0 failed pubmed_search calls and 2 successful + 8 failed chembl_drug_targets calls → pubmed_search.shipped_rate ≈ 1.0, chembl_drug_targets.shipped_rate ≈ 0.2; cold-start agent uses system rate; declining-trend agent flagged. (16 tests, all pass)
☑ Senate hook: when an agent's shipped_rate for a skill drops below 0.2 over 30 days with n_calls ≥ 10, fire a skill_demotion_review proposal (the agent may be misusing the skill).
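
The ranking and cold-start criteria above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in skill_router.py: `rank_skills`, `effective_shipped_rate`, and the shape of the preference dicts are hypothetical, and the per-skill cold-start substitution is shown inside the ranking step only to keep the example in one place.

```python
from typing import Optional

COLD_START_MIN_CALLS = 5  # threshold taken from the cold-start rule above


def effective_shipped_rate(pref: Optional[dict], system_rate: float) -> float:
    """Use the agent's own rate only once it has enough calls for the skill."""
    if pref is None or pref["n_calls"] < COLD_START_MIN_CALLS:
        return system_rate
    return pref["shipped_rate"]


def rank_skills(candidates, agent_prefs, system_rates):
    """Order candidate skills for one agent.

    candidates   -- skill names in manifest order
    agent_prefs  -- {skill_name: {"shipped_rate", "success_rate", "n_calls"}}
    system_rates -- {skill_name: system-wide shipped_rate}
    """
    if not agent_prefs:
        # No stored preferences at all: keep manifest order.
        return list(candidates)

    def sort_key(skill):
        pref = agent_prefs.get(skill)
        rate = effective_shipped_rate(pref, system_rates.get(skill, 0.0))
        success = pref["success_rate"] if pref else 0.0
        # Negate so that higher rates sort first; success_rate breaks ties.
        return (-rate, -success)

    # sorted() is stable, so full ties keep their manifest order.
    return sorted(candidates, key=sort_key)


prefs = {
    "a": {"shipped_rate": 0.2, "success_rate": 0.90, "n_calls": 10},
    "b": {"shipped_rate": 0.9, "success_rate": 1.00, "n_calls": 2},  # cold start
    "c": {"shipped_rate": 0.5, "success_rate": 0.95, "n_calls": 10},
}
ranked = rank_skills(["a", "b", "c"], prefs, {"b": 0.5})
```

Here skill "b" has only two calls, so its system-wide rate (0.5) replaces its own 0.9; it then ties with "c" on shipped rate and wins on success_rate.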

Approach

  • Read scidex/agora/skill_evidence.py::_log_invocation() to confirm what cited_in_artifact actually means (it tracks whether a citation made it into the final artifact).
  • Write the SQL aggregation; benchmark on production data — expected ≈ 1 M invocation rows.
  • Update the router to consult preferences; document the new param in forge/skill_router.py docstring.
  • Build the HTML panel; smoke-test on 5 agents.
  • Wire the Senate hook through scidex/senate/governance.py::create_proposal().
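
The SQL aggregation step could look something like the sketch below, written against an in-memory SQLite database so it runs standalone. The schema is an assumption: only persona, cited_in_artifact, success/error, and latency are named in this spec, cited_in_artifact is treated as a 0/1 flag, and the join to artifacts is omitted for brevity. The production database may use different SQL date functions.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE agent_skill_invocations (
    persona           TEXT NOT NULL,
    skill_name        TEXT NOT NULL,
    success           INTEGER NOT NULL,  -- 1 = ok, 0 = error
    cited_in_artifact INTEGER NOT NULL,  -- 1 = made it into the artifact
    latency_ms        INTEGER NOT NULL,
    created_at        TEXT NOT NULL DEFAULT (datetime('now'))
)
"""

QUERY = """
SELECT skill_name,
       COUNT(*)                         AS n_calls,
       AVG(success)                     AS success_rate,
       AVG(cited_in_artifact)           AS shipped_rate,
       CAST(AVG(latency_ms) AS INTEGER) AS avg_latency_ms
FROM agent_skill_invocations
WHERE persona = ?
  AND created_at >= datetime('now', ?)
GROUP BY skill_name
ORDER BY shipped_rate DESC, success_rate DESC
"""


def aggregate(conn, agent_id, window_days=90):
    """Aggregate one agent's invocations inside the window, best skill first."""
    rows = conn.execute(QUERY, (agent_id, f"-{window_days} days")).fetchall()
    return {
        skill: {"n_calls": n, "success_rate": ok, "shipped_rate": shipped,
                "avg_latency_ms": latency}
        for skill, n, ok, shipped, latency in rows
    }


def demo_connection():
    """Build a small fixture mirroring the test scenario in the criteria."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [("agent-a", "pubmed_search", 1, 1, 120)] * 10
    rows += [("agent-a", "chembl_drug_targets", 1, 1, 300)] * 2
    rows += [("agent-a", "chembl_drug_targets", 0, 0, 500)] * 8
    conn.executemany(
        "INSERT INTO agent_skill_invocations "
        "(persona, skill_name, success, cited_in_artifact, latency_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    # One stale failure that falls outside the 90-day window.
    conn.execute(
        "INSERT INTO agent_skill_invocations VALUES "
        "('agent-a', 'pubmed_search', 0, 0, 100, datetime('now', '-200 days'))"
    )
    return conn


prefs = aggregate(demo_connection(), "agent-a")
```

Grouping in SQL keeps the per-agent work to a single query, which matters when benchmarking against the expected invocation volume.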

Dependencies

    • agent_skill_invocations table (shipped 2026-04-27).
    • scidex/forge/skill_router.py — router to extend.
    • scidex/agents/registry.py — agent enumeration.

    Dependents

    • q-mem-error-recovery-memory — uses preferences to identify "failed calls that should redirect to a different skill".
    • q-skills-quality-leaderboard — already shipped; consumes preference data for a per-agent breakdown.

    Work Log

    2026-04-28 — Implementation (task:32babe5f-33d0-4c06-a338-eb03b8d99aee)

    All acceptance criteria implemented:

  • migrations/20260428_agent_skill_preferences.sql: agent_skill_preferences table with UNIQUE(agent_id, skill_name); indexes on agent_id, skill_name, (agent_id, shipped_rate DESC).
  • scidex/agents/skill_preference.py — Core module:
    - compute_preferences(agent_id, window_days=90) aggregates agent_skill_invocations by persona+skill, derives shipped_rate from cited_in_artifact, computes per-week success rate series, fits linear slope for recent_trend.
    - Cold-start: n_calls < 5 → system-wide rate via _get_system_rates().
    - upsert_preferences() — writes to agent_skill_preferences (ON CONFLICT DO UPDATE).
    - get_agent_preferences() — reads persisted preferences for router use.
    - check_demotion_candidates() — fires skill_demotion_review governance proposals via scidex.senate.governance.create_proposal() for agents with shipped_rate < 0.2 over 30 days and ≥ 10 calls.

  • economics_drivers/ci_skill_preference_recompute.py — Weekly driver: enumerates active agents from agent_skill_invocations.persona, calls compute_preferences() + upsert_preferences() for each, fires demotion proposals, enforces 6-day minimum interval via ci_last_run table.
  • scidex/forge/skill_router.py — Updated pick() to accept agent_id: Optional[str]. When present and preferences exist, survivors are sorted by shipped_rate DESC + success_rate tiebreak. Cold-start (no stored prefs) falls back to cost order. _log_decision includes agent_id in reason string. New _load_agent_preferences() with 5-min in-process cache.
  • api_routes/skill_preferences.py — New route file:
    - GET /api/agents/{agent_id}/skill-preferences: JSON ranked list (optional recompute=true for live compute).
    - GET /agents/{slug}/skills: HTML sortable table with client-side sort.

  • api.py — Wired in _skill_prefs_router via app.include_router. Added "Skill Preference History" panel with sparkbar visualization to the /contributor/{agent_id} dynamic page (loaded async via JS fetch).
  • tests/test_skill_preference.py — 16 tests covering: linreg slope, trend classification, main shipped/success rate scenario, cold-start fallback, own-rate threshold, improving/declining trend detection, demotion proposal firing, dry-run suppression, no-candidate path. All 16 pass.
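
The slope fit and trend classification exercised by these tests could be sketched as below. This is an assumption-laden illustration rather than the module's code: `linreg_slope`, `classify_trend`, and the 0.01-per-week threshold are hypothetical, since the spec does not state where the cut-off between stable and improving or declining lies.

```python
def linreg_slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ... (one point per week)."""
    n = len(ys)
    if n < 2:
        return 0.0  # a single week carries no trend information
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify_trend(weekly_success_rates, threshold=0.01):
    """Map the weekly slope to improving / stable / declining.

    threshold is a hypothetical dead band (rate change per week) that keeps
    small fluctuations from being reported as a trend.
    """
    slope = linreg_slope(weekly_success_rates)
    if slope > threshold:
        return "improving"
    if slope < -threshold:
        return "declining"
    return "stable"


rising = classify_trend([0.2, 0.4, 0.6, 0.8])
falling = classify_trend([0.8, 0.6, 0.4, 0.2])
flat = classify_trend([0.5, 0.5, 0.5, 0.5])
```

A dead band around zero is one simple design choice; a significance test on the slope would be an alternative when weekly call counts are small.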