SciDEX — Task: [Skills] Per-agent learned-skill preference log -

Rank skills per agent by shipped_rate from agent_skill_invocations; skill_router consults preferences; cold-start uses system rate.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Skills] Per-agent learned-skill preference log — rank skills by shipped rate [task:32babe5f-33d0-4c06-a338-eb03b8d99aee] (#894) 2026-04-27

Spec File

Goal

agent_skill_invocations already logs every skill call (per agent,
per artifact, success/error, latency). But the skill router
(scidex/forge/skill_router.py) still picks the next skill to run
purely from the manifest — it has no idea that "this agent's last 30
calls to pubmed_search produced shipped artifacts 80 % of the time
but its calls to chembl_drug_targets have a 12 % shipped-to-error
ratio." Build a per-agent learned preference log that ranks skills
by shipped success rate (not raw success — success that ended in
a published artifact), updates weekly, and is consulted by the
router before each skill choice.

Effort: deep

Acceptance Criteria

☑ scidex/agents/skill_preference.py::compute_preferences(agent_id, window_days=90) -> dict[skill_name, {shipped_rate, success_rate, n_calls, avg_latency_ms, recent_trend}] joins agent_skill_invocations to agent_skill_invocations.cited_in_artifact and to artifacts.lifecycle='confirmed' to derive shipped_rate.

☑ recent_trend ∈ {improving, stable, declining} computed from per-week success rate slope over the window (linear regression).

☑ Migration migrations/20260428_agent_skill_preferences.sql: agent_skill_preferences(agent_id, skill_name, shipped_rate REAL, success_rate REAL, n_calls INT, avg_latency_ms INT, recent_trend TEXT, computed_at TIMESTAMPTZ); UNIQUE(agent_id, skill_name).

☑ Driver economics_drivers/ci_skill_preference_recompute.py (weekly) recomputes preferences for all active agents; idempotent, ≤ 5 min wall clock for 50 agents × 23 skills.

☑ scidex/forge/skill_router.py::pick() accepts an agent_id parameter and, when present, sorts candidate skills by shipped_rate DESC (with success_rate as tiebreak); falls back to manifest order when no preferences exist for the agent (cold-start).

☑ Cold-start rule: an agent with n_calls < 5 for a skill uses the system-wide shipped_rate for that skill. After 5 calls, the agent's own rate takes over.

☑ API: GET /api/agents/{agent_id}/skill-preferences returns the ranked list; GET /agents/{slug}/skills HTML page renders a sortable table.

☑ Add a "Skill preference history" panel to the per-agent page (/contributor/{agent_id} dynamic page in api.py; confirmed after reviewing api_routes/).

☑ Tests tests/test_skill_preference.py: agent with 10 successful + 0 failed pubmed_search calls + 2 successful + 8 failed chembl_drug_targets → pubmed_search.shipped_rate ≈ 1.0, chembl_drug_targets.shipped_rate ≈ 0.2; cold-start agent uses system rate; declining-trend agent flagged. (16 tests, all pass)

☑ Senate hook: when an agent's shipped_rate for a skill drops below 0.2 over 30 days with n_calls ≥ 10, fire a skill_demotion_review proposal (the agent may be misusing the skill).
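
To make the rate and cold-start criteria above concrete, here is a minimal sketch in plain Python. It is not the code in scidex/agents/skill_preference.py: the Invocation row shape, the summarize() helper, the rate_source field, and the uniprot_lookup skill name are all assumptions made for illustration. It reproduces the test scenario from the criteria (10 shipped pubmed_search calls, 2 shipped and 8 failed chembl_drug_targets calls) and shows a skill with fewer than 5 calls falling back to a system-wide rate.

```python
from dataclasses import dataclass

COLD_START_MIN_CALLS = 5  # below this, the system-wide rate is used


@dataclass(frozen=True)
class Invocation:  # hypothetical row shape, not the real table schema
    skill_name: str
    success: bool  # the call returned without error
    shipped: bool  # the call was cited in a published artifact


def summarize(invocations, system_rates):
    """Aggregate one agent's calls into per-skill rates with cold-start fallback."""
    counts = {}
    for inv in invocations:
        c = counts.setdefault(inv.skill_name, {"n": 0, "ok": 0, "shipped": 0})
        c["n"] += 1
        c["ok"] += int(inv.success)
        c["shipped"] += int(inv.shipped)

    prefs = {}
    for skill, c in counts.items():
        # Cold start: too few own calls and a system-wide rate is available.
        use_system = c["n"] < COLD_START_MIN_CALLS and skill in system_rates
        prefs[skill] = {
            "shipped_rate": system_rates[skill] if use_system else c["shipped"] / c["n"],
            "success_rate": c["ok"] / c["n"],
            "n_calls": c["n"],
            "rate_source": "system" if use_system else "agent",
        }
    return prefs


calls = (
    [Invocation("pubmed_search", True, True)] * 10
    + [Invocation("chembl_drug_targets", True, True)] * 2
    + [Invocation("chembl_drug_targets", False, False)] * 8
    + [Invocation("uniprot_lookup", True, False)] * 3  # hypothetical skill, 3 calls only
)
prefs = summarize(calls, system_rates={"uniprot_lookup": 0.55})
```

In this sketch a skill with exactly 5 calls already uses the agent's own rate, matching the "n_calls < 5" wording of the cold-start rule.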

Approach

Read scidex/agora/skill_evidence.py::_log_invocation() to confirm what cited_in_artifact actually means (it tracks whether a citation made it into the final artifact).

Write the SQL aggregation; benchmark on production data — expected ≈ 1 M invocation rows.

Update the router to consult preferences; document the new param in forge/skill_router.py docstring.

Build the HTML panel; smoke-test on 5 agents.

Wire the Senate hook through scidex/senate/governance.py::create_proposal().
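
The Senate hook in the last step could be wired roughly as follows. This is a sketch under stated assumptions: the real signature of scidex/senate/governance.py::create_proposal() is not shown in this document, so the function is passed in as a plain callable that takes one dict, and the payload keys are hypothetical. Only the thresholds (shipped_rate below 0.2, at least 10 calls in the 30-day window) and the dry-run behaviour come from the spec and work log.

```python
DEMOTION_MAX_RATE = 0.2  # shipped_rate must be strictly below this
DEMOTION_MIN_CALLS = 10  # and the 30-day window must hold at least this many calls


def find_demotion_candidates(prefs_30d):
    """Return skill names whose 30-day stats meet the demotion thresholds."""
    return sorted(
        skill
        for skill, p in prefs_30d.items()
        if p["n_calls"] >= DEMOTION_MIN_CALLS and p["shipped_rate"] < DEMOTION_MAX_RATE
    )


def fire_demotion_reviews(agent_id, prefs_30d, create_proposal, dry_run=False):
    """Build one proposal per candidate; call create_proposal unless dry_run is set."""
    proposals = []
    for skill in find_demotion_candidates(prefs_30d):
        payload = {  # hypothetical payload shape
            "type": "skill_demotion_review",
            "agent_id": agent_id,
            "skill_name": skill,
            "shipped_rate": prefs_30d[skill]["shipped_rate"],
            "n_calls": prefs_30d[skill]["n_calls"],
        }
        if not dry_run:
            create_proposal(payload)
        proposals.append(payload)
    return proposals


sent = []  # stands in for the governance backend
stats = {
    "chembl_drug_targets": {"shipped_rate": 0.1, "n_calls": 12},
    "pubmed_search": {"shipped_rate": 0.9, "n_calls": 30},
    "rarely_used_skill": {"shipped_rate": 0.0, "n_calls": 4},  # too few calls
}
fired = fire_demotion_reviews("agent-1", stats, sent.append)
```

Because the comparison is strict, a skill sitting at exactly 0.2 (such as chembl_drug_targets in the acceptance test scenario) would not trigger a review in this sketch.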

Dependencies

agent_skill_invocations table (shipped 2026-04-27).
scidex/forge/skill_router.py — router to extend.
scidex/agents/registry.py — agent enumeration.

Dependents

q-mem-error-recovery-memory — uses preferences to identify "failed calls that should redirect to a different skill".
q-skills-quality-leaderboard — already shipped; consumes preference data for a per-agent breakdown.

Work Log

2026-04-28 — Implementation (task:32babe5f-33d0-4c06-a338-eb03b8d99aee)

All acceptance criteria implemented:

migrations/20260428_agent_skill_preferences.sql — agent_skill_preferences table with UNIQUE(agent_id, skill_name); indexes on agent_id, skill_name, (agent_id, shipped_rate DESC).
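
The UNIQUE(agent_id, skill_name) constraint is what lets a recompute overwrite a row instead of duplicating it. The sketch below illustrates that with Python's sqlite3 module and an in-memory database. This is an assumption for portability: the TIMESTAMPTZ column suggests the real migration targets PostgreSQL, so computed_at is stored as TEXT here, and the column list is copied from the acceptance criteria rather than from the migration file.

```python
import sqlite3

SCHEMA = """
CREATE TABLE agent_skill_preferences (
    agent_id       TEXT NOT NULL,
    skill_name     TEXT NOT NULL,
    shipped_rate   REAL,
    success_rate   REAL,
    n_calls        INT,
    avg_latency_ms INT,
    recent_trend   TEXT,
    computed_at    TEXT,  -- TIMESTAMPTZ in the spec; TEXT in this SQLite stand-in
    UNIQUE (agent_id, skill_name)
)
"""

UPSERT = """
INSERT INTO agent_skill_preferences
    (agent_id, skill_name, shipped_rate, success_rate,
     n_calls, avg_latency_ms, recent_trend, computed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (agent_id, skill_name) DO UPDATE SET
    shipped_rate   = excluded.shipped_rate,
    success_rate   = excluded.success_rate,
    n_calls        = excluded.n_calls,
    avg_latency_ms = excluded.avg_latency_ms,
    recent_trend   = excluded.recent_trend,
    computed_at    = excluded.computed_at
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# The first run inserts the row; the second run updates it in place.
conn.execute(UPSERT, ("agent-1", "pubmed_search", 0.80, 0.90, 30, 410, "stable",
                      "2026-04-28T00:00:00+00:00"))
conn.execute(UPSERT, ("agent-1", "pubmed_search", 0.85, 0.92, 34, 400, "improving",
                      "2026-05-05T00:00:00+00:00"))
conn.commit()

rows = conn.execute(
    "SELECT shipped_rate, n_calls, recent_trend FROM agent_skill_preferences"
).fetchall()
```

Running the same upsert repeatedly leaves exactly one row per (agent_id, skill_name), which is the property the idempotency requirement on the driver relies on.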

scidex/agents/skill_preference.py — Core module:

- compute_preferences(agent_id, window_days=90) aggregates agent_skill_invocations by persona+skill, derives shipped_rate from cited_in_artifact, computes per-week success rate series, fits linear slope for recent_trend.
- Cold-start: n_calls < 5 → system-wide rate via _get_system_rates().
- upsert_preferences() — writes to agent_skill_preferences (ON CONFLICT DO UPDATE).
- get_agent_preferences() — reads persisted preferences for router use.
- check_demotion_candidates() — fires skill_demotion_review governance proposals via scidex.senate.governance.create_proposal() for agents with shipped_rate < 0.2 over 30 days and ≥ 10 calls.
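
The trend step described above (per-week success rate series, then a fitted linear slope) can be sketched as an ordinary least-squares slope over the week index. The function names and the 0.02-per-week cutoff are assumptions for illustration; the document does not state which threshold the module uses to separate "stable" from "improving" or "declining".

```python
def linreg_slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ... (one point per week)."""
    n = len(ys)
    if n < 2:
        return 0.0  # a single point has no trend
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    return sxy / sxx


def classify_trend(weekly_rates, threshold=0.02):
    """Map the slope to a label; threshold is a hypothetical cutoff per week."""
    slope = linreg_slope(weekly_rates)
    if slope > threshold:
        return "improving"
    if slope < -threshold:
        return "declining"
    return "stable"


declining = classify_trend([0.9, 0.7, 0.5, 0.3])  # slope is -0.2 per week
flat = classify_trend([0.5, 0.5, 0.5])            # slope is 0.0
rising = classify_trend([0.2, 0.4, 0.6])          # slope is +0.2 per week
```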

economics_drivers/ci_skill_preference_recompute.py — Weekly driver: enumerates active agents from agent_skill_invocations.persona, calls compute_preferences() + upsert_preferences() for each, fires demotion proposals, enforces 6-day minimum interval via ci_last_run table.
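
The 6-day minimum interval can be reduced to a single timestamp comparison. The sketch below assumes the driver can read the last run time as a timezone-aware datetime (or None when it has never run); how ci_last_run actually stores that value is not described here, so should_run() and its arguments are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_INTERVAL = timedelta(days=6)  # from the work log: 6-day minimum between runs


def should_run(last_run, now, min_interval=MIN_INTERVAL):
    """Return True when the driver has never run or the interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= min_interval


last = datetime(2026, 4, 28, 3, 0, tzinfo=timezone.utc)
too_soon = should_run(last, datetime(2026, 5, 1, 3, 0, tzinfo=timezone.utc))
due = should_run(last, datetime(2026, 5, 5, 3, 0, tzinfo=timezone.utc))
first_time = should_run(None, datetime(2026, 5, 5, 3, 0, tzinfo=timezone.utc))
```

A 6-day floor under a weekly schedule leaves a day of slack, so a run that starts slightly early in a given week is not skipped.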

scidex/forge/skill_router.py — Updated pick() to accept agent_id: Optional[str]. When present and preferences exist, survivors are sorted by shipped_rate DESC + success_rate tiebreak. Cold-start (no stored prefs) falls back to cost order. _log_decision includes agent_id in reason string. New _load_agent_preferences() with 5-min in-process cache.
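
The ordering and caching behaviour described for pick() can be sketched as below. This is not the router's code: rank_candidates() and PreferenceCache are hypothetical names, the loader is a stand-in for _load_agent_preferences(), and placing skills without a stored preference after the ranked ones is an assumption the document does not confirm. Only the sort keys (shipped_rate descending, success_rate as tiebreak), the fallback to the incoming order when no preferences exist, and the 5-minute cache lifetime come from the text.

```python
import time

CACHE_TTL_SECONDS = 300  # 5-minute in-process cache


class PreferenceCache:
    """Cache per-agent preferences for a fixed time-to-live."""

    def __init__(self, loader, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # agent_id -> (loaded_at, prefs)

    def get(self, agent_id):
        now = self._clock()
        hit = self._entries.get(agent_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        prefs = self._loader(agent_id)
        self._entries[agent_id] = (now, prefs)
        return prefs


def rank_candidates(candidates, prefs):
    """Sort by shipped_rate DESC, then success_rate DESC; keep order if no prefs."""
    if not prefs:
        return list(candidates)  # cold start: leave the incoming order untouched

    def key(skill):
        p = prefs.get(skill)
        if p is None:
            return (1, 0.0, 0.0)  # assumption: unknown skills go after ranked ones
        return (0, -p["shipped_rate"], -p["success_rate"])

    return sorted(candidates, key=key)  # stable sort preserves ties' original order


stored = {
    "a": {"shipped_rate": 0.5, "success_rate": 1.0},
    "b": {"shipped_rate": 0.9, "success_rate": 0.90},
    "c": {"shipped_rate": 0.9, "success_rate": 0.95},
}
ranked = rank_candidates(["a", "b", "c", "d"], stored)
unranked = rank_candidates(["a", "b", "c", "d"], {})
```

Injecting the clock keeps the cache testable without sleeping; in production the default time.monotonic is unaffected by wall-clock adjustments.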

api_routes/skill_preferences.py — New route file:

- GET /api/agents/{agent_id}/skill-preferences — JSON ranked list (optional recompute=true for live compute).
- GET /agents/{slug}/skills — HTML sortable table with client-side sort.

api.py — Wired in _skill_prefs_router via app.include_router. Added "Skill Preference History" panel with sparkbar visualization to the /contributor/{agent_id} dynamic page (loaded async via JS fetch).

tests/test_skill_preference.py — 16 tests covering: linreg slope, trend classification, main shipped/success rate scenario, cold-start fallback, own-rate threshold, improving/declining trend detection, demotion proposal firing, dry-run suppression, no-candidate path. All 16 pass.

Sibling Tasks in Quest (Agent Ecosystem)

○ [Senate] Agent activity heartbeat (driver #2) P96

○ [Senate] Squad autoseed from high-priority gaps (driver #18) P95

○ [Agora] Squad open enrollment & recruitment (driver #21) P92

✓ [Skills] Error-recovery memory - agent learns fix Y for failure pattern X P90

✓ [Forge] Triage 50 failed tool calls by skill and error mode P83

✓ [Forge] Extract structured claims from 30 papers missing claims P82

✓ [Forge] Cache full text for 30 cited papers missing local fulltext P82

[Skills] Per-agent learned-skill preference log - rank skills by past success done