Calibrate performance scores for registered skills that currently have no meaningful score. Skill scores support routing, maintenance, and tool-playground exposure.
- q-cc0888c0004a - Agent Ecosystem quest
- bec30a01-e196-4d26-a051-e9e808b95146 ran a benchmark on 26 unscored skills.
- tool_calls telemetry: formula = 0.5 + 0.3 * success_rate + 0.2 * speed_factor.
- 0267ccb80 ([Forge] Benchmark 25 registered skills by performance and accuracy [task:bec30a01-e196-4d26-a051-e9e808b95146]).
- 85d9ad14-3519-4ebd-9e5e-e4189ac7b2e8 was stale by the time this slot started: live PostgreSQL verification through scidex.core.database.get_db() found 282 registered skills and 0 skills with performance_score IS NULL OR performance_score = 0.
- tool_calls has 28,280 rows with 27,889 successes, 390 errors, and 1544.2 ms average nonzero latency.
- skills.code_path; the primary tools.py path has 112 scored skills and exists in-repo, and registered forge/skills/*/SKILL.md paths sampled from the scored registry exist.
- eb7917ecf ([Forge] Score performance for 25 unscored registered skills [task:c82d378b-5192-4823-9193-939dd71935d1]) plus verification commit 70fbe70a2 ([Verify] Skill scoring already resolved — 26→1 unscored [task:c82d378b-5192-4823-9193-939dd71935d1]) addressed the scoring backlog; current live DB now has 0 remaining unscored skills, satisfying this task's <= 0 verification target.
- score = 0.40 * success_rate + 0.35 * test_coverage + 0.25 * speed_factor
- tests/):
  - tool_pubmed_search, tool_paper_figures (dedicated test files)
  - tool_research_topic (success/failure paths in test_agora_orchestrator_tools.py)
  - tool_gtex_tissue_expression, tool_open_targets_associations, tool_semantic_scholar_search (indirect mock references)
- tool_calls telemetry (per-skill errors / total calls)

- tool_pubchem_compound 1.000 → 0.561 (39.7% error rate, no tests)
- tool_msigdb_gene_sets 1.000 → 0.574 (34.0% error rate, no tests)
- tool_expression_atlas_differential 1.000 → 0.587 (15.0% error rate, 3.4 s avg latency, no tests)
- tool_brainspan_expression 1.000 → 0.665 (13.7% error rate, no tests)
- tool_mgi_mouse_models 1.000 → 0.655 (16.3% error rate, no tests)
- tool_pubmed_search 1.000 → 0.996 (0.3% error rate, full test coverage) ✓
score_skills_by_coverage_and_errors.py (checked in)
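
The weighted formula above (0.40 * success_rate + 0.35 * test_coverage + 0.25 * speed_factor) and the per-skill error rate (errors / total calls) can be sketched as follows. This is a minimal illustration and not the contents of the checked-in script. The SkillStats type and the function names are hypothetical, success_rate is assumed to be 1 - error_rate, and the way test_coverage and speed_factor are derived is not specified here, so both are taken as inputs assumed to lie in [0, 1].

```python
from dataclasses import dataclass

# Weights copied from the recalibrated formula in this document.
W_SUCCESS = 0.40
W_COVERAGE = 0.35
W_SPEED = 0.25


@dataclass
class SkillStats:
    """Hypothetical per-skill aggregate; not a type from the real codebase."""
    total_calls: int      # rows in tool_calls for this skill
    errors: int           # rows recorded as errors
    test_coverage: float  # assumed to be in [0, 1]
    speed_factor: float   # assumed to be in [0, 1]


def clamp01(value: float) -> float:
    """Keep a component inside [0, 1] so the score stays bounded."""
    return max(0.0, min(1.0, value))


def error_rate(stats: SkillStats) -> float:
    """Per-skill errors / total calls; 0.0 with no telemetry (an assumption)."""
    if stats.total_calls <= 0:
        return 0.0
    return stats.errors / stats.total_calls


def performance_score(stats: SkillStats) -> float:
    """Weighted score, rounded to three decimals like the values listed above."""
    # Assumption: every call that is not an error counts as a success.
    success_rate = 1.0 - error_rate(stats)
    score = (
        W_SUCCESS * clamp01(success_rate)
        + W_COVERAGE * clamp01(stats.test_coverage)
        + W_SPEED * clamp01(stats.speed_factor)
    )
    return round(score, 3)


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from tool_calls.
    example = SkillStats(total_calls=1000, errors=3,
                         test_coverage=1.0, speed_factor=1.0)
    print(performance_score(example))  # prints 0.999
```

Because the three weights sum to 1.0 and each component is clamped, the result always falls in [0, 1]. A skill with no tests can therefore no longer reach 1.000 on telemetry alone.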