Using tool_calls, identify the 10 skills called most frequently in the last 30 days. For each: (1) run a representative test call with a standard neurodegeneration query; (2) measure response latency in ms; (3) score output quality (completeness and accuracy) on a 0–1 scale against a known reference; (4) compare the results against the most recent prior benchmark in tool_health_log, if one exists; (5) INSERT a new row into tool_health_log with the results. Flag any tool with latency > 5,000 ms (5 s) or quality < 0.7 for optimization. The resulting rows form a baseline performance register for later comparisons.
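The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes a SQLite database, and every column name (skill_id, created_at, latency_ms, quality_score, flagged, checked_at) is hypothetical, since the text names only the two tables. The run_skill and score_output callables, the test query, and the reference data are also placeholders supplied by the caller.

```python
import sqlite3
import time
from datetime import datetime, timedelta, timezone

LATENCY_LIMIT_MS = 5000.0  # "latency > 5s" from the task text, in ms
QUALITY_FLOOR = 0.7        # "quality < 0.7" from the task text

# Hypothetical schema: the source names the tables but not their columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tool_calls (
    id INTEGER PRIMARY KEY, skill_id TEXT NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS tool_health_log (
    id INTEGER PRIMARY KEY, skill_id TEXT NOT NULL, latency_ms REAL NOT NULL,
    quality_score REAL NOT NULL, flagged INTEGER NOT NULL,
    checked_at TEXT NOT NULL);
"""


def top_skills(conn, days=30, limit=10):
    """Return the most frequently called skill ids in the last `days` days.

    Assumes created_at holds ISO-8601 UTC strings, so text comparison
    matches chronological order.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT skill_id FROM tool_calls WHERE created_at >= ? "
        "GROUP BY skill_id ORDER BY COUNT(*) DESC, skill_id LIMIT ?",
        (cutoff, limit),
    ).fetchall()
    return [row[0] for row in rows]


def needs_optimization(latency_ms, quality):
    """Apply the two flagging thresholds from the task text."""
    return latency_ms > LATENCY_LIMIT_MS or quality < QUALITY_FLOOR


def benchmark_skill(conn, skill_id, run_skill, score_output, query):
    """Time one test call, score it, look up the prior row, log a new row."""
    start = time.perf_counter()
    output = run_skill(skill_id, query)            # step 1: test call
    latency_ms = (time.perf_counter() - start) * 1000.0  # step 2
    quality = score_output(skill_id, output)       # step 3: score in [0, 1]

    # Step 4: most recent prior benchmark, or None if this is the first.
    prior = conn.execute(
        "SELECT latency_ms, quality_score FROM tool_health_log "
        "WHERE skill_id = ? ORDER BY id DESC LIMIT 1",
        (skill_id,),
    ).fetchone()

    flagged = needs_optimization(latency_ms, quality)

    # Step 5: record the new result.
    conn.execute(
        "INSERT INTO tool_health_log "
        "(skill_id, latency_ms, quality_score, flagged, checked_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (skill_id, latency_ms, quality, int(flagged),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return {"skill_id": skill_id, "latency_ms": latency_ms,
            "quality": quality, "prior": prior, "flagged": flagged}
```

A caller would loop over top_skills(conn) and pass each id to benchmark_skill along with its own skill runner, scorer, and test query. Wall-clock timing of a single call is noisy, so a real run might take the median of several calls before comparing against the prior row.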