SciDEX — Task: [Forge] Benchmark latency and output quality for 10 high-frequency scientific API tools

Identify the 10 skills most frequently called in the last 30 days from tool_calls. For each: (1) run a representative test call with a standard neurodegeneration query; (2) measure response latency in ms; (3) score output completeness and accuracy (0–1) against a known reference; (4) compare against the most recent prior benchmark, if one exists in tool_health_log; (5) INSERT a new row into tool_health_log with the results. Flag tools with latency > 5 s (5,000 ms) or quality < 0.7 for optimization. This establishes a baseline performance register.
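The five steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the column names (skill_name, called_at, latency_ms, quality, flagged), the SQLite backend, the create_demo_schema helper, and the keyword-overlap scorer are all hypothetical. Only the table names tool_calls and tool_health_log and the two thresholds come from the task text.

```python
import sqlite3
import time

LATENCY_LIMIT_MS = 5000  # "latency > 5s" from the task text
QUALITY_FLOOR = 0.7      # "quality < 0.7" from the task text


def create_demo_schema(conn):
    """Create stand-in tables; the real SciDEX schema is an assumption here."""
    conn.execute("CREATE TABLE IF NOT EXISTS tool_calls "
                 "(id INTEGER PRIMARY KEY, skill_name TEXT, called_at TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS tool_health_log "
                 "(id INTEGER PRIMARY KEY, skill_name TEXT, latency_ms REAL, "
                 "quality REAL, flagged INTEGER)")


def top_skills(conn, days=30, limit=10):
    """Return the most frequently called skill names within the last `days` days."""
    rows = conn.execute(
        "SELECT skill_name, COUNT(*) AS n FROM tool_calls "
        "WHERE called_at >= datetime('now', ?) "
        "GROUP BY skill_name ORDER BY n DESC, skill_name LIMIT ?",
        (f"-{days} days", limit),
    ).fetchall()
    return [row[0] for row in rows]


def score_output(output, reference_terms):
    """Crude 0-1 proxy: fraction of reference terms found in the output text."""
    if not reference_terms:
        return 0.0
    text = output.lower()
    hits = sum(1 for term in reference_terms if term.lower() in text)
    return hits / len(reference_terms)


def needs_optimization(latency_ms, quality):
    """Apply the two flagging thresholds from the task description."""
    return latency_ms > LATENCY_LIMIT_MS or quality < QUALITY_FLOOR


def benchmark(conn, skill_name, call, query, reference_terms):
    """Time one test call, score it, look up the prior row, and log a new row."""
    start = time.perf_counter()
    output = call(query)
    latency_ms = (time.perf_counter() - start) * 1000.0
    quality = score_output(str(output), reference_terms)
    # Most recent earlier benchmark for this skill, or None if there is none.
    prior = conn.execute(
        "SELECT latency_ms, quality FROM tool_health_log "
        "WHERE skill_name = ? ORDER BY id DESC LIMIT 1",
        (skill_name,),
    ).fetchone()
    flagged = needs_optimization(latency_ms, quality)
    conn.execute(
        "INSERT INTO tool_health_log (skill_name, latency_ms, quality, flagged) "
        "VALUES (?, ?, ?, ?)",
        (skill_name, latency_ms, quality, int(flagged)),
    )
    conn.commit()
    return {"skill": skill_name, "latency_ms": latency_ms,
            "quality": quality, "flagged": flagged, "prior": prior}
```

A real run would replace the keyword-overlap scorer with whatever reference comparison the answer key supports, and would pass each skill's actual entry point as `call`. The single wall-clock measurement is also a simplification; repeating each call and recording a median would give a steadier baseline.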

Sibling Tasks in Quest (Forge) ↗

● [Forge] Biomni analysis parity — port 15 use cases as hypothesis-anchored pipelines (P95)

○ [Forge] Integrate tools with debate engine (P95)

○ [Forge] Reproducible analysis capsules and artifact supply chain (P93)

○ [Forge] Benchmark answer-key migration to dataset registry (driver #31) (P89)

○ [Forge] Link 20 hypotheses to supporting PubMed papers (P85)

○ [Forge] Model registry integration: link models to artifact versioning system (P84)

○ [Forge] Diagnose top 5 failing scientific tool integrations and document fixes (P84)

○ [Forge] Add PubMed abstracts to 30 papers missing them (P83)

○ [Forge] Artifact enrichment quest — evaluation context, cross-links, provenance (P82)

○ [Forge] Reduce PubMed metadata backlog for papers missing abstracts (P82)

[Forge] Benchmark latency and output quality for 10 high-frequency scientific API tools open
