SciDEX — Task: [Forge] Type-checked tool wrappers

Wrap every tool in pydantic.validate_call + ToolResult{ok,data,error}; bucket validation vs upstream errors distinctly.

Completion Notes

Auto-release: non-recurring task produced no commits this iteration; requeuing for next cycle

Git Commits (1)

[Forge] Type-checked tool wrappers — pydantic gate + standard error envelope [task:6ff1aaab-2f9c-47aa-ad10-26e87cec177b] (#781) 2026-04-27

Spec File

Effort: thorough

Goal

Most of the 58+ tools wrapped in scidex/forge/tools.py and scidex/forge/forge_tools.py accept untyped **kwargs, log the call
via @log_tool_call (line 67), and return whatever the upstream API
gave back. Bad args fail late or silently — typically the upstream
returns a 400 with a string body that the tool re-raises wrapped in
"Tool failed". Operators chase symptoms instead of the real cause
(wrong arg name, wrong type, missing API key). This spec wraps every
tool in a pydantic-validated input model + a standard ToolResult{ok, data, error{code,detail,upstream}} envelope, with a
backwards-compat shim so callers that expect raw dicts keep working.

Acceptance Criteria

☐ Standard envelope scidex/forge/tool_envelope.py:

    class ToolError(BaseModel):
        code: Literal['validation', 'upstream', 'rate_limit',
                      'auth', 'unknown']
        detail: str
        upstream: dict | None = None

    class ToolResult(BaseModel):
        ok: bool
        data: Any | None
        error: ToolError | None

☐ Decorator @typed_tool(input_model=Cls) wraps a tool function so
the input model is validated up-front; upstream
httpx.HTTPStatusError is caught and bucketed
(429→rate_limit, 401|403→auth, other 4xx→validation,
5xx→upstream); a successful return is wrapped in
ToolResult(ok=True, data=...).

☐ Backwards compatibility. A legacy=True flag on the decorator
makes the function still return data directly when the caller
doesn't ask for the envelope (gated by header/env). The default
flips to envelope-on after ≥80% of callers have migrated; track
migration via a one-row-per-skill table
tool_envelope_adoption(skill_id, last_legacy_call_at)
so we know when the shim is safe to drop.

☐ Migration of 5 representative tools. Pick the five
highest-volume tools per tool_invocations.times_used:
pubmed search, semantic-scholar search, alphafold structure,
GTEx tissue expression, ChEMBL drug-targets. Each gets a
pydantic input model, the new decorator, and a regression
test that the existing call sites still get a usable result.

☐ Logging upgrade. log_tool_call
(scidex/forge/forge_tools.py:67) gains an error_code
column on tool_invocations (migration). The senate
tool-failure dashboard now charts failures by error_code, so
"validation" failures (caller-side) are distinguishable from
"upstream" outages.

☐ Helpful error messages. Validation errors render as

    "Tool 'pubmed_search' got pmid=str expected list[str]; full
    validation: ..."

with no stack trace dump in the user-visible path; the full
trace is logged.

☐ Tests tests/test_typed_tools.py:

- Pass invalid kwargs → ToolResult(ok=False, error.code='validation')
  and never reaches upstream (requests mock asserts no calls).
- Upstream 429 → ok=False, error.code='rate_limit'.
- Successful call → ok=True, data=..., and the decorator
  emits a tool_invocations row with error_code IS NULL.
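
The envelope, decorator, and bucketing criteria above can be sketched together as follows. This is a minimal illustration assuming pydantic v2, not the code in scidex/forge/tool_envelope.py: bucket_status, FakeHTTPError, the PubmedSearchInput fields, and the short validation message are hypothetical, the wrapper accepts keyword arguments only, and it buckets any exception without an HTTP status as 'unknown' rather than catching only httpx.HTTPStatusError.

```python
from functools import wraps
from typing import Any, Callable, Literal, Optional
from unittest.mock import Mock

from pydantic import BaseModel, ConfigDict, ValidationError


class ToolError(BaseModel):
    code: Literal["validation", "upstream", "rate_limit", "auth", "unknown"]
    detail: str
    upstream: Optional[dict] = None


class ToolResult(BaseModel):
    ok: bool
    data: Optional[Any] = None
    error: Optional[ToolError] = None


def bucket_status(status: Optional[int]) -> str:
    """Bucket an HTTP status; 429 and 401/403 are checked before generic 4xx."""
    if status == 429:
        return "rate_limit"
    if status in (401, 403):
        return "auth"
    if status is not None and 400 <= status < 500:
        return "validation"
    if status is not None and 500 <= status < 600:
        return "upstream"
    return "unknown"


def typed_tool(input_model: type) -> Callable:
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(**kwargs: Any) -> ToolResult:
            try:
                args = input_model(**kwargs)  # validate before any upstream call
            except ValidationError as exc:
                first = exc.errors()[0]
                field = ".".join(str(part) for part in first["loc"])
                detail = f"Tool '{func.__name__}' rejected {field!r}: {first['msg']}"
                return ToolResult(
                    ok=False, error=ToolError(code="validation", detail=detail)
                )
            try:
                data = func(**args.model_dump())
            except Exception as exc:  # assumption: no status means 'unknown'
                # Duck-typed: works for any exception carrying .response.status_code.
                status = getattr(getattr(exc, "response", None), "status_code", None)
                upstream = {"status": status} if status is not None else None
                error = ToolError(
                    code=bucket_status(status), detail=str(exc), upstream=upstream
                )
                return ToolResult(ok=False, error=error)
            return ToolResult(ok=True, data=data)

        return wrapper

    return decorator


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error that exposes .response."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = Mock(status_code=status_code)


class PubmedSearchInput(BaseModel):
    # extra="forbid" turns a misspelled argument name into a validation error.
    model_config = ConfigDict(extra="forbid")
    query: str
    max_results: int = 10


upstream_api = Mock(return_value=[{"pmid": "1"}])  # mocked upstream client


@typed_tool(input_model=PubmedSearchInput)
def pubmed_search(query: str, max_results: int) -> list:
    return upstream_api(query, max_results)
```

Calling pubmed_search(query="tau", max_results="many") returns an envelope with error.code == 'validation' and leaves upstream_api uncalled, which is the property the first test case asks for. Setting upstream_api.side_effect = FakeHTTPError(429) makes the next call return error.code == 'rate_limit'.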

Approach

1. Build the envelope + decorator first, with no migration; assert
   import doesn't break the existing test suite.

2. Migrate one tool (pubmed_search) end-to-end as the proof of
   shape; keep its legacy callers via the legacy=True flag.

3. Add the migration for tool_invocations.error_code.

4. Migrate the other four high-volume tools.

5. Senate dashboard chart pushed in a follow-up; this spec just
   adds the column + the data.
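
The legacy=True shim used to keep existing callers working could behave roughly like the sketch below. It is an illustration under stated assumptions, not the shipped decorator: the SCIDEX_TOOL_ENVELOPE environment variable, the legacy_shim name, the in-memory adoption dict (standing in for the tool_envelope_adoption table), and the choice to re-raise failures for legacy callers are all hypothetical.

```python
import os
from datetime import datetime, timezone
from functools import wraps
from typing import Any, Callable, Dict

ENVELOPE_ENV = "SCIDEX_TOOL_ENVELOPE"  # hypothetical opt-in switch
adoption: Dict[str, datetime] = {}  # stand-in for tool_envelope_adoption


def legacy_shim(skill_id: str, legacy: bool = True) -> Callable:
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            envelope = func(*args, **kwargs)
            # Callers that opted in (or non-legacy tools) get the envelope.
            if not legacy or os.environ.get(ENVELOPE_ENV) == "1":
                return envelope
            # Record the legacy call so adoption can be tracked per skill.
            adoption[skill_id] = datetime.now(timezone.utc)
            if not envelope["ok"]:
                # Assumption: legacy callers expect an exception on failure.
                raise RuntimeError(envelope["error"]["detail"])
            return envelope["data"]

        return wrapper

    return decorator


@legacy_shim("pubmed_search")
def pubmed_search(query: str) -> dict:
    if not query:
        error = {"code": "validation", "detail": "query must be non-empty"}
        return {"ok": False, "data": None, "error": error}
    return {"ok": True, "data": [f"result for {query}"], "error": None}
```

With the variable unset, pubmed_search("tau") returns the bare data and stamps adoption["pubmed_search"]; with it set to "1", the same call returns the full envelope and leaves the timestamp untouched. A skill whose timestamp stops advancing has no remaining legacy callers.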

Dependencies

scidex/forge/forge_tools.py — log_tool_call call-site.
q-devx-skill-scaffolder — new tools auto-use the envelope.

Dependents

q-obs-trace-id-propagation: envelope is the natural place to
carry trace_id through the response.

Work Log

2026-04-27 — Implementation complete

Delivered:

scidex/forge/tool_envelope.py (new): ToolError, ToolResult pydantic models;
@typed_tool(input_model, legacy=True) decorator; current_error_code() ContextVar
bridge so log_tool_call can log the bucketed error code; _bucket_http_error()
mapping 429→rate_limit, 401|403→auth, 4xx→validation, 5xx→upstream (handles both
requests.HTTPError and httpx.HTTPStatusError via duck-typing); tool_envelope_adoption
DB table and tool_calls.error_code column bootstrapped at import time.

scidex/forge/tools.py (modified): Imports typed_tool, current_error_code,
pydantic. Updated log_tool_call to log error_code (from ContextVar) to
tool_calls.error_code. Applied @typed_tool + pydantic input model to 5 tools:
PubmedSearchInput, SemanticScholarSearchInput, GtexTissueExpressionInput,
ChemblDrugTargetsInput, AlphafoldStructureInput.

Schema: tool_calls.error_code TEXT added;
tool_envelope_adoption(skill_id, last_legacy_call_at) table created.

tests/test_typed_tools.py (new): 28 tests: validation failure never hits
upstream; 429→rate_limit; 401|403→auth; 4xx→validation; 5xx→upstream; success
returns ToolResult(ok=True); legacy mode backward compat for all 5 tools. All passed.
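
The ContextVar bridge mentioned above, which lets log_tool_call see the error code bucketed by the inner decorator, might look roughly like this. It is a sketch only: the decorator order, the enveloped helper, the gtex_lookup tool, and the rows list (a stand-in for the logging table) are assumptions, and only current_error_code and log_tool_call are names taken from the work log.

```python
from contextvars import ContextVar
from functools import wraps
from typing import Any, Callable, Dict, List, Optional

_error_code: ContextVar[Optional[str]] = ContextVar("tool_error_code", default=None)
rows: List[Dict[str, Any]] = []  # stand-in for the tool-call log table


def current_error_code() -> Optional[str]:
    """Return the error code set by the innermost tool call, if any."""
    return _error_code.get()


def log_tool_call(func: Callable) -> Callable:
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        token = _error_code.set(None)  # start each call with a clean slate
        try:
            return func(*args, **kwargs)
        finally:
            # Read the code before resetting so the row reflects this call.
            rows.append({"tool": func.__name__, "error_code": current_error_code()})
            _error_code.reset(token)

    return wrapper


def enveloped(func: Callable) -> Callable:
    """Hypothetical inner decorator: buckets a failure and publishes its code."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"ok": True, "data": func(*args, **kwargs), "error": None}
        except TypeError as exc:
            _error_code.set("validation")
            error = {"code": "validation", "detail": str(exc)}
            return {"ok": False, "data": None, "error": error}

    return wrapper


@log_tool_call  # outermost, so it observes what the inner decorator set
@enveloped
def gtex_lookup(gene: Any) -> Dict[str, Any]:
    if not isinstance(gene, str):
        raise TypeError("gene must be a str")
    return {"gene": gene}
```

Because the logger resets the variable in its finally block, a code set by one call does not leak into the next call in the same thread or task.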

Payload JSON

{
  "completion_shas": [
    "1dc92d178"
  ],
  "completion_shas_checked_at": ""
}

Sibling Tasks in Quest (Code Health)

○ [Senate] Recurring code health sweep (P94)
✓ [Senate] Deferred-work queue - move heavy ops out of request path (P92)
✓ [Senate] Hot-path query optimizer for top-20 endpoints (P90)
✓ [Atlas] HTTP ETag layer with artifact-mutation-aware invalidation (P89)
✓ [Senate] N+1 query detector + auto-batch helpers for hot paths (P88)
✓ [Senate] Postgres pool autoscaler driven by concurrent-request load (P88)
✓ [Senate] Break apart api.py god file into focused modules (P87)
✓ [Senate] Audit and archive versioned dead code (P85)
✓ [Senate] Consolidate database connection patterns (P84)
✓ [Senate] scidex doctor - diagnose common dev-env issues (P84)

[Forge] Type-checked tool wrappers - pydantic gate + standard error envelope: done