[Forge] Type-checked tool wrappers - pydantic gate + standard error envelope done

Wrap every tool in pydantic.validate_call + ToolResult{ok,data,error}; bucket validation vs upstream errors distinctly.

Completion Notes

Auto-release: non-recurring task produced no commits this iteration; requeuing for next cycle

Git Commits (1)

[Forge] Type-checked tool wrappers — pydantic gate + standard error envelope [task:6ff1aaab-2f9c-47aa-ad10-26e87cec177b] (#781), 2026-04-27
Spec File

Effort: thorough

Goal

Most of the 58+ tools wrapped in scidex/forge/tools.py and scidex/forge/forge_tools.py accept untyped **kwargs, log the call
via @log_tool_call (line 67), and return whatever the upstream API
gave back. Bad args fail late or silently — typically the upstream
returns a 400 with a string body that the tool re-raises wrapped in
"Tool failed". Operators chase symptoms instead of the real cause
(wrong arg name, wrong type, missing API key). This spec wraps every
tool in a pydantic-validated input model + a standard ToolResult{ok, data, error{code,detail,upstream}} envelope, with a
backwards-compat shim so callers that expect raw dicts keep working.

Acceptance Criteria

Standard envelope in scidex/forge/tool_envelope.py:

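from typing import Any, Literal

from pydantic import BaseModel

class ToolError(BaseModel):
    # Failure bucket; 'unknown' covers anything that fits no other code.
    code: Literal['validation', 'upstream', 'rate_limit', 'auth', 'unknown']
    detail: str
    upstream: dict | None = None

class ToolResult(BaseModel):
    ok: bool
    # Defaults let callers write ToolResult(ok=True, data=...) without error=.
    data: Any | None = None
    error: ToolError | None = None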

Decorator @typed_tool(input_model=Cls) wraps a tool function so
that the input model is validated up front; an upstream
httpx.HTTPStatusError is caught and bucketed (429→rate_limit,
401|403→auth, any other 4xx→validation, 5xx→upstream); a
successful return is wrapped in ToolResult(ok=True, data=...).
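A minimal sketch of how such a decorator could be put together, assuming pydantic is installed. The envelope models follow the spec; bucket_status, FakeHTTPError, the field names on PubmedSearchInput, and the catch-all 'unknown' branch are illustrative assumptions rather than the shipped code. The status code is read by duck-typing exc.response.status_code, so no HTTP client is needed to run it.

```python
import functools
from typing import Any, Callable, Literal, Optional

from pydantic import BaseModel, ValidationError


class ToolError(BaseModel):
    code: Literal["validation", "upstream", "rate_limit", "auth", "unknown"]
    detail: str
    upstream: Optional[dict] = None


class ToolResult(BaseModel):
    ok: bool
    data: Optional[Any] = None
    error: Optional[ToolError] = None


def bucket_status(status: Optional[int]) -> str:
    """Map an HTTP status to an error bucket (hypothetical helper name)."""
    if status == 429:
        return "rate_limit"
    if status in (401, 403):
        return "auth"
    if status is not None and 400 <= status < 500:
        return "validation"
    if status is not None and 500 <= status < 600:
        return "upstream"
    return "unknown"  # assumption: no usable status means 'unknown'


def typed_tool(input_model: type) -> Callable:
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**kwargs: Any) -> ToolResult:
            try:
                params = input_model(**kwargs)  # validate before any I/O
            except ValidationError as exc:
                err = ToolError(code="validation", detail=str(exc))
                return ToolResult(ok=False, error=err)
            try:
                data = fn(**dict(params))
            except Exception as exc:  # assumption: catch-all, bucketed below
                resp = getattr(exc, "response", None)
                status = getattr(resp, "status_code", None)
                err = ToolError(
                    code=bucket_status(status),
                    detail=str(exc),
                    upstream=None if status is None else {"status_code": status},
                )
                return ToolResult(ok=False, error=err)
            return ToolResult(ok=True, data=data)
        return wrapper
    return decorator


class PubmedSearchInput(BaseModel):  # field names are illustrative
    query: str
    max_results: int = 10


class FakeHTTPError(Exception):
    """Stand-in for httpx.HTTPStatusError or requests.HTTPError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.response = type("Resp", (), {"status_code": status_code})()


@typed_tool(input_model=PubmedSearchInput)
def pubmed_search(query: str, max_results: int) -> dict:
    if query == "boom":  # simulate an upstream rate limit
        raise FakeHTTPError(429)
    return {"query": query, "count": max_results}
```

Calling pubmed_search(query="tp53", max_results="many") in this sketch returns a validation envelope without ever entering the function body, which is the property the spec asks the tests to assert.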
Backwards compatibility. A legacy=True flag on the decorator
makes the function still return data directly when the caller
doesn't ask for the envelope (gated by header/env). The default
flips to envelope-on once ≥80% of callers have migrated; track
migration via a table with one row per skill,
tool_envelope_adoption(skill_id, last_legacy_call_at),
so we know when the shim is safe to drop.
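The shim and its adoption tracking could look roughly like the sketch below. It assumes SQLite as the store and an environment variable as the gate; the variable name SCIDEX_TOOL_ENVELOPE, the helper names, and the choice to raise on failure in legacy mode are all hypothetical and only illustrate the idea.

```python
import os
import sqlite3
from datetime import datetime, timezone
from typing import Any, Mapping

ENVELOPE_ENV = "SCIDEX_TOOL_ENVELOPE"  # hypothetical env var name


def create_adoption_table(conn: sqlite3.Connection) -> None:
    # One row per skill; the timestamp is overwritten on every legacy call.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_envelope_adoption ("
        "skill_id TEXT PRIMARY KEY, last_legacy_call_at TEXT NOT NULL)"
    )


def wants_envelope(environ: Mapping[str, str] = os.environ) -> bool:
    return environ.get(ENVELOPE_ENV, "").lower() in {"1", "true", "yes"}


def record_legacy_call(conn: sqlite3.Connection, skill_id: str) -> None:
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO tool_envelope_adoption "
        "(skill_id, last_legacy_call_at) VALUES (?, ?)",
        (skill_id, now),
    )


def deliver(envelope: dict, conn: sqlite3.Connection, skill_id: str,
            environ: Mapping[str, str] = os.environ) -> Any:
    """Return the envelope if requested, else the raw data (legacy path)."""
    if wants_envelope(environ):
        return envelope
    record_legacy_call(conn, skill_id)
    if not envelope["ok"]:
        # Assumption: legacy callers keep seeing an exception on failure.
        raise RuntimeError(envelope["error"]["detail"])
    return envelope["data"]
```

A skill whose last_legacy_call_at stops advancing has no remaining legacy callers, which is the signal for dropping the shim.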
Migration of 5 representative tools. Pick the five
highest-volume tools per tool_invocations.times_used:
pubmed search, semantic-scholar search, alphafold structure,
GTEx tissue expression, ChEMBL drug-targets. Each gets a
pydantic input model, the new decorator, and a regression
test that the existing call sites still get a usable result.
Logging upgrade. log_tool_call
(scidex/forge/forge_tools.py:67) writes a new error_code
column on tool_invocations (added by a migration). The senate
tool-failure dashboard now charts failures by error_code, so
"validation" failures (caller-side) are distinguishable from
"upstream" outages.
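As a rough illustration of the column and the per-bucket chart query, the sketch below uses SQLite from the standard library. The database engine, the helper names, and every tool_invocations column other than error_code are assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple


def add_error_code_column(conn: sqlite3.Connection) -> None:
    """Idempotent migration: add error_code only if it is missing."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(tool_invocations)")}
    if "error_code" not in columns:
        conn.execute("ALTER TABLE tool_invocations ADD COLUMN error_code TEXT")


def failures_by_error_code(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count failed invocations per bucket; successes have error_code NULL."""
    return conn.execute(
        "SELECT error_code, COUNT(*) FROM tool_invocations "
        "WHERE error_code IS NOT NULL "
        "GROUP BY error_code ORDER BY error_code"
    ).fetchall()


def demo() -> List[Tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema standing in for the real table.
    conn.execute("CREATE TABLE tool_invocations (id INTEGER PRIMARY KEY, tool TEXT)")
    add_error_code_column(conn)
    add_error_code_column(conn)  # second run is a no-op
    rows = [
        ("pubmed_search", None),
        ("pubmed_search", "validation"),
        ("gtex_tissue_expression", "validation"),
        ("alphafold_structure", "upstream"),
    ]
    conn.executemany(
        "INSERT INTO tool_invocations (tool, error_code) VALUES (?, ?)", rows
    )
    return failures_by_error_code(conn)
```

Keeping successful rows at NULL rather than a sentinel string means the same query separates caller-side from upstream failures without special-casing.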
Helpful error messages. Validation errors render as
"Tool 'pubmed_search' got pmid=str expected list[str]; full validation: ..."
with no stack trace dump in the user-visible path; the full
trace is logged.
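One way to produce a message of that shape is sketched below, assuming pydantic is installed. The helper names render_validation_error, type_label and explain, and the pmid field on PubmedSearchInput, are hypothetical; the "got" type is taken from the caller's kwargs and the expected type from the model's annotations.

```python
from typing import Any, Mapping, Optional, get_origin

from pydantic import BaseModel, ValidationError


def type_label(annotation: Any) -> str:
    """Render int as 'int' and list[str] as 'list[str]'."""
    if get_origin(annotation) is None and hasattr(annotation, "__name__"):
        return annotation.__name__
    return str(annotation).replace("typing.", "")


def render_validation_error(tool_name: str, model: type,
                            kwargs: Mapping[str, Any],
                            exc: ValidationError) -> str:
    summaries = []
    details = []
    for err in exc.errors():
        field = str(err["loc"][0]) if err["loc"] else "<root>"
        got = type(kwargs[field]).__name__ if field in kwargs else "missing"
        expected = type_label(model.__annotations__.get(field, Any))
        summaries.append(f"{field}={got} expected {expected}")
        details.append(f"{'.'.join(map(str, err['loc']))}: {err['msg']}")
    return (
        f"Tool '{tool_name}' got {', '.join(summaries)}; "
        f"full validation: {'; '.join(details)}"
    )


class PubmedSearchInput(BaseModel):  # hypothetical field for illustration
    pmid: list[str]


def explain(tool_name: str, model: type, kwargs: Mapping[str, Any]) -> Optional[str]:
    """Return the user-facing message, or None when the input is valid."""
    try:
        model(**kwargs)
    except ValidationError as exc:
        return render_validation_error(tool_name, model, kwargs, exc)
    return None


message = explain("pubmed_search", PubmedSearchInput, {"pmid": "12345"})
```

Only this one-line string would go to the user; the exception itself can still be passed to the logger so the full trace is kept.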
Tests in tests/test_typed_tools.py:
- Pass invalid kwargs → ToolResult(ok=False, error.code='validation'),
  and the call never reaches upstream (requests mock asserts no calls).
- Upstream 429 → ok=False, error.code='rate_limit'.
- Successful call → ok=True, data=..., and the decorator
  emits a tool_invocations row with error_code IS NULL.

Approach

  • Build the envelope + decorator first, with no migration; assert
    that the import doesn't break the existing test suite.
  • Migrate one tool (pubmed_search) end-to-end as the proof of
    shape; keep its legacy callers via the legacy=True flag.
  • Add the migration for tool_invocations.error_code.
  • Migrate the other four high-volume tools.
  • Senate dashboard chart pushed in a follow-up; this spec just
    adds the column + the data.

Dependencies

    • scidex/forge/forge_tools.py — log_tool_call call-site.
    • q-devx-skill-scaffolder — new tools auto-use the envelope.

Dependents

    • q-obs-trace-id-propagation — envelope is the natural place to
    carry trace_id through the response.

Work Log

    2026-04-27 — Implementation complete

    Delivered:

  • scidex/forge/tool_envelope.py (new) — ToolError, ToolResult pydantic models;
    @typed_tool(input_model, legacy=True) decorator; current_error_code() ContextVar
    bridge so log_tool_call can log the bucketed error code; _bucket_http_error()
    mapping 429→rate_limit, 401|403→auth, 4xx→validation, 5xx→upstream (handles both
    requests.HTTPError and httpx.HTTPStatusError via duck-typing); tool_envelope_adoption
    DB table and tool_calls.error_code column bootstrapped at import time.

  • scidex/forge/tools.py (modified) — Imports typed_tool, current_error_code,
    pydantic. Updated log_tool_call to log error_code (from ContextVar) to
    tool_calls.error_code. Applied @typed_tool + pydantic input model to 5 tools:
    PubmedSearchInput, SemanticScholarSearchInput, GtexTissueExpressionInput,
    ChemblDrugTargetsInput, AlphafoldStructureInput.

  • Schema: tool_calls.error_code TEXT added; tool_envelope_adoption(skill_id,
    last_legacy_call_at) table created.

  • tests/test_typed_tools.py (new) — 28 tests: validation failure never hits
    upstream; 429→rate_limit; 401|403→auth; 4xx→validation; 5xx→upstream; success
    returns ToolResult(ok=True); legacy mode backward compat for all 5 tools. All passed.
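The ContextVar bridge listed under the first delivered item can be pictured with the sketch below. It uses only the standard library; set_error_code, INVOCATION_LOG and flaky_tool are hypothetical stand-ins, and this log_tool_call is a simplified model of the real decorator, which writes to the database instead of a list.

```python
import functools
from contextvars import ContextVar
from typing import Any, Callable, Dict, List, Optional

_error_code: ContextVar[Optional[str]] = ContextVar("tool_error_code", default=None)
INVOCATION_LOG: List[Dict[str, Any]] = []  # stand-in for the DB table


def current_error_code() -> Optional[str]:
    """Read the bucket recorded by the inner layer for this call, if any."""
    return _error_code.get()


def set_error_code(code: Optional[str]) -> None:  # hypothetical setter
    _error_code.set(code)


def log_tool_call(fn: Callable) -> Callable:
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        token = _error_code.set(None)  # start every call with a clean slate
        try:
            return fn(*args, **kwargs)
        finally:
            INVOCATION_LOG.append(
                {"tool": fn.__name__, "error_code": current_error_code()}
            )
            _error_code.reset(token)  # do not leak the code to later calls
    return wrapper


@log_tool_call
def flaky_tool(fail: bool) -> dict:
    # Plays the role of the typed layer: bucket the failure, then return
    # an envelope instead of raising.
    if fail:
        set_error_code("rate_limit")
        return {"ok": False, "data": None, "error": {"code": "rate_limit"}}
    return {"ok": True, "data": [1], "error": None}
```

Because the envelope swallows the exception, the logging layer cannot learn the bucket from a raised error; a ContextVar lets the inner layer hand it over without changing the tool's return type, and it stays isolated per thread or async task.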

Payload JSON
    {
      "completion_shas": [
        "1dc92d178"
      ],
      "completion_shas_checked_at": ""
    }
