Effort: thorough
Replaying a debate today re-calls the LLM with temperature>0 and
gets a different response — so "rerun this artifact" can never be
byte-identical when the chain includes any LLM step. The
deterministic-replay sandbox handles code; this spec handles the
LLM. Build a content-addressed prompt/response cache keyed on
sha256(model + system_prompt + messages + tool_defs + temperature + seed) — when the cache hits, the exact prior response is
returned without calling the upstream. When it misses, the call
proceeds and the response is stored.
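A minimal sketch of the key computation. The inputs come from the spec; the normalisation details (sorted keys, fixed separators) are assumptions about what "canonical JSON" means here, and the system prompt is assumed to travel inside messages rather than as a separate argument:

import hashlib
import json

def compute_cache_key(provider, model, messages, tools,
                      temperature, seed) -> str:
    # Content-addressed key: sha256 over a canonical-JSON rendering
    # of the full request.
    payload = {
        "provider": provider,
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": temperature,
        "seed": seed,
    }
    # sort_keys + fixed separators: dict key order never changes the hash
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()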
migrations/<YYYYMMDD>_create_llm_response_cache.sql:

CREATE TABLE llm_response_cache (
cache_key text PRIMARY KEY, -- sha256(...)
provider text NOT NULL,
model text NOT NULL,
request_blob jsonb NOT NULL, -- full request normalised
response_blob jsonb NOT NULL, -- raw provider response
usage jsonb, -- prompt_tokens etc
cached_at timestamptz DEFAULT now(),
last_hit_at timestamptz,
hit_count int DEFAULT 0
);
CREATE INDEX idx_llm_cache_model ON llm_response_cache(model);
CREATE INDEX idx_llm_cache_cached_at ON llm_response_cache(cached_at);

scidex/core/llm_cache.py:
compute_cache_key(provider, model, messages, tools, temperature, seed) -> str — canonical JSON normalisation.
get(key) -> dict | None and put(key, request, response, usage).
wrap_llm_call(fn) decorator: looks up the cache; on hit, increments hit_count and updates last_hit_at; on miss, calls the wrapped function and stores the response (see the sketch after this block).

scidex/core/llm.py (the LiteLLM facade): the complete() / stream() paths gain a cache: Literal['off','read','write','both'] kwarg, defaulting to SCIDEX_LLM_CACHE_MODE (default "off").

forge/runtime.py (from q-sand-deterministic-replay) flips the mode to "both", so replays automatically hit the cache.
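A minimal sketch of the decorator's cache-aside flow, reusing compute_cache_key from the sketch above. get/put here are in-memory stand-ins for the Postgres table (the real ones also set last_hit_at), the SCIDEX_NOCACHE guard is omitted, and the mode plumbing is illustrative rather than the shipped signature:

import functools
import os

_STORE: dict[str, dict] = {}   # in-memory stand-in for llm_response_cache

def get(key: str) -> dict | None:
    row = _STORE.get(key)
    if row is not None:
        row["hit_count"] += 1   # real version also sets last_hit_at = now()
    return row

def put(key: str, request: dict, response: dict,
        usage: dict | None = None) -> None:
    _STORE[key] = {"request_blob": request, "response_blob": response,
                   "usage": usage, "hit_count": 0}

def wrap_llm_call(fn):
    # Cache-aside: consult the cache before calling upstream, store on miss.
    @functools.wraps(fn)
    def wrapper(provider, model, messages, tools=None,
                temperature=0.0, seed=None, cache=None, **kwargs):
        mode = cache or os.environ.get("SCIDEX_LLM_CACHE_MODE", "off")
        if mode == "off":
            return fn(provider, model, messages, tools=tools,
                      temperature=temperature, seed=seed, **kwargs)
        key = compute_cache_key(provider, model, messages, tools,
                                temperature, seed)
        if mode in ("read", "both"):
            hit = get(key)
            if hit is not None:
                return hit["response_blob"]  # prior response, upstream untouched
        response = fn(provider, model, messages, tools=tools,
                      temperature=temperature, seed=seed, **kwargs)
        if mode in ("write", "both"):
            put(key, {"messages": messages, "tools": tools}, response)
        return response
    return wrapper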
scripts/llm_cache_evict.py removes entries older than 90 days unless hit_count > 0 (never-replayed entries are evicted); see the sketch below. GET /senate/llm-cache-stats returns cache statistics (hit rate, size, estimated $ saved via scidex/forge/cost_budget.py).
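A sketch of the eviction pass under those rules. Only the 90-day default and the hit_count condition come from the spec; psycopg2, DATABASE_URL, and the LLM_CACHE_TTL_DAYS override are assumptions:

import os
import psycopg2

# TTL is env-configurable; 90 days is the spec default.
TTL_DAYS = int(os.environ.get("LLM_CACHE_TTL_DAYS", "90"))

def evict() -> int:
    # Delete stale entries that no replay has ever hit; anything with
    # hit_count > 0 is kept regardless of age.
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                DELETE FROM llm_response_cache
                WHERE cached_at < now() - (%s * interval '1 day')
                  AND hit_count = 0
                """,
                (TTL_DAYS,),
            )
            return cur.rowcount

if __name__ == "__main__":
    print(f"evicted {evict()} entries")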
SCIDEX_NOCACHE marker (for sensitive prompts); document it in the AGENTS.md Skills section.

tests/test_llm_cache.py: cache='off' bypasses the cache entirely; a hit increments hit_count; the wrap_llm_call decorator; smoke against scidex/core/llm.py, keeping the forge/runtime.py coverage minimal. Key-stability cases are sketched below.
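A sketch of the key-stability tests, assuming the module path from the spec; model names and message contents are invented fixtures:

from scidex.core.llm_cache import compute_cache_key  # import path per the spec

def test_same_input_same_key():
    msgs = [{"role": "user", "content": "hi"}]
    assert (compute_cache_key("openai", "gpt-4o", msgs, None, 0.7, 42)
            == compute_cache_key("openai", "gpt-4o", msgs, None, 0.7, 42))

def test_temperature_changes_key():
    msgs = [{"role": "user", "content": "hi"}]
    assert (compute_cache_key("openai", "gpt-4o", msgs, None, 0.0, 42)
            != compute_cache_key("openai", "gpt-4o", msgs, None, 0.7, 42))

def test_dict_key_order_is_normalised():
    # Canonical JSON: key order inside a message must not change the hash.
    a = [{"role": "user", "content": "hi"}]
    b = [{"content": "hi", "role": "user"}]
    assert (compute_cache_key("openai", "gpt-4o", a, None, 0.0, 1)
            == compute_cache_key("openai", "gpt-4o", b, None, 0.0, 1))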
/senate/quality-dashboard shell.
scidex/core/llm.py — LiteLLM facade.
q-sand-deterministic-replay — env-var flag flip.
q-repro-rerun-artifact — gives byte-identical replay over the full chain.

Completed all acceptance criteria:
migrations/132_add_llm_response_cache.py: CREATE TABLE llm_response_cache with all spec columns + indexes. Applied successfully.
scidex/core/llm_cache.py: compute_cache_key (SHA256, canonical JSON), get (updates hit_count), put (upsert), wrap_llm_call decorator with SCIDEX_NOCACHE privacy guard, mode helpers, _cache_mode() reading the env var.
scidex/core/llm.py: complete() and complete_with_tools() each gain a cache: Literal["off","read","write","both"] = None kwarg defaulting to the SCIDEX_LLM_CACHE_MODE env var.
forge/runtime.py: SCIDEX_LLM_CACHE_MODE = "both" set in the deterministic run_env.
scripts/llm_cache_evict.py: nightly eviction, respects hit_count > 0, configurable TTL via env.
api_routes/senate.py: GET /api/senate/llm-cache-stats returns total entries, 7-day hit rate, top-10 entries, bytes stored, and estimated $ saved.
AGENTS.md: documented the SCIDEX_NOCACHE marker in the Skills section.
tests/test_llm_cache.py: 16 tests, all passing. Covers: same input → same key; different model/temp/messages → different key; dict key-order normalisation; NOCACHE bypass; cache=off bypass; hit-count increment.

Files created:
migrations/132_add_llm_response_cache.py
scidex/core/llm_cache.py
scripts/llm_cache_evict.py
tests/test_llm_cache.py

Files modified:
scidex/core/llm.py — added cache kwarg to complete() and complete_with_tools()
forge/runtime.py — added SCIDEX_LLM_CACHE_MODE = "both" for deterministic runs
api_routes/senate.py — added /api/senate/llm-cache-stats endpoint
AGENTS.md — documented the SCIDEX_NOCACHE privacy marker