Several hot routes in api.py do work that doesn't need to happen
inside the request: re-running mermaid linting on a wiki edit,
invalidating mat views on artifact commit, recomputing
q-impact-citation-tracker attribution graphs after a comment,
sending Slack/Notion webhooks (q-integ-notion-slack-webhooks).
These add 100–2000 ms to request latency and burn pool connections
during the wait. Ship a lightweight Postgres-backed deferred-work
queue (no Redis dep) that handlers can enqueue(task, payload)
on and return immediately.
migrations/<date>_deferred_work_queue.sql:

CREATE TABLE deferred_jobs (
id BIGSERIAL PRIMARY KEY,
task TEXT NOT NULL,
payload JSONB NOT NULL,
priority INT NOT NULL DEFAULT 5,
run_at TIMESTAMP NOT NULL DEFAULT NOW(),
attempts INT NOT NULL DEFAULT 0,
max_attempts INT NOT NULL DEFAULT 5,
locked_at TIMESTAMP,
locked_by TEXT,
completed_at TIMESTAMP,
last_error TEXT,
trace_id TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX ix_deferred_jobs_pickup ON deferred_jobs
(run_at) WHERE completed_at IS NULL AND locked_at IS NULL;

scidex/core/deferred.py:
- enqueue(task: str, payload: dict, priority: int = 5, run_at: datetime | None = None) -> int — returns job id
- register(name) decorator — registers a Python callable for task=name
- claim(worker_id, batch=10) -> list[Job] — uses SELECT ... FOR UPDATE SKIP LOCKED to atomically claim …

scripts/deferred_work_worker.py:
- … db_writes.py)
- … api_routes/senate.py save handler)
- … q-dsc-comments-on-hypothesis-pages)
- GET /senate/deferred-work shows queue …

tests/test_deferred_work.py:
- enqueue, SKIP LOCKED correctness (no double-claim), … SKIP LOCKED handles contention)
- … ?inline=1 flag for quick rollback
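The SKIP LOCKED claim named above can be sketched as query text. The columns match the deferred_jobs schema above, but the UPDATE-wrapper shape and the psycopg-style %(name)s placeholders are assumptions, not the project's actual implementation:

```python
# Claim-query sketch: atomically lock up to `batch` runnable jobs.
# FOR UPDATE SKIP LOCKED makes concurrent workers skip rows another
# transaction already locked, so no job is ever claimed twice.
CLAIM_SQL = """
UPDATE deferred_jobs
   SET locked_at = NOW(), locked_by = %(worker_id)s
 WHERE id IN (
        SELECT id
          FROM deferred_jobs
         WHERE completed_at IS NULL
           AND locked_at IS NULL
           AND run_at <= NOW()
           AND attempts < max_attempts
         ORDER BY priority, run_at
         LIMIT %(batch)s
           FOR UPDATE SKIP LOCKED
       )
RETURNING id, task, payload, attempts;
"""

# A worker would execute it roughly like:
#   cur.execute(CLAIM_SQL, {"worker_id": "worker-1", "batch": 10})
#   jobs = cur.fetchall()
print("FOR UPDATE SKIP LOCKED" in CLAIM_SQL)  # → True
```

Note the inner WHERE clause (completed_at IS NULL AND locked_at IS NULL) matches the ix_deferred_jobs_pickup partial index, so pickup stays an index-only scan as the table grows.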
Related tasks:
- q-obs-trace-id-propagation — for trace_id on every job
- q-perf-selective-mat-views — main consumer for refresh …
- q-integ-notion-slack-webhooks — runs as deferred jobs
- q-integ-bluesky-publish-pipeline — same

Approach taken:
The GitHub-sync task (5cff56ac) had already created a deferred_work table and a
basic scidex/core/deferred.py. This task extends and formalises that into the
spec-compliant design.
Files changed:
- migrations/20260427_deferred_work_queue.sql — Creates deferred_jobs table; … deferred_work table.
- scidex/core/deferred.py — Rewrote to use deferred_jobs (was deferred_work). enqueue no longer requires handler registration in the calling …; claim returns typed Job …; requeue uses exponential backoff [30s, 5m, 30m, 4h, 24h]; SCIDEX_DEFERRED_INLINE=1 env var runs jobs inline for tests/rollback.
- scripts/deferred_work_worker.py — New daemon: configurable concurrency, --once drain mode, registers three handlers (mermaid_lint, artifact_matview_refresh, citation_attribution_recompute), …
- scidex/core/db_writes.py (_mermaid_check_content_md) — Deferred by default; runs inline when SCIDEX_MERMAID_INLINE=1 or enqueue fails. Keeps …
- scidex/atlas/artifact_commit.py (commit_artifact) — After a successful git commit, enqueues artifact_matview_refresh with priority 6.
- api.py (/api/comments POST handler) — After db.commit(), enqueues citation_attribution_recompute with priority 7 (non-blocking; exception swallowed).
- api_routes/senate.py — Added three routes:
  - GET /senate/deferred-work — HTML monitor page with queue stats, oldest-pending …
  - GET /api/senate/deferred-work — JSON stats API.
  - POST /api/senate/deferred-work/retry/{task_name} — Reset failed jobs for retry.
- tests/test_deferred_work.py — 14 tests: unit (backoff, inline, register) + …

Latency evidence (qualitative):
The three migrated callsites previously added 50–2000 ms to their request paths
(MermaidGate Node.js startup, mat-view refresh, citation extraction). They now
return immediately; the work happens in the background worker. The actual p95
measurement will be available after the worker daemon runs in production for 24h.