[Senate] N+1 query detector + auto-batch helpers for hot paths done

← Code Health
Per-request query-template counter; senate dashboard of top offenders; batch_in/batch_lookup/prefetch helpers; pytest no_n_plus_one marker.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Senate] N+1 query detector + auto-batch helpers (api.py, api_shared) [task:256e38e5-fc4f-4b92-91ec-0b1aa9ba180b] (#758)2026-04-27
Spec File

Effort: deep

Goal

api.py is full of code patterns like for hyp in hypotheses: row = db.execute("SELECT ... WHERE id=?", (hyp.id,))
the canonical N+1 anti-pattern. With 310 hypotheses headed to 10K
and 15K papers headed to 1M, the same code that returns in 80ms
today will return in 8s. Ship a runtime detector that flags routes
that issue >N queries against the same statement template within a
single request, plus a small helper library to convert those
patterns to WHERE id = ANY($1) batch calls.

Acceptance Criteria

Detector api_shared/n_plus_one_detector.py:
hooks the request-scoped DB holder
(_request_db_holder in api_shared/db.py) and counts
executions per (query_template_hash, route_label) per
request. A query template is the SQL with all literals replaced
by $N. When the count exceeds N_PLUS_ONE_THRESHOLD=8 for
a single template, log a structured WARN with route, hash,
template snippet, and total time.
Sampling. 100 % in dev, 1 % in prod (env-flag).
Storage. New table n_plus_one_incidents (route,
query_template_hash, query_template, count, total_ms,
observed_at). Aggregated across requests so the senate page
shows stable hot offenders.
Page GET /senate/n-plus-one lists top offenders by
count × frequency, with a "view sample request" expander
that uses the q-obs-trace-id-propagation lookup endpoint to
replay the full causal chain.
Helper library api_shared/batch_query.py:
- batch_in(db, sql_template, ids, chunk=500) -> list[dict]
— runs WHERE id = ANY($1) chunked.
- batch_lookup(db, table, ids, columns) -> dict[id, row]
common pattern for hydration.
- prefetch(rows, fk_field, table, columns) -> list[rows]
N+1 → 1 join, returns rows with the FK target merged.
Adopt on top 3 offenders found by the detector during
its first 24 hours. Document the before/after query counts in
the Work Log. Each rewrite is its own PR.
Tests tests/test_n_plus_one_detector.py: synthetic
handler issues 50 lookups → detector warns once with
count=50. Helper round-trips batch_in against a 10K-row
fixture in <50 ms.
CI ratchet. A new pytest marker
@pytest.mark.no_n_plus_one fails any test that triggers a
detector warning during execution. Add to 5 existing route
integration tests as a starter set.

Approach

  • Detector first as opt-in; verify zero perf overhead under
  • wrk load.
  • Helpers — keep stateless and explicit (no auto-rewriting of
  • existing code).
  • Pick the top 3 offenders the detector finds and submit
  • batch-rewrite PRs (one per route).
  • The CI ratchet ratchets up coverage as more tests adopt the
  • marker.

    Dependencies

    • q-obs-trace-id-propagation — for replay-sample button.
    • q-perf-hot-path-optimizer — supplies route labels.

    Dependents

    • q-perf-deferred-work-queue — many N+1 sites are heavy
    hydration that should defer instead of batch.

    Work Log

    2026-04-27 — Implementation (task:256e38e5-fc4f-4b92-91ec-0b1aa9ba180b)

    Files created / modified:

    • api_shared/n_plus_one_detector.py — ContextVar-based per-request query counter.
    Normalizes SQL (strips all literals → $N), tracks count + total_ms per template hash per request.
    At flush time (close_request_db_scope), emits one WARN per offending template with
    route, hash, count, total_ms, and template snippet. Upserts to n_plus_one_incidents.
    Sampling: 100% in dev (ENVIRONMENT != 'prod'), configurable via N_PLUS_ONE_SAMPLE_RATE (default 1% in prod).
    • api_shared/batch_query.py — Three stateless helpers:
    - batch_in(db, sql_template, ids, chunk=500) — replaces {ids} token with ANY(%s), chunked.
    - batch_lookup(db, table, ids, columns) — returns dict[id → row_dict] for FK hydration.
    - prefetch(rows, fk_field, table, columns) — one-shot N+1 → batch pattern; merges FK data under {fk_field}_data.
    • api_shared/db.py — Three hook points added: open_request_db_scope() opens detector scope;
    PGConnection.execute() calls record_query(); close_request_db_scope() calls flush_request().
    All via lazy import + bare except so detector is never on the critical failure path.
    • migrations/20260427_n_plus_one_incidents.sql — Creates n_plus_one_incidents table with
    UNIQUE(route, query_template_hash) for upsert aggregation. Applied to local DB.
    • tests/test_n_plus_one_detector.py — 19 unit tests (all passing). Includes: normalize_sql stability,
    detector warns once with count=50, below-threshold no-warn, independent template tracking,
    sampled-out no-op, batch_in/batch_lookup/prefetch mock tests, and one @pytest.mark.integration
    test for batch_in against live 10K-row table.
    • tests/conftest.py — Created _no_n_plus_one_guard autouse fixture; installs log handler during
    @pytest.mark.no_n_plus_one tests, fails on any detector WARNING.
    • pyproject.toml — Registered no_n_plus_one marker.
    • api.py — Added GET /senate/n-plus-one route after /senate/hot-paths. Shows top 50 offenders
    by count DESC, summary stat cards (distinct offenders, total redundant queries, total wasted ms),
    and a "How to Fix" code example using batch_lookup.

    Acceptance criteria status:

    ☑ Detector + N_PLUS_ONE_THRESHOLD=8 + WARN logging
    ☑ 100% dev / 1% prod sampling
    n_plus_one_incidents table (migration applied)
    /senate/n-plus-one page (no trace-replay button — q-obs-trace-id-propagation still open)
    batch_query.py with batch_in, batch_lookup, prefetch
    ☐ Top-3 offender rewrites — deferred; detector needs 24h of prod traffic first
    ☑ Tests (19 unit + 1 integration marker)
    no_n_plus_one pytest marker + conftest enforcement

    Sibling Tasks in Quest (Code Health) ↗