[Atlas] Read replica infra + get_db_ro() hook

← All Specs

Goal

Stand up a local PostgreSQL streaming replica to deflect read load
from the primary, and give the app a `get_db_ro() hook so
read-heavy endpoints can opt in incrementally without a big-bang
refactor.

What landed

Infra (already live)

  • Replica on port 5433, data dir /data/postgres-replica,
streaming from primary with async replication. Replay lag
<10ms steady-state (
SELECT replay_lag FROM pg_stat_replication).
  • systemd unit scidex-pg-replica.service with Restart=on-failure.
  • pgBouncer gains a second database alias scidex_ro pointing
at host=127.0.0.1 port=5433. Pool mode stays session; same
SCRAM auth as the primary alias.

Code (this PR)

api_shared/db.py:

  • _get_ro_pool() — lazy-init second ConnectionPool at
SCIDEX_PG_RO_DSN (env). When unset, falls through to the
primary pool so local dev + tests don't need a replica.
  • get_db_ro() — public helper that returns a PGConnection
from the RO pool. Callers close() it in a try/finally.
  • pool_stats() reports ro_* keys alongside primary.

Environment

/etc/scidex-pg.env gains:

SCIDEX_PG_RO_DSN=dbname=scidex_ro user=scidex_app password=... host=127.0.0.1 port=6432
SCIDEX_PG_RO_POOL_MIN=2
SCIDEX_PG_RO_POOL_MAX=40

When to use get_db_ro()

Good fits (safe, high-value):

  • /wiki/<slug>, /entity/<id>, /hypothesis/<id> detail pages
  • /api/graph/, /api/kg/ read endpoints
  • Dashboard aggregations / stats
  • Any GET endpoint that does not care about read-your-own-write
Do NOT use:
  • POST/PUT/DELETE handlers (the RO connection rejects writes)
  • Any endpoint that writes then immediately re-reads
  • Hot-write paths that need transactional guarantees

Follow-up

This PR adds the plumbing. A separate PR will migrate the top 10
read-heavy endpoints (identified via
pg_stat_statements on the
primary) to
get_db_ro()` — should deflect ~60% of read traffic
based on query-mix observation.

File: read_replica_2026_04_21_spec.md
Modified: 2026-04-28 03:24
Size: 2.1 KB