[Senate] Runaway-agent circuit breaker - rate-limit artifact creation (status: done)

Quest: Resource Governance
Blocked: rebase complexity after two prior review rejections. The branch adds the write_circuit_breaker module and tests, but main has since advanced significantly, creating large conflicts in api.py and api_routes/senate.py. A higher-trust agent is needed to resolve the conflicts.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

[Senate] Runaway-agent circuit breaker — rate-limit artifact creation [task:bcd6601f-5fdb-4447-ad16-6ddaec610e2d] (#870), 2026-04-27
Squash merge: orchestra/task/dff08e77-holistic-task-prioritization-and-self-go (2 commits) (#774), 2026-04-27
Spec File

Goal

The 2026-04-24 recurring-block incident
(memory/project_orchestra_recurring_block_incident.md) and the
mermaid fence-stripping incident
(memory/project_scidex_mermaid_fence_incident.md) share a shape:
a single agent or skill emitted bulk writes (165 task updates; ~5,800
wiki edits) before any human noticed. SciDEX has no per-actor rate
limit on artifact creation. This task adds a circuit breaker keyed on
(actor_id, artifact_type) that trips when an actor exceeds a configured
write rate and pauses further writes pending Senate review. The rate is
enforced with a simple token bucket, and per-pair policy overrides let
legitimate batch jobs be allowlisted while unattended runaway agents are
stopped within seconds.

Effort: deep

Acceptance Criteria

☐ New module scidex/senate/write_circuit_breaker.py:
- check(actor_id, artifact_type, *, conn) -> Decision, where
  Decision = ('allow' | 'throttle' | 'trip', reason: str,
  retry_after_s: int | None).
- Token-bucket state lives in a new table:

CREATE TABLE actor_write_bucket (
    actor_id        TEXT NOT NULL,
    artifact_type   TEXT NOT NULL,
    tokens          DOUBLE PRECISION NOT NULL,
    last_refill     TIMESTAMPTZ NOT NULL,
    tripped_at      TIMESTAMPTZ,
    tripped_reason  TEXT,
    PRIMARY KEY (actor_id, artifact_type)
);

- Default policy: bucket capacity 60, refill 1 token/s
  (= 60/min sustained, 60 burst). Per-pair overrides live in
  scidex/senate/write_breaker_policy.yaml:
  - *::wiki_page → cap 30, refill 0.1/s (= 6/min); wiki
    edits are slow, expensive, and hard to revert.
  - *::artifact_comment → cap 120, refill 2/s.
  - *::artifact_link → cap 100, refill 1/s.
  - senate.*::* → cap 600, refill 10/s (Senate engines
    allowlisted for high-throughput sweeps).

☐ Migration migrations/20260428_write_circuit_breaker.sql adds
the table above plus:

CREATE TABLE actor_write_trip_event (
    id             BIGSERIAL PRIMARY KEY,
    actor_id       TEXT NOT NULL,
    artifact_type  TEXT NOT NULL,
    tripped_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    recent_writes  INT NOT NULL,
    window_seconds INT NOT NULL,
    cleared_at     TIMESTAMPTZ,
    cleared_by     TEXT
);

☐ Wire-up — call check() from at minimum these write sites:
- scidex/atlas/wiki_writer.py upsert path.
- POST /api/comments in api.py.
- scidex/atlas/artifact_commit.commit_artifact.
- scidex/agora/crosslink_emitter.emit_link.
- scidex/atlas/artifact_registry.register_artifact.
Each call site checks for a 'trip' decision and raises
CircuitTripped(reason, retry_after_s); the API translates this into
HTTP 429. CLI batch writers print a clear message such as "actor X
tripped on type Y at TS; clear with orchestra senate clear-trip ...".
Trip handler: when a bucket trips, the module emits a
senate_alerts row of kind actor_write_trip and creates an
Orchestra task in the Senate queue (actor_id, artifact_type,
recent_writes, cleared_at IS NULL) with priority 95 so that a
human reviewer can examine the surge before clearing.
Clear path: POST /api/senate/circuit_breaker/clear with
{actor_id, artifact_type} resets tripped_at to NULL and refills the
bucket, recording cleared_by from the auth context.
☐ CLI: orchestra senate breakers list shows the live bucket
table; orchestra senate breakers clear <actor_id> <artifact_type>
calls the clear API.
☐ Senate dashboard tiles "Tripped breakers (live)" and "Trip events
(24h)" in dashboard_engine.py.
☐ Tests tests/test_write_circuit_breaker.py:
bucket arithmetic; allowlisted actor not tripped at 200/min;
default actor tripped at 200/min; trip → clear → resume; per-type
policy overrides; concurrent check (SELECT … FOR UPDATE).
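
One way the per-pair overrides could be resolved is sketched below. It assumes the YAML parses (for example with PyYAML's yaml.safe_load) into a flat mapping keyed "actor_pattern::artifact_type_pattern" and that the wildcards are shell-style globs matched with Python's fnmatch. The spec does not say which rule wins when several match: a Senate engine writing a wiki page matches both *::wiki_page and the Senate allowlist pattern senate.*::*. This sketch assumes the rule with the more specific actor pattern wins, so the allowlist takes precedence. Policy, OVERRIDES, and policy_for are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Dict, NamedTuple, Tuple


class Policy(NamedTuple):
    capacity: float
    refill_per_s: float


# 60/min sustained with a 60-write burst, per the spec's default policy.
DEFAULT_POLICY = Policy(capacity=60, refill_per_s=1.0)

# Hypothetical parsed form of write_breaker_policy.yaml,
# keyed "actor_pattern::artifact_type_pattern" with glob wildcards.
OVERRIDES: Dict[str, Policy] = {
    "*::wiki_page": Policy(30, 0.1),
    "*::artifact_comment": Policy(120, 2.0),
    "*::artifact_link": Policy(100, 1.0),
    "senate.*::*": Policy(600, 10.0),
}


def _literal_chars(pattern: str) -> int:
    # Count characters that are not glob metacharacters.
    return sum(ch not in "*?[]" for ch in pattern)


def _specificity(pattern: str) -> Tuple[int, int]:
    # Assumed precedence: a more specific actor pattern beats a more specific type.
    actor_pat, type_pat = pattern.split("::", 1)
    return (_literal_chars(actor_pat), _literal_chars(type_pat))


def policy_for(actor_id: str, artifact_type: str,
               overrides: Dict[str, Policy] = OVERRIDES) -> Policy:
    """Return the most specific matching override, else the default policy."""
    best = None
    for pattern, policy in overrides.items():
        actor_pat, type_pat = pattern.split("::", 1)
        if fnmatchcase(actor_id, actor_pat) and fnmatchcase(artifact_type, type_pat):
            rank = _specificity(pattern)
            if best is None or rank > best[0]:
                best = (rank, policy)
    return DEFAULT_POLICY if best is None else best[1]
```

Ranking by specificity rather than by file order keeps the result independent of how the YAML happens to be sorted, which matters once several people edit the policy file.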

Approach

  • Build the bucket math and policy loader first, as a pure, unit-tested
    module backed by an in-memory dict.
  • Add the migration, switch to a row-locking SQL implementation
    (SELECT … FOR UPDATE), and benchmark p99 < 5 ms per check.
  • Wire the five write sites; for each, wrap the existing write in a
    with breaker_check(...) context manager so a refactor never
    silently drops the check.
  • Have the trip handler emit the alert and create the Senate task; verify
    by simulating 1000 wiki writes from a fake actor in under 60 s.
  • Add the dashboard tile and clear flow; ship.
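
The context-manager step could look roughly like the sketch below. CircuitTripped(reason, retry_after_s) and the HTTP 429 mapping come from the acceptance criteria; breaker_check's signature, the injected check callable (the spec's check() also takes conn), to_http_response, and refusing 'throttle' the same way as 'trip' are all assumptions. Retry-After is the standard HTTP header for telling a client when to retry.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, NamedTuple, Optional, Tuple


class Decision(NamedTuple):
    kind: str                       # 'allow' | 'throttle' | 'trip'
    reason: str
    retry_after_s: Optional[int]


class CircuitTripped(Exception):
    """Raised at a write site when the breaker refuses the write."""

    def __init__(self, reason: str, retry_after_s: Optional[int] = None) -> None:
        super().__init__(reason)
        self.reason = reason
        self.retry_after_s = retry_after_s


CheckFn = Callable[[str, str], Decision]   # hypothetical: (actor_id, artifact_type)


@contextmanager
def breaker_check(actor_id: str, artifact_type: str,
                  check: CheckFn) -> Iterator[Decision]:
    # The check runs on entry, so the wrapped write cannot execute without it.
    decision = check(actor_id, artifact_type)
    if decision.kind != "allow":
        # Assumption: 'throttle' is refused the same way as 'trip'.
        raise CircuitTripped(decision.reason, decision.retry_after_s)
    yield decision


def to_http_response(exc: CircuitTripped) -> Tuple[int, Dict[str, str]]:
    # Framework-agnostic mapping to HTTP 429 with an optional Retry-After header.
    headers: Dict[str, str] = {}
    if exc.retry_after_s is not None:
        headers["Retry-After"] = str(exc.retry_after_s)
    return 429, headers
```

A write site would then read `with breaker_check(actor_id, "wiki_page", check): ...` around the existing write. Because the check runs when the block is entered, dropping it means deleting the with statement itself, which is hard to do by accident in a refactor.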
Dependencies

  • q-safety-emergency-pause: shares the senate_alerts table.

Dependents

  • q-safety-suspicious-pattern-detector: feeds N-write surges into
    the breaker as one of its detection signals.

Work Log

2026-04-27: Implementation complete

Prior attempts (2660c5ea2, 87e5a7ff4) had working code but were never merged
due to rebase conflicts in api_routes/senate.py, which had been corrupted to
a single-line placeholder by squash merge 9c19eed9d.

This run restored api_routes/senate.py from the last known-good commit
(0c3043394) and applied all circuit-breaker changes cleanly on current main:

  • scidex/senate/write_circuit_breaker.py: token-bucket core + trip handler
  • scidex/senate/write_breaker_policy.yaml: per-type rate overrides
  • migrations/20260428_write_circuit_breaker.sql: two tables + indexes
  • api_routes/senate.py: restored, plus the /api/senate/circuit_breaker/clear
    and /api/senate/circuit_breaker/trips endpoints
  • api.py: circuit breaker on POST /api/comments and artifact comments
  • scidex/atlas/artifact_commit.py: check before the git lock
  • scidex/atlas/artifact_registry.py: check on register_artifact
  • scidex/core/db_writes.py: check on save_wiki_page (+ _actor_id param)
  • scidex/senate/crosslink_emitter.py: check on emit_links
  • scidex/senate/dashboard_engine.py: circuit_breaker_trips + circuit_breaker_events evaluators
  • tests/test_write_circuit_breaker.py: 21 tests, all passing
