[Senate] Implement hard memory and CPU limits for analysis processes

Status: done | coding: 7 | reasoning: 6

## REOPENED TASK: CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

- Lost to an orphan branch / failed push
- Only a spec-file edit (no code changes)
- Already addressed by other agents in the meantime
- Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (12)

[Senate] Update spec work log: fix merge gate issue, rebase on current main [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: hard memory/CPU limits implemented [task:45cc971eae28] 2026-04-20
[Senate] Implement hard memory and CPU limits for analysis processes [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: fix merge gate issue, rebase on current main [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: hard memory/CPU limits implemented [task:45cc971eae28] 2026-04-20
[Senate] Implement hard memory and CPU limits for analysis processes [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: fix merge gate issue, rebase on current main [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: hard memory/CPU limits implemented [task:45cc971eae28] 2026-04-20
[Senate] Implement hard memory and CPU limits for analysis processes [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: fix merge gate issue, rebase on current main [task:45cc971eae28] 2026-04-20
[Senate] Update spec work log: hard memory/CPU limits implemented [task:45cc971eae28] 2026-04-20
[Senate] Implement hard memory and CPU limits for analysis processes [task:45cc971eae28] 2026-04-20

Spec File

[Senate] Implement hard memory and CPU limits for analysis processes

Quest: Resource Governance | Priority: P5 | Status: open

Goal

Enforce per-analysis memory (2GB default) and CPU (1 core default) limits using cgroups or ulimit. Analyses exceeding limits are killed with OOM or CPU-exceeded error. The VM must remain responsive even if an analysis tries to consume all resources.

Acceptance Criteria

☐ Memory limit enforced: analysis is killed if it exceeds 2GB RSS
☐ CPU limit enforced: analysis throttled to 1 core equivalent
☐ Limits configurable in resource_governance.py config
☐ Kill events logged to resource_usage table with reason
☐ VM load stays below 80% even during analysis execution
☐ API response time unaffected during analysis (< 500ms p95)
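
As a rough illustration of the first two criteria, the sketch below applies limits to a child process with `resource.setrlimit`, the fallback mechanism named in this spec. It is a minimal sketch, not the code in `resource_governance.py`: the names `make_limiter` and `run_limited`, the CPU-time budget, and the reason strings are all hypothetical. Note two gaps relative to the criteria: `RLIMIT_AS` caps virtual address space rather than RSS, and `RLIMIT_CPU` caps total CPU seconds rather than throttling to one core (throttling would need the cgroup v2 path).

```python
import resource
import signal
import subprocess
import sys

MEMORY_LIMIT_BYTES = 2 * 1024**3  # 2GB default from the spec
CPU_LIMIT_SECONDS = 3600          # assumed CPU-time budget; the spec does not give one


def make_limiter(memory_bytes, cpu_seconds):
    """Build a callable that applies hard limits inside the child process."""
    def apply_limits():
        # RLIMIT_AS caps virtual address space, which only bounds RSS from above.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # The soft limit raises SIGXCPU; the hard limit one second later raises SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
    return apply_limits


def run_limited(argv, memory_bytes=MEMORY_LIMIT_BYTES, cpu_seconds=CPU_LIMIT_SECONDS):
    """Run argv under the limits and return (returncode, kill_reason or None)."""
    proc = subprocess.run(
        argv,
        preexec_fn=make_limiter(memory_bytes, cpu_seconds),  # runs in the child before exec
        capture_output=True,
        text=True,
    )
    if proc.returncode == -signal.SIGXCPU:
        reason = "cpu_exceeded"
    elif proc.returncode == -signal.SIGKILL:
        # Ambiguous: CPU hard limit, kernel OOM killer, or an external kill.
        reason = "killed"
    elif "MemoryError" in proc.stderr:
        # A Python child sees a failed allocation as MemoryError, not a signal.
        reason = "memory_exceeded"
    else:
        reason = None
    return proc.returncode, reason
```

A call such as `run_limited([sys.executable, "analysis.py"])` (with a hypothetical script name) would then return the exit status together with a reason that could be passed on to kill-event logging.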

Approach

  • Implement using cgroup v2 (preferred) or resource.setrlimit (fallback)
  • Create resource_governance.py with ResourceGovernor class
  • Integrate with SandboxExecutor from Analysis Sandboxing quest
  • Add monitoring: check VM load every 10s during analysis
  • If VM load > 80%, pause/throttle running analyses
  • Add kill event logging to resource_usage table

    Dependencies

    _Identify during implementation._

    Dependents

    _Identify during implementation._

    Work Log

    2026-04-20 22:30 PT — Slot 0 (minimax:63)

    • Investigated task: checked git log --grep for prior commits (none found)
    • Verified cgroup_isolation.py and executor.py already on main with full cgroup support
    • Found VM load monitoring not implemented; resource_governance.py did not exist on main
    • Rebased on origin/main (was clean)
    • Created scidex/senate/resource_governance.py:
    - ResourceGovernor class with VM load monitoring (os.getloadavg(), threshold 80%)
    - apply_hard_limits() via resource.setrlimit (RLIMIT_AS, RLIMIT_CPU, RLIMIT_NPROC)
    - record_kill_event() to resource_usage table with reason field
    - get_governor() singleton factory
    • Extended cgroup_isolation.py:
    - Added analysis_id parameter to isolated_run()
    - Added _record_resource_kill() and _check_and_record_resource_kill() helpers
    - Integrated with ResourceGovernor to record OOM, CPU-exceeded, timeout, and pids-exceeded kill events
    • Tested: syntax OK, imports OK, VM load monitoring works, hard limits apply
    • Committed and pushed: 6bcbe542c
    Result: Done — ResourceGovernor with VM load monitoring, hard limits via ulimit,
    kill event logging to resource_usage, integrated with cgroup_isolation for kill detection.
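
The kill event logging described in this entry might look roughly like the sketch below. It is illustrative only: the real `record_kill_event()` signature and the `resource_usage` schema are not shown in this log, so the column names, the `event` value, and the reason strings are assumptions, and an in-memory SQLite database stands in for the project's PostgreSQL connection.

```python
import sqlite3
from datetime import datetime, timezone

# Reasons mirror the event types named in the work log; exact strings are assumed.
KILL_REASONS = {"oom", "cpu_exceeded", "timeout", "pids_exceeded"}


def ensure_table(conn):
    """Create a stand-in resource_usage table (hypothetical columns)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS resource_usage (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            analysis_id TEXT NOT NULL,
            event TEXT NOT NULL,
            reason TEXT NOT NULL,
            detail TEXT,
            recorded_at TEXT NOT NULL
        )
        """
    )


def record_kill_event(conn, analysis_id, reason, detail=""):
    """Insert one kill event row, rejecting reasons outside the known set."""
    if reason not in KILL_REASONS:
        raise ValueError(f"unknown kill reason: {reason!r}")
    conn.execute(
        "INSERT INTO resource_usage (analysis_id, event, reason, detail, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (analysis_id, "kill", reason, detail, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ensure_table(conn)
    record_kill_event(conn, "analysis-123", "oom", "exceeded 2GB limit")
    print(conn.execute("SELECT analysis_id, reason FROM resource_usage").fetchall())
```

Validating the reason against a fixed set keeps the table queryable by cause, which is what the "with reason" acceptance criterion appears to require.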

    2026-04-20 23:05 PT — Slot 0 (minimax:63)

    • Merge gate rejected due to unrelated broken changes to ci_snapshot_prices.py (SQLite URIs for PostgreSQL)
    • Root cause: previous agent mixed unrelated work into this branch
    • Fix: rebased the branch onto current origin/main (595bef1f0) with git rebase origin/main
    - This removed the broken ci_snapshot_prices.py changes
    - ci_snapshot_prices.py now correctly uses get_db()/get_db_readonly() for PostgreSQL
    • Branch now contains only task-related changes:
    - scidex/senate/resource_governance.py (new file, 358 lines)
    - scidex/senate/cgroup_isolation.py (+82 lines kill detection)
    - spec work log update
    • Verified: VM load monitoring works, governor initializes correctly
    • Commit SHAs changed as a result of the rebase
    Result: Clean branch — only task-related files changed. Ready for merge.

    Verification

    Test (2026-04-20 23:05 PT):

    python3 -c "
    from scidex.senate.resource_governance import ResourceGovernor, get_governor
    gov = ResourceGovernor()
    print(f'VM Load: {gov.get_vm_load():.3f}')
    print(f'Can start: {gov.can_start_analysis()}')
    print('Governor works correctly')
    "
    # Output: VM Load: 0.750, Can start: True, Governor works correctly
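
For context on what the check above exercises, a load test of this kind could be sketched as follows. This is an assumption about the approach, not the actual `ResourceGovernor` code: the work log only says it uses `os.getloadavg()` with an 80% threshold, so dividing by the CPU count and the standalone function signatures here are hypothetical.

```python
import os

LOAD_THRESHOLD = 0.80  # 80% threshold from the spec


def get_vm_load(loadavg=None, cpu_count=None):
    """Return the 1-minute load average as a fraction of available CPUs."""
    if loadavg is None:
        loadavg = os.getloadavg()[0]  # first element is the 1-minute average
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # cpu_count() may return None
    return loadavg / cpu_count


def can_start_analysis(load, threshold=LOAD_THRESHOLD):
    """Allow a new analysis only while normalized load is below the threshold."""
    return load < threshold


if __name__ == "__main__":
    load = get_vm_load()
    print(f"VM Load: {load:.3f}")
    print(f"Can start: {can_start_analysis(load)}")
```

Normalizing by CPU count makes the 80% threshold meaningful on machines of different sizes; a raw load average of 3.0 is heavy on a 2-core VM but moderate on a 4-core one.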

    Payload JSON
    {
      "requirements": {
        "coding": 7,
        "reasoning": 6
      }
    }
