Migrate Orchestra task-fleet merges to PR-based

Why

Today every task worker that completes its work calls orchestra sync push, which:

  • Pushes the branch to origin/<branch>.
  • Hard-resets local main to origin/main.
  • git merge --squash <branch> into local main.
  • Runs the post-merge regression guard.
  • git push --force-with-lease=main:<sha> origin main.

This works but bypasses GitHub's PR infrastructure entirely:

    • No PR review surface for security-sensitive or controversial changes.
    • No GitHub status checks (CI, smoke check, post_merge_guard regex).
    • No record of which task each merge corresponds to outside of the squash commit message.
    • No GitHub-side mergeability check ("This branch is N commits behind main"). Stale-base merges are only caught by the local post-merge guard, which by construction runs after the squash has already happened.
    • The "linear history" rule is enforced via --squash, so the individual commits on the branch are lost. For multi-step task work that's a real loss when investigating a regression; a PR keeps the branch's commits viewable even after a squash merge.
    • GitHub branch protection can't require PRs while workers push directly to main: every project that uses Orchestra would have to bypass the rule, defeating its purpose for human contributors too.

    The 2026-04-24 stale-base regressions (the orch.execute %s fix being silently reverted twice) made the cost of this concrete. Migrating to PRs would have caught at least one of them at the GitHub-side mergeability check.

    What we change

    orchestra/sync.py:_push_main_local() (and the push_main_via_gh_pr variant we'll add) moves from direct push to:

  • Push branch to origin/<branch> (unchanged).
  • gh pr create --base main --head <branch> --title <generated> --body <generated>.
  • gh pr merge <PR#> --squash --delete-branch --auto. The --auto flag means the PR merges as soon as the required status checks pass (which we add in a sibling spec).
  • Optionally: poll gh pr view <PR#> --json state to wait for merge before returning, OR fire-and-forget if the caller doesn't care about ordering.
  • The post-merge regression guard moves from sync.py-local to a GitHub Actions workflow (sibling spec: regression_guard_action_spec.md) that runs against the PR's merge commit. This way the same guard fires for direct gh pr merge from a human and for orchestra sync push automation.
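
The create and auto-merge steps above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the function name reuses the push_main_via_gh_pr name from this spec, but its signature, the injectable run callable, and the reliance on gh pr create printing the PR URL as the last line of stdout are assumptions made for the example. The branch is assumed to have been pushed already.

```python
import subprocess
from typing import Callable, List

# Hypothetical runner type: takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def _run(argv: List[str]) -> str:
    """Run a command, raise CalledProcessError on non-zero exit, return stdout."""
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout


def push_main_via_gh_pr(branch: str, title: str, body: str,
                        run: Runner = _run) -> str:
    """Open a PR for an already-pushed branch and enable auto-merge.

    Returns the PR URL reported by `gh pr create`.
    """
    out = run([
        "gh", "pr", "create",
        "--base", "main", "--head", branch,
        "--title", title, "--body", body,
    ])
    lines = out.strip().splitlines()
    if not lines:
        raise RuntimeError("gh pr create produced no output; cannot locate the PR")
    # Assumption: the PR URL is the last line gh writes to stdout.
    pr_url = lines[-1].strip()
    # gh pr merge accepts a PR number, URL, or branch name as its argument.
    run(["gh", "pr", "merge", pr_url, "--squash", "--delete-branch", "--auto"])
    return pr_url
```

Passing run as a parameter keeps the sketch testable without network access; the existing flock and retry logic would wrap the branch push that precedes this call.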

    orchestra sync push becomes a thin wrapper around gh pr create + gh pr merge --auto. The flock + retry + smart-merge logic in sync.py is preserved for the branch-push step; everything after that step delegates to GitHub.
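
For callers that do care about ordering, the optional wait step could look something like the sketch below. It polls gh pr view <PR#> --json state until the PR leaves the OPEN state or a timeout expires. The function name, the default timeout and interval, and the injectable run, sleep, and clock parameters are illustrative assumptions, not part of sync.py today.

```python
import json
import subprocess
import time
from typing import Callable, List


def _gh_stdout(argv: List[str]) -> str:
    """Run a gh command and return its stdout, raising on non-zero exit."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def wait_for_merge(pr: str,
                   timeout_s: float = 30.0,
                   interval_s: float = 2.0,
                   run: Callable[[List[str]], str] = _gh_stdout,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll the PR state and return the last state seen.

    Returns "MERGED" or "CLOSED" as soon as the PR reaches that state,
    or "OPEN" if the timeout expires first.
    """
    deadline = clock() + timeout_s
    while True:
        raw = run(["gh", "pr", "view", pr, "--json", "state"])
        state = json.loads(raw)["state"]
        if state != "OPEN":
            return state
        if clock() >= deadline:
            return state
        sleep(interval_s)
```

A fire-and-forget caller would simply skip this call and leave unmerged PRs to the sweeper described under edge cases.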

    Migration phases

    Phase A — additive: add gh-pr path alongside direct push

    • New env var ORCHESTRA_MERGE_VIA_PR=1 enables the new path.
    • _push_main_local() checks the env var; default keeps current behavior.
    • Workers that opt in can verify the new path with ORCHESTRA_MERGE_VIA_PR=1 orchestra sync push --project SciDEX --branch test.
    • Run for ~7 days on a small subset of slots (e.g., slot >= 50 only) to get baseline metrics: success rate, mean time-to-merge, false-positive rate of the guard.
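
The opt-in check could be as small as the sketch below. The helper name and the default parameter are hypothetical; only the ORCHESTRA_MERGE_VIA_PR variable and its 1/0 values come from this spec. Treating unrecognized values as "use the default" is an assumption, and how the slot subset is enrolled is left out because the spec does not say.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ORCHESTRA_MERGE_VIA_PR"


def merge_via_pr_enabled(environ: Optional[Mapping[str, str]] = None,
                         default: bool = False) -> bool:
    """Decide whether to take the gh-pr path.

    "1" enables it, "0" disables it, and anything else (including unset)
    falls back to `default`. Phase A would call this with default=False,
    Phase B with default=True.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "").strip()
    if raw == "1":
        return True
    if raw == "0":
        return False
    return default
```

Keeping the default as a single argument means the Phase B flip is a one-line change, and ORCHESTRA_MERGE_VIA_PR=0 keeps working as the rollback switch.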

    Phase B — flip default

    • Once Phase A shows ≥99% success rate and no novel failure modes, flip the default to ORCHESTRA_MERGE_VIA_PR=1.
    • Keep the legacy direct-push path behind ORCHESTRA_MERGE_VIA_PR=0 for emergency rollback.
    • Update SciDEX AGENTS.md PR section to remove the "transitioning" caveat.

    Phase C — enforce on GitHub side

    • Update SciDEX-AI/SciDEX ruleset 14947533 to add the pull_request rule (require PRs for all changes to main).
    • Add the required_status_checks rule with the strict policy enabled (branch must be up to date; strict_required_status_checks_policy: true in the rulesets API) once the smoke check status check is wired (sibling spec).
    • Remove the legacy direct-push code path from sync.py.
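
As a rough illustration of the ruleset change, the sketch below builds the rules array that an update to the ruleset might carry. The rule and parameter names reflect our reading of GitHub's repository rulesets REST API and should be checked against the current API reference before use. The helper name and the check_context argument are hypothetical, since the status check's name is not fixed until the sibling spec lands.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def phase_c_rules(existing: List[Rule], check_context: str) -> List[Rule]:
    """Return a new rules list with the Phase C rules added.

    Existing rules of other types are kept; any existing pull_request or
    required_status_checks rule is replaced by the versions below.
    """
    wanted: List[Rule] = [
        {
            "type": "pull_request",
            "parameters": {
                # Start with no required approvals; review policy is out of scope here.
                "required_approving_review_count": 0,
                "dismiss_stale_reviews_on_push": False,
                "require_code_owner_review": False,
                "require_last_push_approval": False,
                "required_review_thread_resolution": False,
            },
        },
        {
            "type": "required_status_checks",
            "parameters": {
                # "Strict" means the branch must be up to date with main.
                "strict_required_status_checks_policy": True,
                "required_status_checks": [{"context": check_context}],
            },
        },
    ]
    replaced = {rule["type"] for rule in wanted}
    kept = [rule for rule in existing if rule.get("type") not in replaced]
    return kept + wanted
```

Building the list from the ruleset's current rules, rather than from scratch, avoids silently dropping rules that are already enforced.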

    Edge cases

    • Authentication. gh reads GH_TOKEN or GITHUB_TOKEN from the environment and otherwise falls back to the credentials stored by gh auth login. The SciDEX-AI-User-01 PAT (or app token) is what currently does direct pushes; the same credentials will create and merge PRs. Verify that gh auth status succeeds in the process that runs the deploy step (see the bwrap note below) before flipping the default.
    • Bwrap sandbox. Workers running inside bwrap sandboxes don't have network access to GitHub. The deploy step runs OUTSIDE the sandbox (Orchestra invokes _deploy_to_main from the supervisor process), so this is fine — but worth verifying with a smoke test in Phase A.
    • PR throttling. GitHub allows ~100 concurrent merge-queue items per repo. At current task throughput (~100 tasks/day) we won't hit this. If it becomes an issue, batch related tasks into one PR via the same branch.
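    • Deletion of the local branch. gh pr merge --delete-branch is documented as deleting both the local and the remote branch after merge. Whether that still happens when --auto defers the merge should be verified in Phase A; any local branch left behind is already cleaned up by orchestra hygiene.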
    • Partial merge failures. If gh pr create succeeds but gh pr merge fails (e.g., status check pending), the PR sits open. Add a sweeper (orchestra hygiene extension) that polls open PRs older than N hours and either auto-merges them if checks pass, or notifies on Slack/dashboard if blocked.
    • Force-push / amend semantics. Today --force-with-lease=main:<sha> ensures we don't overwrite concurrent pushes. With gh pr merge, GitHub's internal merge logic enforces concurrency safely; no force-with-lease needed.
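
The selection step of the sweeper for partial merge failures might be sketched as below. It assumes the input is the parsed output of gh pr list --state open --json number,createdAt, with createdAt as an ISO 8601 UTC timestamp. The function name and the max_age_hours parameter (the "N hours" above) are hypothetical, and the follow-up actions (re-attempting the merge, notifying Slack or the dashboard) are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def stale_open_prs(prs: List[Dict[str, Any]],
                   now: datetime,
                   max_age_hours: float) -> List[int]:
    """Return the numbers of open PRs created at least max_age_hours ago.

    `now` must be timezone-aware so it can be compared with the parsed
    createdAt timestamps.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    stale = []
    for pr in prs:
        # A trailing "Z" is not accepted by fromisoformat on older Pythons,
        # so normalize it to an explicit UTC offset first.
        created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
        if created <= cutoff:
            stale.append(pr["number"])
    return stale
```

A hygiene run would pass datetime.now(timezone.utc) as now and then decide, per PR, whether to retry the merge or raise a notification.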

    Sibling specs

    • regression_guard_action_spec.md — GitHub Action that runs .orchestra/post_merge_guard.txt patterns + scripts/smoke_check.py forbidden-keys against PR merge commits. Becomes a required status check in Phase C.
    • pull_main_sh_retire_spec.md (already partially shipped) — once main updates only via merged PRs, the 30-second git reset --hard FETCH_HEAD cycle on /home/ubuntu/scidex/ becomes simpler (just git pull --ff-only) since there's no concurrent direct-push race.

    Acceptance

    • Phase A: 100 task completions land via gh pr create + gh pr merge; 0 unhandled failures; P95 time-to-merge ≤ 30 s.
    • Phase B: 1000 task completions land via the new path; failure rate < 1%.
    • Phase C: GitHub ruleset enforces pull_request and the strict status check; no direct-push paths remain in sync.py.
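
The Phase A and Phase B thresholds can be checked mechanically. The sketch below is one possible reading of them: it uses the nearest-rank definition of P95 (the spec does not say which percentile method is intended) and treats "failure rate" as failures divided by attempts, where attempts are completions plus failures. All function names are hypothetical.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def phase_a_passes(merge_seconds: Sequence[float], unhandled_failures: int) -> bool:
    """Phase A: at least 100 completions, 0 unhandled failures, P95 <= 30 s."""
    return (
        len(merge_seconds) >= 100
        and unhandled_failures == 0
        and p95(merge_seconds) <= 30.0
    )


def phase_b_passes(completions: int, failures: int) -> bool:
    """Phase B: at least 1000 completions and a failure rate below 1%."""
    attempts = completions + failures
    # Integer form of failures / attempts < 0.01.
    return completions >= 1000 and failures * 100 < attempts
```

With 100 samples, the nearest-rank P95 is the 95th smallest value, so up to five slow merges can exceed 30 s without failing Phase A under this reading.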

    Out of scope

    • Replacing the post-merge regex guard with a GitHub Action (separate spec — but those PRs need to land before Phase C can ship).
    • Per-PR review approval requirements (orthogonal; we may set required_approving_review_count: 0 in the pull_request rule initially, then bump).
    • Migrating other Orchestra projects (NeuroWiki, Trader, etc.) — they follow once SciDEX validates the pattern.

    File: orchestra_pr_based_merge_spec.md
    Modified: 2026-04-28 03:24
    Size: 6.5 KB