🤖 feat: add best-of-n support for sub-agents by ammar-agent · Pull Request #2916 · coder/mux

ammar-agent · 2026-03-12T14:43:37Z

Summary

Add best-of-n support to sub-agent spawning, coalesce grouped runs in the transcript and sidebar, clarify that parent agents should do only brief setup before delegating user-requested best-of batches, and harden the grouped recovery paths so interrupted, restarted, or historical parent task calls still render and finalize cleanly.

Background

The task tool previously spawned a single sub-agent per call, which made best-of exploration awkward and noisy in the UI. Adding first-class batching helped, but the follow-up review cycle exposed several edge cases: grouped task completion, deferred fallback delivery, startup recovery, transcript rebinding after old child workspaces had already been cleaned up, concurrent child stream-end handlers racing to deliver deferred fallback reports, and stale older best-of groups that could satisfy a newer pending parent partial after restart. The repetitive best-of task-service regression scaffolding also carried a readability cost.

Implementation

add an optional n parameter to the task tool, defaulting to 1 and validating the allowed 1–20 range
spawn n sibling child tasks when requested, persist shared best-of metadata on those workspaces, and return grouped task metadata/reports from the tool
clarify the task tool guidance so user-requested best-of runs keep the parent focused on brief setup and synthesis instead of duplicating the full child analysis in parallel
update live task tracking and task report linking so grouped task cards render as a single expandable “best of N” transcript entry
coalesce leaf best-of sub-agents into one expandable sidebar row while still allowing users to reveal individual candidates
preserve partial best-of results when foreground waits are backgrounded, time out, or are interrupted instead of dropping already-completed sibling reports
finalize ready parent best-of task partials from persisted child report artifacts before deferred cleanup, both during parent stream-end recovery and during startup recovery after restart
gate pending best-of recovery by partial start time so stale older groups cannot finalize a newer pending parent task call, even when only one matching group remains in config
avoid rebinding historical best-of transcript cards to a different later matching group when stale task IDs are already known
serialize deferred best-of fallback/finalization work per parent so concurrent child stream-end handlers cannot append duplicate synthetic subagent reports
factor repetitive best-of taskService.test.ts setup/report helpers so the regression coverage stays behavior-oriented without re-embedding the same child stream-end scaffolding in every case
simplify several low-risk best-of UI/backend helpers and trim tautological constant-only assertions from task.test.ts
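The `n` parameter described above (default 1, allowed range 1–20) can be validated with a small helper. This is a minimal sketch in plain TypeScript, not the PR's actual schema code. The function name `resolveBestOfCount` and its error message are hypothetical.

```typescript
// Hypothetical constants mirroring the documented 1–20 range for `n`.
const MIN_BEST_OF = 1;
const MAX_BEST_OF = 20;

/**
 * Resolve the best-of count for a task tool call.
 * `undefined` falls back to the documented default of 1. Anything that is
 * not an integer in [1, 20] is rejected with a RangeError.
 */
function resolveBestOfCount(n: number | undefined): number {
  if (n === undefined) {
    return 1;
  }
  if (!Number.isInteger(n) || n < MIN_BEST_OF || n > MAX_BEST_OF) {
    throw new RangeError(
      `n must be an integer between ${MIN_BEST_OF} and ${MAX_BEST_OF}, got ${n}`,
    );
  }
  return n;
}
```

Rejecting non-integers matters here because `n` controls how many sibling child tasks get spawned.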

Validation

make static-check
bun x jest tests/ui/tasks/bestOfProgress.test.ts --runInBand
bun x jest tests/ui/tasks/awaitVisualization.test.ts
bun test src/browser/components/ProjectSidebar/ProjectSidebar.test.tsx --test-name-pattern 'best-of|Best-of|leaf'
bun test src/node/services/tools/task.test.ts
bun test src/node/services/taskService.test.ts --test-name-pattern 'agent_report waits for all best-of reports|partial best-of spawn failure|duplicate synthetic parent reports|best-of'
bun test src/node/services/taskService.test.ts --test-name-pattern 'stale single best-of group|targets the pending best-of group|finalizes ready best-of partials before cleanup rechecks|initialize finalizes ready best-of partials before cleanup rechecks'
bun test src/node/services/taskService.test.ts --test-name-pattern 'best-of|Best-of|cleanup rechecks|concurrent deferred best-of fallback delivery does not duplicate synthetic reports|finalizes ready best-of partials before cleanup rechecks|initialize finalizes ready best-of partials before cleanup rechecks'

Risks

This touches task tool result shapes, model-facing delegation guidance, restart-safe partial recovery, deferred fallback delivery, startup cleanup ordering, parent-scoped recovery locking, sidebar/chat rendering for child tasks, and the regression harnesses that cover those flows. The highest regression risk is around grouped task completion diverging from single-task behavior or stale persisted groups being rebound to the wrong pending parent call, so the change is covered with targeted schema, tool, task-service, transcript, and sidebar tests in addition to make static-check.

Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $260.21

ammar-agent · 2026-03-12T14:43:59Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70f03ce494

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/node/services/tools/task.ts

src/browser/features/Tools/TaskToolCall.tsx

ammar-agent · 2026-03-12T14:59:07Z

@codex review

Please take another look.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 271e0d7fc0

src/node/services/tools/task.ts

ammar-agent · 2026-03-12T15:16:32Z

@codex review

Please take another look.

ammar-agent · 2026-03-12T15:21:12Z

@codex review

Please take another look.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd60327c27

src/node/services/taskService.ts

ammar-agent · 2026-03-12T15:34:28Z

@codex review

Please take another look.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f236d7b3dd

src/node/services/taskService.ts

ammar-agent · 2026-03-12T15:48:21Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67b5557f5c

src/node/services/tools/task.ts

src/node/services/taskService.ts

ammar-agent · 2026-03-12T16:15:04Z

@codex review

Please take another look.

chatgpt-codex-connector · 2026-03-12T16:34:03Z

Codex Review: Didn't find any major issues. 🚀

ammar-agent · 2026-03-12T18:35:41Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3624e33604

src/browser/features/Tools/TaskToolCall.tsx

src/browser/components/ProjectSidebar/ProjectSidebar.tsx

ammar-agent · 2026-03-12T19:02:48Z

@codex review

ammar-agent · 2026-03-12T19:12:50Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 492bf42ee9

src/node/services/taskService.ts

ammar-agent · 2026-03-12T19:29:18Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb323fb1a1

src/browser/features/Tools/TaskToolCall.tsx

src/node/services/taskService.ts

ammar-agent · 2026-03-12T19:57:32Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fc181773f

src/node/services/taskService.ts

ammar-agent · 2026-03-12T20:15:10Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be6b8b3fb5

src/browser/features/Tools/TaskToolCall.tsx

src/browser/components/ProjectSidebar/ProjectSidebar.tsx

ammar-agent · 2026-03-12T20:40:00Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: baebaef3e0

src/node/services/tools/task.ts

ammar-agent · 2026-03-12T20:57:39Z

@codex review

Handle partial-spawn best-of counts in the parent task UI and avoid suppressing fallback best-of reports when interrupted grouped recovery can no longer finalize.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87

Only defer grouped best-of fallback while a single pending parent task call is still recoverable, so malformed interrupted partial state still falls back to synthetic parent reports.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87

Require better best-of recovery discrimination in the parent task UI, keep recovered groups stable once matched, and remove the unnecessary manual memoization wrapper from sidebar expansion state.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
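The sidebar behavior from the Implementation section can be modeled as a pure function. In that model, leaf best-of siblings collapse into one expandable row, and a set of expanded group IDs records which groups the user has revealed. This is a sketch under assumptions: `ChildWorkspace`, `bestOfGroupId`, `isLeaf`, and `buildSidebarRows` are hypothetical names, not the actual `ProjectSidebar.tsx` code.

```typescript
// Hypothetical sidebar entry for a child workspace.
interface ChildWorkspace {
  id: string;
  title: string;
  bestOfGroupId?: string; // shared by all siblings spawned from one best-of call
  isLeaf: boolean; // true when the workspace has no children of its own
}

type SidebarRow =
  | { kind: "workspace"; workspace: ChildWorkspace }
  | { kind: "bestOfGroup"; groupId: string; members: ChildWorkspace[]; expanded: boolean };

/**
 * Collapse leaf siblings that share a best-of group into one row placed at the
 * position of the first member. Non-leaf children stay as individual rows so
 * their own subtrees remain reachable. `expandedGroups` holds the groups the
 * user chose to reveal.
 */
function buildSidebarRows(
  children: readonly ChildWorkspace[],
  expandedGroups: ReadonlySet<string>,
): SidebarRow[] {
  const rows: SidebarRow[] = [];
  const groupRows = new Map<string, Extract<SidebarRow, { kind: "bestOfGroup" }>>();
  for (const child of children) {
    const groupId = child.bestOfGroupId;
    if (groupId === undefined || !child.isLeaf) {
      rows.push({ kind: "workspace", workspace: child });
      continue;
    }
    const existing = groupRows.get(groupId);
    if (existing) {
      existing.members.push(child);
      continue;
    }
    const row = {
      kind: "bestOfGroup" as const,
      groupId,
      members: [child],
      expanded: expandedGroups.has(groupId),
    };
    groupRows.set(groupId, row);
    rows.push(row);
  }
  return rows;
}
```

Because the rows derive from plain inputs, a component can recompute them on each render without a manual memoization wrapper around the expansion state.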

Keep grouped best-of task runs observable in terminal workflows by summarizing grouped running/completed task outputs in the CLI formatter.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
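A CLI formatter can summarize a best-of group as one terminal line. This is a hypothetical sketch. The `TaskStatus` values, `formatBestOfSummary`, and the output wording are assumptions for illustration, and the PR's CLI formatter may render groups differently.

```typescript
// Hypothetical per-candidate status for one best-of group.
type TaskStatus = "queued" | "running" | "completed" | "interrupted";

interface GroupedTask {
  taskId: string;
  status: TaskStatus;
}

/** Summarize a best-of group as a single line, listing only non-zero counts in a fixed order. */
function formatBestOfSummary(tasks: readonly GroupedTask[]): string {
  const order: TaskStatus[] = ["completed", "running", "queued", "interrupted"];
  const counts = new Map<TaskStatus, number>();
  for (const task of tasks) {
    counts.set(task.status, (counts.get(task.status) ?? 0) + 1);
  }
  const parts = order
    .filter((status) => (counts.get(status) ?? 0) > 0)
    .map((status) => `${counts.get(status)} ${status}`);
  return `best of ${tasks.length}: ${parts.join(", ")}`;
}
```

For a group with one completed and two running candidates, this yields `best of 3: 1 completed, 2 running`.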

Only bind executing best-of task cards to child groups after concrete task IDs arrive, and update the UI tests to drive the same task-created event path used in production.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87

Use the parent tool message timestamp as the best-of recovery discriminator when no task-created IDs are available, and seed the UI tests with realistic tool timestamps.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
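Group matching can prefer concrete task IDs and otherwise fall back to the parent tool message timestamp. Under that fallback, a group spawned before the pending call started counts as stale, even when it is the only group left. A minimal sketch, assuming hypothetical `BestOfGroup`/`PendingTaskCall` shapes with millisecond timestamps; the real task-service data model may differ.

```typescript
// Hypothetical persisted metadata for one best-of group under a parent workspace.
interface BestOfGroup {
  groupId: string;
  createdAtMs: number; // when the sibling child tasks were spawned
  taskIds: string[];
}

// Hypothetical view of the parent's still-pending task tool call.
interface PendingTaskCall {
  startedAtMs: number; // timestamp of the parent tool message
  knownTaskIds: string[]; // IDs already seen via task-created events; may be empty
}

// Return the sole element of a list, or null when it is empty or ambiguous.
function single<T>(items: readonly T[]): T | null {
  return items.length === 1 ? (items[0] ?? null) : null;
}

function resolvePendingGroup(
  groups: readonly BestOfGroup[],
  pending: PendingTaskCall,
): BestOfGroup | null {
  if (pending.knownTaskIds.length > 0) {
    // Concrete task IDs are the strongest discriminator.
    return single(
      groups.filter((g) => pending.knownTaskIds.every((id) => g.taskIds.includes(id))),
    );
  }
  // Without IDs, groups spawned before the pending call started are stale,
  // even if they are the only group remaining under this parent.
  return single(groups.filter((g) => g.createdAtMs >= pending.startedAtMs));
}
```

Returning `null` on ambiguity means recovery declines to guess, so a deferred fallback path can handle the case instead of finalizing the wrong call.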

Preserve grouped taskIds/tasks when a best-of spawn stops after a single candidate so downstream UIs still retain the 1-of-N batching context.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
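One way to keep the 1-of-N context is to choose the result shape from the *requested* count instead of the number of siblings that actually spawned. This is a hypothetical sketch. `TaskToolResult`, its `kind` tags, and `buildSpawnResult` are illustrative names, not the task tool's real result types.

```typescript
// Hypothetical shapes; the real task tool result types may differ.
interface SpawnedTask {
  taskId: string;
  status: "queued" | "running";
}

type TaskToolResult =
  | { kind: "single"; taskId: string; status: SpawnedTask["status"] }
  | { kind: "group"; requested: number; taskIds: string[]; tasks: SpawnedTask[] };

/**
 * Build the tool result from whatever siblings actually spawned. A best-of-3
 * call that stops after one candidate still returns the grouped shape
 * (1 of 3) instead of collapsing into a single-task result.
 */
function buildSpawnResult(requested: number, spawned: SpawnedTask[]): TaskToolResult {
  const only = spawned[0];
  if (requested === 1 && spawned.length === 1 && only !== undefined) {
    return { kind: "single", taskId: only.taskId, status: only.status };
  }
  return {
    kind: "group",
    requested,
    taskIds: spawned.map((t) => t.taskId),
    tasks: spawned,
  };
}
```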

When a grouped best-of recovery later becomes impossible because a sibling interrupts without reporting, proactively deliver deferred sibling reports back to the parent conversation.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87

Keep interrupted best-of task headers honest and stop deferring grouped fallback/cleanup when unrelated pending task calls make grouped partial finalization impossible.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87

Skip best-of siblings whose synthetic fallback reports were already appended when replaying deferred reports after later sibling interruptions.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
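The skip-on-replay rule can be expressed as a filter over deferred reports that consults a set of task IDs already delivered to the parent. A minimal sketch, assuming the delivered set is tracked per parent. `DeferredReport` and `selectReportsToReplay` are hypothetical names.

```typescript
// Hypothetical record of a deferred sibling report awaiting delivery to the parent.
interface DeferredReport {
  taskId: string;
  markdown: string;
}

/**
 * Return only the deferred reports that still need a synthetic parent message.
 * `alreadyDelivered` holds task IDs whose fallback report was appended earlier
 * (for example, by a previous sibling interruption). The set is updated in
 * place so repeated replays, or duplicates within one batch, are skipped.
 */
function selectReportsToReplay(
  deferred: readonly DeferredReport[],
  alreadyDelivered: Set<string>,
): DeferredReport[] {
  const toSend: DeferredReport[] = [];
  for (const report of deferred) {
    if (alreadyDelivered.has(report.taskId)) {
      continue;
    }
    alreadyDelivered.add(report.taskId);
    toSend.push(report);
  }
  return toSend;
}
```

The check-then-add here is only safe if callers for the same parent do not interleave across an `await`, which is why it pairs naturally with per-parent serialization.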

Represent interrupted best-of siblings in partial task results and retry deferred fallback delivery when parent streams end or sibling interruptions make grouped finalization impossible.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $104.87
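A partial result can keep interrupted siblings visible instead of dropping them. The group then counts as ready to finalize once no sibling is still running. This is a hypothetical sketch: the `SiblingState` union and the `PartialBestOfResult` fields are assumptions, not the task service's persisted format.

```typescript
// Hypothetical states for one best-of sibling.
type SiblingState =
  | { taskId: string; state: "reported"; report: string }
  | { taskId: string; state: "interrupted" }
  | { taskId: string; state: "running" };

interface PartialBestOfResult {
  reports: { taskId: string; report: string }[];
  interruptedTaskIds: string[];
  pendingTaskIds: string[];
  complete: boolean; // true once every sibling has reported or been interrupted
}

function summarizePartial(siblings: readonly SiblingState[]): PartialBestOfResult {
  const result: PartialBestOfResult = {
    reports: [],
    interruptedTaskIds: [],
    pendingTaskIds: [],
    complete: false,
  };
  for (const s of siblings) {
    switch (s.state) {
      case "reported":
        result.reports.push({ taskId: s.taskId, report: s.report });
        break;
      case "interrupted":
        result.interruptedTaskIds.push(s.taskId);
        break;
      case "running":
        result.pendingTaskIds.push(s.taskId);
        break;
    }
  }
  result.complete = result.pendingTaskIds.length === 0;
  return result;
}
```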

Finalize ready parent best-of task partials before deferred cleanup, add a regression test for that restart-safe recovery path, and simplify several low-risk best-of UI/backend helpers.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83

Finalize pending best-of parent tool calls during startup recovery, avoid rebinding historical best-of cards to later matching groups, and cover both regressions with focused tests.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83

Serialize parent-scoped deferred best-of fallback/finalization work so concurrent child stream-end handlers cannot append duplicate synthetic subagent reports, and add a regression test covering that race.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83
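Parent-scoped serialization can be built from a promise chain keyed by parent workspace ID. Work for one parent runs strictly in submission order, while different parents proceed concurrently. A minimal sketch: `KeyedSerializer` and the demo are hypothetical, and the demo shows two concurrent handlers delivering the same fallback report exactly once.

```typescript
class KeyedSerializer {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(key: string, work: () => Promise<T>): Promise<T> {
    // Stored tails never reject, so chaining on them is always safe.
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(work);
    // Hide this job's outcome from the next waiter; the caller still sees it via `result`.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Forget the key once the queue drains so the map does not grow forever.
    void tail.then(() => {
      if (this.tails.get(key) === tail) {
        this.tails.delete(key);
      }
    });
    return result;
  }
}

// Two concurrent "child stream-end" handlers try to deliver the same fallback report.
async function deliverTwiceConcurrently(): Promise<number> {
  const serializer = new KeyedSerializer();
  const delivered = new Set<string>();
  let appended = 0;
  const deliver = () =>
    serializer.run("parent-1", async () => {
      if (delivered.has("child-a")) return;
      await Promise.resolve(); // simulated I/O between the check and the append
      delivered.add("child-a");
      appended += 1;
    });
  await Promise.all([deliver(), deliver()]);
  return appended;
}
```

Without the serializer, both handlers would pass the `has` check before either one records the delivery, and the report would be appended twice.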

Use the parent-scoped deferred best-of lock for direct reported-child delivery as well as deferred fallback delivery so concurrent reported/interrupted sibling completion cannot append duplicate synthetic reports.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83

Move sibling cleanup rechecks out of the parent-scoped best-of delivery lock so concurrent child stream-end handlers cannot deadlock on parent and child cleanup locks.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83
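Consider one handler that holds the parent lock while waiting for a child lock, and another that holds that child lock while waiting for the parent lock. Neither can make progress. A common fix is two phases: under the parent lock, deliver reports and only *collect* the siblings that need a recheck; then release the parent lock and take each child lock separately. A hypothetical sketch; `withLock` and `onChildStreamEnd` are illustrative names, not the task service's API.

```typescript
// Hypothetical promise-chain lock keyed by workspace ID.
const lockTails = new Map<string, Promise<void>>();

function withLock<T>(key: string, work: () => Promise<T>): Promise<T> {
  const result = (lockTails.get(key) ?? Promise.resolve()).then(work);
  lockTails.set(
    key,
    result.then(
      () => undefined,
      () => undefined,
    ),
  );
  return result;
}

async function onChildStreamEnd(
  parentId: string,
  deliverAndCollectSiblings: () => Promise<string[]>,
  recheckCleanup: (childId: string) => Promise<void>,
): Promise<void> {
  // Phase 1: under the parent lock, deliver reports and record which siblings need a recheck.
  const siblingsToRecheck = await withLock(parentId, deliverAndCollectSiblings);
  // Phase 2: the parent lock is released; each child lock is taken on its own.
  for (const childId of siblingsToRecheck) {
    await withLock(childId, () => recheckCleanup(childId));
  }
}
```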

Only run deferred best-of parent recovery when a pending best-of task partial actually exists, and add coverage so completed grouped task results do not append duplicate synthetic reports on later parent stream-end rechecks.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83

Resolve the pending parent best-of group before deferred recovery runs so older stale groups under the same parent cannot finalize the current pending task tool call or emit duplicate fallback reports.
Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $231.83

ammar-agent · 2026-03-14T14:11:47Z

@codex review

Rebased this branch onto main, reran make static-check, and force-pushed the updated branch tip.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7381b9538a

src/node/services/taskService.ts

ammar-agent · 2026-03-14T14:28:58Z

@codex review

Addressed the stale single-group recovery issue, clarified best-of delegation guidance, and removed tautological constant-only assertions from task.test.ts. Validation rerun: bun test src/node/services/tools/task.test.ts, targeted taskService best-of recovery tests, and make static-check.

chatgpt-codex-connector · 2026-03-14T14:37:42Z

Codex Review: Didn't find any major issues. Another round soon, please!

ammar-agent · 2026-03-14T14:59:25Z

@codex review

Factored the repetitive best-of scaffolding in taskService.test.ts, reran targeted best-of tests plus make static-check, and pushed the cleanup commit.

chatgpt-codex-connector · 2026-03-14T15:07:45Z

Codex Review: Didn't find any major issues. Swish!

mintlify bot deployed to staging - docs March 12, 2026 14:44

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/tools/task.ts (resolved)

src/browser/features/Tools/TaskToolCall.tsx (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/tools/task.ts (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/taskService.ts (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/taskService.ts (outdated, resolved)

ammar-agent force-pushed the feat/best-of-n-subagents branch from f236d7b to 9b8a148 on March 12, 2026 15:43

mintlify bot deployed to staging - docs March 12, 2026 15:44

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/tools/task.ts (outdated, resolved)

src/node/services/taskService.ts (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/browser/features/Tools/TaskToolCall.tsx (resolved)

src/browser/components/ProjectSidebar/ProjectSidebar.tsx (resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/taskService.ts (resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/browser/features/Tools/TaskToolCall.tsx (resolved)

src/node/services/taskService.ts (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/taskService.ts (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/browser/features/Tools/TaskToolCall.tsx (outdated, resolved)

src/browser/components/ProjectSidebar/ProjectSidebar.tsx (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 12, 2026

src/node/services/tools/task.ts (resolved)

ammar-agent added 18 commits March 14, 2026 09:10

ammar-agent force-pushed the feat/best-of-n-subagents branch from 2275f14 to 7381b95 on March 14, 2026 14:11

mintlify bot deployed to staging - docs March 14, 2026 14:12

chatgpt-codex-connector bot reviewed Mar 14, 2026

src/node/services/taskService.ts (outdated, resolved)

🤖 fix: clarify and harden best-of delegation

eaa19b0

🤖 tests: factor best-of task service helpers

4aaa9d8

ammario merged commit 81bc7f7 into main Mar 14, 2026
23 checks passed

ammario deleted the feat/best-of-n-subagents branch March 14, 2026 15:21
