🤖 feat: add best-of-n support for sub-agents #2916

Merged
ammario merged 31 commits into main from feat/best-of-n-subagents
Mar 14, 2026

Conversation

@ammar-agent
Collaborator

@ammar-agent ammar-agent commented Mar 12, 2026

Summary

Add best-of-n support to sub-agent spawning and coalesce grouped runs in the transcript and sidebar. Clarify that parent agents should do only brief setup before delegating user-requested best-of batches. Harden the grouped recovery paths so interrupted, restarted, or historical parent task calls still render and finalize cleanly.

Background

The task tool previously spawned a single sub-agent per call, which made best-of exploration awkward and noisy in the UI. Adding first-class batching helped, but the follow-up review cycle exposed several edge cases: grouped task completion; deferred fallback delivery; startup recovery; transcript rebinding after old child workspaces had already been cleaned up; concurrent child stream-end handlers racing to deliver deferred fallback reports; and stale older best-of groups satisfying a newer pending parent partial after restart. It also exposed the readability cost of repetitive best-of task-service regression scaffolding.

Implementation

  • add an optional n parameter to the task tool, defaulting to 1 and validating the allowed 1–20 range
  • spawn n sibling child tasks when requested, persist shared best-of metadata on those workspaces, and return grouped task metadata/reports from the tool
  • clarify the task tool guidance so user-requested best-of runs keep the parent focused on brief setup and synthesis instead of duplicating the full child analysis in parallel
  • update live task tracking and task report linking so grouped task cards render as a single expandable “best of N” transcript entry
  • coalesce leaf best-of sub-agents into one expandable sidebar row while still allowing users to reveal individual candidates
  • preserve partial best-of results when foreground waits are backgrounded, time out, or are interrupted instead of dropping already-completed sibling reports
  • finalize ready parent best-of task partials from persisted child report artifacts before deferred cleanup, both during parent stream-end recovery and during startup recovery after restart
  • gate pending best-of recovery by partial start time so stale older groups cannot finalize a newer pending parent task call, even when only one matching group remains in config
  • avoid rebinding historical best-of transcript cards to a different later matching group when stale task IDs are already known
  • serialize deferred best-of fallback/finalization work per parent so concurrent child stream-end handlers cannot append duplicate synthetic subagent reports
  • factor repetitive best-of taskService.test.ts setup/report helpers so the regression coverage stays behavior-oriented without re-embedding the same child stream-end scaffolding in every case
  • simplify several low-risk best-of UI/backend helpers and trim tautological constant-only assertions from task.test.ts
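
As an illustration of the first two bullets, here is a minimal sketch of validating n and planning sibling metadata. Only the default of 1 and the 1–20 range come from this PR; the names `resolveBestOfCount`, `planBestOfSiblings`, and `BestOfMetadata` are hypothetical and do not reflect the actual task tool code.

```typescript
// Hypothetical sketch: only the default (1) and the 1–20 range come from the PR text.
const MIN_BEST_OF = 1;
const MAX_BEST_OF = 20;

// Assumed shape of the shared metadata persisted on each sibling workspace.
interface BestOfMetadata {
  groupId: string; // shared by every sibling in the batch
  index: number; // 0-based position of this candidate
  total: number; // requested n
}

// Apply the default and reject anything outside the allowed integer range.
function resolveBestOfCount(n?: number): number {
  const count = n ?? 1;
  if (!Number.isInteger(count) || count < MIN_BEST_OF || count > MAX_BEST_OF) {
    throw new RangeError(
      `n must be an integer between ${MIN_BEST_OF} and ${MAX_BEST_OF}, got ${count}`,
    );
  }
  return count;
}

// Produce one metadata record per sibling child task.
function planBestOfSiblings(groupId: string, n?: number): BestOfMetadata[] {
  const total = resolveBestOfCount(n);
  return Array.from({ length: total }, (_, index) => ({ groupId, index, total }));
}
```

Keeping `total` on every sibling is one way a UI could still show "1 of N" context when only some candidates were spawned.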

Validation

  • make static-check
  • bun x jest tests/ui/tasks/bestOfProgress.test.ts --runInBand
  • bun x jest tests/ui/tasks/awaitVisualization.test.ts
  • bun test src/browser/components/ProjectSidebar/ProjectSidebar.test.tsx --test-name-pattern 'best-of|Best-of|leaf'
  • bun test src/node/services/tools/task.test.ts
  • bun test src/node/services/taskService.test.ts --test-name-pattern 'agent_report waits for all best-of reports|partial best-of spawn failure|duplicate synthetic parent reports|best-of'
  • bun test src/node/services/taskService.test.ts --test-name-pattern 'stale single best-of group|targets the pending best-of group|finalizes ready best-of partials before cleanup rechecks|initialize finalizes ready best-of partials before cleanup rechecks'
  • bun test src/node/services/taskService.test.ts --test-name-pattern 'best-of|Best-of|cleanup rechecks|concurrent deferred best-of fallback delivery does not duplicate synthetic reports|finalizes ready best-of partials before cleanup rechecks|initialize finalizes ready best-of partials before cleanup rechecks'

Risks

This touches task tool result shapes, model-facing delegation guidance, restart-safe partial recovery, deferred fallback delivery, startup cleanup ordering, parent-scoped recovery locking, sidebar/chat rendering for child tasks, and the regression harnesses that cover those flows. The highest regression risk is around grouped task completion diverging from single-task behavior or stale persisted groups being rebound to the wrong pending parent call, so the change is covered with targeted schema, tool, task-service, transcript, and sidebar tests in addition to make static-check.


Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $260.21

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70f03ce494

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ammar-agent
Collaborator Author

@codex review

Please take another look.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 271e0d7fc0

@ammar-agent
Collaborator Author

@codex review

Please take another look.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd60327c27

@ammar-agent
Collaborator Author

@codex review

Please take another look.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f236d7b3dd

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67b5557f5c

@ammar-agent
Collaborator Author

@codex review

Please take another look.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3624e33604

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 492bf42ee9

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb323fb1a1

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fc181773f

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be6b8b3fb5

@ammar-agent
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: baebaef3e0

@ammar-agent
Collaborator Author

@codex review

Handle partial-spawn best-of counts in the parent task UI and avoid suppressing fallback best-of reports when interrupted grouped recovery can no longer finalize.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Only defer grouped best-of fallback while a single pending parent task call is still recoverable, so malformed interrupted partial state still falls back to synthetic parent reports.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Require better best-of recovery discrimination in the parent task UI, keep recovered groups stable once matched, and remove the unnecessary manual memoization wrapper from sidebar expansion state.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Keep grouped best-of task runs observable in terminal workflows by summarizing grouped running/completed task outputs in the CLI formatter.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Only bind executing best-of task cards to child groups after concrete task IDs arrive, and update the UI tests to drive the same task-created event path used in production.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Use the parent tool message timestamp as the best-of recovery discriminator when no task-created IDs are available, and seed the UI tests with realistic tool timestamps.
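
A minimal sketch of the discriminator described in this commit message: prefer concrete task IDs, and fall back to the parent tool message timestamp only when none are known. The types and the `matchGroup` function are hypothetical, and the rule "ignore groups created before the tool call" is an assumption about how stale groups are excluded, not the actual implementation.

```typescript
// Hypothetical shapes; the real persisted metadata may differ.
interface BestOfGroup {
  groupId: string;
  taskIds: string[];
  createdAtMs: number;
}

interface ParentTaskCard {
  knownTaskIds: string[]; // from task-created events; may be empty
  toolTimestampMs: number; // timestamp of the parent tool message
}

function matchGroup(card: ParentTaskCard, groups: BestOfGroup[]): BestOfGroup | undefined {
  if (card.knownTaskIds.length > 0) {
    // Concrete IDs win: a card with known IDs is never rebound to a different group.
    return groups.find((g) => card.knownTaskIds.every((id) => g.taskIds.includes(id)));
  }
  // Assumed fallback: only groups created at or after the tool call can belong to it,
  // so older (stale) groups are filtered out; pick the earliest remaining one.
  const candidates = groups
    .filter((g) => g.createdAtMs >= card.toolTimestampMs)
    .sort((a, b) => a.createdAtMs - b.createdAtMs);
  return candidates[0];
}
```

Returning `undefined` when nothing matches leaves the card unbound rather than guessing, which fits the stated goal of not rebinding historical cards to a later group.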

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Preserve grouped taskIds/tasks when a best-of spawn stops after a single candidate so downstream UIs still retain the 1-of-N batching context.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

When a grouped best-of recovery later becomes impossible because a sibling interrupts without reporting, proactively deliver deferred sibling reports back to the parent conversation.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Keep interrupted best-of task headers honest and stop deferring grouped fallback/cleanup when unrelated pending task calls make grouped partial finalization impossible.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Skip best-of siblings whose synthetic fallback reports were already appended when replaying deferred reports after later sibling interruptions.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Represent interrupted best-of siblings in partial task results and retry deferred fallback delivery when parent streams end or sibling interruptions make grouped finalization impossible.
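
One way to picture a partial result that represents interrupted siblings is sketched below. The `SiblingOutcome` union, the `PartialBestOfResult` shape, and the three-state `state` field are all hypothetical; the only idea taken from this thread is that an interrupted sibling makes grouped finalization impossible, so completed reports should be delivered through a fallback instead of being dropped.

```typescript
// Hypothetical per-sibling outcome; not the actual task result schema.
type SiblingOutcome =
  | { taskId: string; status: "reported"; report: string }
  | { taskId: string; status: "running" }
  | { taskId: string; status: "interrupted" };

interface PartialBestOfResult {
  requested: number;
  reports: { taskId: string; report: string }[];
  interruptedTaskIds: string[];
  pendingTaskIds: string[];
  // "complete": every spawned sibling reported; "pending": still waiting;
  // "fallback": a sibling was interrupted, so grouped finalization cannot happen.
  state: "complete" | "pending" | "fallback";
}

function summarizeBestOf(requested: number, outcomes: SiblingOutcome[]): PartialBestOfResult {
  const reports: { taskId: string; report: string }[] = [];
  const interruptedTaskIds: string[] = [];
  const pendingTaskIds: string[] = [];
  for (const outcome of outcomes) {
    switch (outcome.status) {
      case "reported":
        reports.push({ taskId: outcome.taskId, report: outcome.report });
        break;
      case "interrupted":
        interruptedTaskIds.push(outcome.taskId);
        break;
      case "running":
        pendingTaskIds.push(outcome.taskId);
        break;
    }
  }
  const state =
    interruptedTaskIds.length > 0 ? "fallback" : pendingTaskIds.length > 0 ? "pending" : "complete";
  return { requested, reports, interruptedTaskIds, pendingTaskIds, state };
}
```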

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$104.87`_

Finalize ready parent best-of task partials before deferred cleanup, add a regression test for that restart-safe recovery path, and simplify several low-risk best-of UI/backend helpers.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Finalize pending best-of parent tool calls during startup recovery, avoid rebinding historical best-of cards to later matching groups, and cover both regressions with focused tests.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Serialize parent-scoped deferred best-of fallback/finalization work so concurrent child stream-end handlers cannot append duplicate synthetic subagent reports, and add a regression test covering that race.
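
The race this commit addresses can be sketched with a promise-chain lock keyed by parent ID. This is a generic technique under assumed names (`withParentLock`, `deliverFallback`, and the in-memory `appended` set are hypothetical); the PR does not show how the real lock is implemented.

```typescript
// Hypothetical per-parent lock: each parent ID maps to the tail of its work queue.
const parentQueues = new Map<string, Promise<void>>();

function withParentLock<T>(parentId: string, work: () => Promise<T>): Promise<T> {
  const previous = parentQueues.get(parentId) ?? Promise.resolve();
  // Start only after the previous holder has settled.
  const result = previous.then(() => work());
  // The stored tail never rejects, so one failure cannot block later callers.
  const tail = result.then(
    () => undefined,
    () => undefined,
  );
  parentQueues.set(parentId, tail);
  // Drop the entry if nobody queued behind us, so the map does not grow forever.
  void tail.then(() => {
    if (parentQueues.get(parentId) === tail) parentQueues.delete(parentId);
  });
  return result;
}

// Stand-ins for persisted state: which groups already have a synthetic report.
const appended = new Set<string>();
const reports: string[] = [];

// Check-then-append is only safe because the whole sequence runs under the lock.
async function deliverFallback(parentId: string, groupId: string): Promise<void> {
  await withParentLock(parentId, async () => {
    if (appended.has(groupId)) return; // a sibling handler already delivered it
    await Promise.resolve(); // stands in for asynchronous I/O between check and write
    appended.add(groupId);
    reports.push(`report:${groupId}`);
  });
}
```

Two handlers calling `deliverFallback` for the same parent and group at the same time append a single report, because the second caller only runs its check after the first has finished writing.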

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Use the parent-scoped deferred best-of lock for direct reported-child delivery as well as deferred fallback delivery so concurrent reported/interrupted sibling completion cannot append duplicate synthetic reports.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Move sibling cleanup rechecks out of the parent-scoped best-of delivery lock so concurrent child stream-end handlers cannot deadlock on parent and child cleanup locks.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Only run deferred best-of parent recovery when a pending best-of task partial actually exists, and add coverage so completed grouped task results do not append duplicate synthetic reports on later parent stream-end rechecks.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

Resolve the pending parent best-of group before deferred recovery runs so older stale groups under the same parent cannot finalize the current pending task tool call or emit duplicate fallback reports.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$231.83`_

@ammar-agent ammar-agent force-pushed the feat/best-of-n-subagents branch from 2275f14 to 7381b95 March 14, 2026 14:11
@ammar-agent
Collaborator Author

@codex review

Rebased this branch onto main, reran make static-check, and force-pushed the updated branch tip.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7381b9538a

@ammar-agent
Collaborator Author

@codex review

Addressed the stale single-group recovery issue, clarified best-of delegation guidance, and removed tautological constant-only assertions from task.test.ts. Validation rerun: bun test src/node/services/tools/task.test.ts, targeted taskService best-of recovery tests, and make static-check.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

@ammar-agent
Collaborator Author

@codex review

Factored the repetitive best-of scaffolding in taskService.test.ts, reran targeted best-of tests plus make static-check, and pushed the cleanup commit.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

@ammario ammario merged commit 81bc7f7 into main Mar 14, 2026
23 checks passed
@ammario ammario deleted the feat/best-of-n-subagents branch March 14, 2026 15:21
2 participants