Preserve batched env evaluation in async validation rollouts#2209
Draft
taivu1998 wants to merge 1 commit into NVIDIA-NeMo:main from
Summary
Addresses #1798 by preserving batched environment evaluation during async validation.
Today, when async vLLM generation is enabled, GRPO and distillation validation route through `run_async_multi_turn_rollout()`. That helper is optimized for sample-level pipelining, which is a good fit for training and async trajectory collection, but it evaluates environments from per-sample loops instead of from the batched rollout loop used by synchronous validation. As a result, validation loses task-level batching in reward/env evaluation and pays unnecessary latency.

This PR keeps the training path unchanged and introduces a validation-specific rollout helper that combines async generation with the existing batched multi-turn environment loop.
What Changed
- Added `run_multi_turn_rollout_async_generation()` to `nemo_rl/experience/rollouts.py`
- The new helper reuses `run_multi_turn_rollout()`'s batched multi-turn control flow
- It calls `generate_responses_async()` for async vLLM generation
- It keeps env/reward evaluation batched via `calculate_rewards()`
- Validation uses the new helper when `_should_use_async_rollouts(master_config)` is true
- `run_async_multi_turn_rollout()` is left unchanged for training and async trajectory collection, where sample-level pipelining is still the intended behavior

Root Cause
The regression comes from using the same async rollout helper for both training and validation. `run_async_multi_turn_rollout()` processes each sample independently across turns. That architecture improves overlap for some training scenarios, but it also changes where environment evaluation happens: validation ended up on the pipelined path and lost the batching characteristics of `run_multi_turn_rollout()`.

Why This Design
This change fixes the validation bottleneck without broadening the blast radius:
The new helper is intentionally narrow and reuses the proven synchronous rollout structure, which keeps the fix easier to reason about and reduces regression risk.
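To make the batching difference concrete, here is a toy comparison of the two rollout shapes (the `CountingEnv` class is illustrative, not the NeMo-RL API; it only counts evaluation calls):

```python
# Per-sample vs batched environment evaluation, reduced to call counting.
class CountingEnv:
    def __init__(self):
        self.calls = 0

    def evaluate(self, samples):
        self.calls += 1                  # one invocation, regardless of batch size
        return [len(s) for s in samples]

samples = ["a", "bb", "ccc", "dddd"]

# Pipelined path (run_async_multi_turn_rollout() style): per-sample env calls.
pipelined = CountingEnv()
for s in samples:
    pipelined.evaluate([s])              # env sees batches of size 1

# Batched path (run_multi_turn_rollout() style): one env call per turn.
batched = CountingEnv()
batched.evaluate(samples)

print(pipelined.calls, batched.calls)   # 4 1
```

When environment evaluation amortizes well over a batch (e.g. vectorized reward computation), the single batched call is the cheaper shape, which is what validation loses on the pipelined path.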
User / Developer Impact
Validation
- Ran `python3.12 -m py_compile` on all changed source and test files
- The new helper follows `run_multi_turn_rollout()` semantics while preserving batched env grouping

Notes
I could not run the repo-native `uv run pytest` end-to-end in this environment because the project's dependency resolution pulls in `cuda-bindings==13.0.1`, which is not available for the current macOS arm64 platform. The verification above was chosen to maximize signal despite that platform constraint.
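For reference, a dependency-free syntax check of this kind can be run with `py_compile` from the standard library (the target file below is a throwaway example, not a repo path; any Python 3 interpreter works):

```shell
# Create a throwaway module and byte-compile it. py_compile needs no project
# dependencies, so it works even when the full dependency set cannot resolve.
printf 'x = 1\n' > /tmp/py_compile_demo.py
python3 -m py_compile /tmp/py_compile_demo.py && echo ok
```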