fix: use prompt token length for advantage group extraction by yfw · Pull Request #2176 · NVIDIA-NeMo/RL

yfw · 2026-03-30T23:17:17Z

The previous role-based extraction (_extract_prompt_only_messages) broke on multi-turn prompts containing assistant messages in the conversation history — it would strip them, corrupting the prompt IDs used for advantage estimation.

Replace with extract_initial_prompt_messages() which uses the length field to identify the original prompt boundary. Applied to both sync and async GRPO paths.

Closes #1960

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

The previous role-based extraction (`_extract_prompt_only_messages`) broke on multi-turn prompts containing assistant messages in the conversation history — it would strip them, corrupting the prompt IDs used for advantage estimation. Replace with `extract_initial_prompt_messages()` which uses the `length` field to identify the original prompt boundary. Applied to both sync and async GRPO paths. Closes #1960 Co-Authored-By: Jiaqi Zeng <[email protected]> Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Yi-Fu Wu <[email protected]>

copy-pr-bot · 2026-03-30T23:17:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 · 2026-03-31T08:39:27Z

/ok to test 628a248

yfw requested a review from a team as a code owner March 30, 2026 23:17

yfw added the super-v3 label Mar 30, 2026

yfw requested a review from a team as a code owner March 30, 2026 23:17

yuki-97 approved these changes Mar 31, 2026

View reviewed changes

yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Mar 31, 2026

copy-pr-bot bot temporarily deployed to nemo-ci March 31, 2026 08:39 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use prompt token length for advantage group extraction#2176

fix: use prompt token length for advantage group extraction#2176
yfw wants to merge 1 commit intomainfrom
yifu/fix-prompt-extraction-multi-turn

yfw commented Mar 30, 2026

Uh oh!

copy-pr-bot bot commented Mar 30, 2026

Uh oh!

yuki-97 commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yfw commented Mar 30, 2026

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Uh oh!

copy-pr-bot bot commented Mar 30, 2026

Uh oh!

yuki-97 commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants