Conversation
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
📝 Walkthrough

The documentation file for the Llama distillation recipe is updated to change the `--nproc-per-node` value in the example commands from 1 to 2.

Changes
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/training/distillation.md (1)
60-62: Consider adding a note about matching `nproc_per_node` to parallelism settings.

Since `--nproc_per_node` must match the total parallelism (TP × PP × CP), it would be helpful to add a brief explanation in the documentation. For example:

> **Note**: Set `--nproc_per_node` to match the product of `tensor_model_parallel_size`, `pipeline_model_parallel_size`, and `context_parallel_size`. For example, with TP=2, PP=1, CP=1, use `--nproc_per_node=2`.

This would help users understand why different examples use different values and how to determine the correct value for their custom configurations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/training/distillation.md` around lines 60-62, add a brief explanatory note in docs/training/distillation.md near the torchrun example clarifying that the torchrun flag --nproc_per_node must equal the total parallelism (tensor_model_parallel_size × pipeline_model_parallel_size × context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 → --nproc_per_node=2) so users know how to choose the correct value for custom configs; mention the three settings by name (tensor_model_parallel_size, pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP shorthand for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/training/distillation.md`:
- Around line 52-53: The docs examples use --nproc_per_node=2 but the
distillation script distill_llama32_3b-1b.py documents and expects
--nproc_per_node=8; update every example invocation in
docs/training/distillation.md (all occurrences of --nproc_per_node=2 shown in
the examples) to --nproc_per_node=8 so the documentation matches the script's
specification and avoids GPU allocation errors.
---
Nitpick comments:
In `@docs/training/distillation.md`:
- Around line 60-62: Add a brief explanatory note in
docs/training/distillation.md near the torchrun example clarifying that the
torchrun flag --nproc_per_node must equal the total parallelism
(tensor_model_parallel_size × pipeline_model_parallel_size ×
context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 →
--nproc_per_node=2) so users know how to choose the correct value for custom
configs; mention the three settings by name (tensor_model_parallel_size,
pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP
shorthand for clarity.
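The rule repeated in these comments — that `--nproc_per_node` must equal the product of the three parallelism degrees — can be sketched as a small shell check. The variable names below are illustrative shorthand (TP/PP/CP, as used in the review comments), not settings read from the actual training config:

```shell
#!/bin/sh
# Illustrative parallelism settings (not taken from the real config file)
TP=2   # tensor_model_parallel_size
PP=1   # pipeline_model_parallel_size
CP=1   # context_parallel_size

# torchrun's --nproc_per_node must equal TP * PP * CP
NPROC=$((TP * PP * CP))
echo "--nproc_per_node=$NPROC"
```

With these values the example invocation would use `--nproc_per_node=2`, which matches the value this PR puts into the documentation.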
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: af4a387b-6e24-4929-9324-7d18dabdaac5
📒 Files selected for processing (1)
docs/training/distillation.md
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
```diff
-torchrun --nproc_per_node=1 examples/distillation/llama/distill_llama32_3b-1b.py
+torchrun --nproc_per_node=2 examples/distillation/llama/distill_llama32_3b-1b.py
```
@AAnoosheh can we update all to `uv run -m torch.distributed.run --nproc_per_node=2`
What does this PR do?

Distillation docs accidentally mentioned `--nproc-per-node=1` instead of `2`.

Changelog

- Changed `--nproc-per-node=1` to `--nproc-per-node=2`

GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit