
Fix distill docs typo #3118

Open

AAnoosheh wants to merge 2 commits into main from aanoosheh/distill-docs-typo

Conversation

@AAnoosheh
Contributor

@AAnoosheh AAnoosheh commented Apr 2, 2026

What does this PR do?

The distillation docs accidentally used `--nproc-per-node=1` instead of `2`.

Changelog

  • Change `--nproc-per-node=1` to `--nproc-per-node=2`; the corrected invocation is shown below.
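
For reference, a minimal usage sketch of the fixed command (script path taken from the diff in the review thread below):

```bash
torchrun --nproc_per_node=2 examples/distillation/llama/distill_llama32_3b-1b.py
```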

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation
    • Updated the example commands in the Llama distillation recipe documentation so the distributed launch setting (--nproc_per_node) matches the recipe's configuration, keeping the documented setup accurate and reproducible.

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@AAnoosheh AAnoosheh requested a review from yaoyu-33 April 2, 2026 19:45
@AAnoosheh AAnoosheh self-assigned this Apr 2, 2026
@AAnoosheh AAnoosheh added the cherry-pick and r0.4.0 (auto-cherrypick to release branch; apply before merge, cherry-pick happens after merge) labels Apr 2, 2026
@copy-pr-bot

copy-pr-bot bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

The documentation file for the Llama distillation recipe is updated to change the --nproc_per_node parameter from 1 to 2 in the example command invocations, reflecting an update to the recommended number of processes per node.

Changes

  • Documentation Update — docs/training/distillation.md: Updated example commands for running the Llama distillation recipe by changing the --nproc_per_node value from 1 to 2 in both the basic invocation and the --config-file variant.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 4 passed

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Fix distill docs typo' accurately reflects the main change: correcting a documentation error where --nproc-per-node was set to 1 instead of 2 in the distillation recipe example.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.
  • Test Results For Major Changes — ✅ Passed: PR contains only minor documentation changes (2 lines modified in a markdown file) with no code modifications, new features, or breaking changes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/training/distillation.md (1)

60-62: Consider adding a note about matching nproc_per_node to parallelism settings.

Since --nproc_per_node must match the total parallelism (TP × PP × CP), it would be helpful to add a brief explanation in the documentation. For example:

> **Note**: Set `--nproc_per_node` to match the product of `tensor_model_parallel_size`, `pipeline_model_parallel_size`, and `context_parallel_size`. For example, with TP=2, PP=1, CP=1, use `--nproc_per_node=2`.

This would help users understand why different examples use different values and how to determine the correct value for their custom configurations.
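
As a minimal bash sketch of the rule the note describes (the parallelism values are illustrative, taken from the note's TP=2, PP=1, CP=1 example):

```bash
# Illustrative only: --nproc_per_node must equal the product TP x PP x CP.
TP=2   # tensor_model_parallel_size
PP=1   # pipeline_model_parallel_size
CP=1   # context_parallel_size
NPROC=$((TP * PP * CP))   # 2 * 1 * 1 = 2
torchrun --nproc_per_node="$NPROC" examples/distillation/llama/distill_llama32_3b-1b.py
```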

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/training/distillation.md` around lines 60 - 62, Add a brief explanatory
note in docs/training/distillation.md near the torchrun example clarifying that
the torchrun flag --nproc_per_node must equal the total parallelism
(tensor_model_parallel_size × pipeline_model_parallel_size ×
context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 →
--nproc_per_node=2) so users know how to choose the correct value for custom
configs; mention the three settings by name (tensor_model_parallel_size,
pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP
shorthand for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/training/distillation.md`:
- Around line 52-53: The docs examples use --nproc_per_node=2 but the
distillation script distill_llama32_3b-1b.py documents and expects
--nproc_per_node=8; update every example invocation in
docs/training/distillation.md (all occurrences of --nproc_per_node=2 shown in
the examples) to --nproc_per_node=8 so the documentation matches the script's
specification and avoids GPU allocation errors.

---

Nitpick comments:
In `@docs/training/distillation.md`:
- Around line 60-62: Add a brief explanatory note in
docs/training/distillation.md near the torchrun example clarifying that the
torchrun flag --nproc_per_node must equal the total parallelism
(tensor_model_parallel_size × pipeline_model_parallel_size ×
context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 →
--nproc_per_node=2) so users know how to choose the correct value for custom
configs; mention the three settings by name (tensor_model_parallel_size,
pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP
shorthand for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: af4a387b-6e24-4929-9324-7d18dabdaac5

📥 Commits

Reviewing files that changed from the base of the PR and between 91c39a0 and 1305ae1.

📒 Files selected for processing (1)
  • docs/training/distillation.md

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@yaoyu-33 yaoyu-33 added the docs (documentation-only updates or documentation debt), area:distill (knowledge distillation), needs-review, and needs-author (author action is required before review or merge can continue) labels and removed the needs-review label Apr 2, 2026
```diff
-torchrun --nproc_per_node=1 examples/distillation/llama/distill_llama32_3b-1b.py
+torchrun --nproc_per_node=2 examples/distillation/llama/distill_llama32_3b-1b.py
```
Contributor

@yaoyu-33 yaoyu-33 Apr 2, 2026

@AAnoosheh can we update all to `uv run -m torch.distributed.run --nproc_per_node=2`
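
Applied to the example in the diff above, the requested form would look like this (a sketch; torch.distributed.run is the module behind the torchrun entry point, and the script path is assumed unchanged):

```bash
uv run -m torch.distributed.run --nproc_per_node=2 examples/distillation/llama/distill_llama32_3b-1b.py
```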
