Conversation
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
📝 Walkthrough

The documentation file for the Llama distillation recipe is updated to change the `--nproc-per-node` value in the example commands from 1 to 2.

Changes
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/training/distillation.md (1)
60-62: Consider adding a note about matching `nproc_per_node` to parallelism settings.

Since `--nproc_per_node` must match the total parallelism (TP × PP × CP), it would be helpful to add a brief explanation in the documentation. For example:

> **Note**: Set `--nproc_per_node` to match the product of `tensor_model_parallel_size`, `pipeline_model_parallel_size`, and `context_parallel_size`. For example, with TP=2, PP=1, CP=1, use `--nproc_per_node=2`.

This would help users understand why different examples use different values and how to determine the correct value for their custom configurations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/training/distillation.md` around lines 60-62, add a brief explanatory note in docs/training/distillation.md near the torchrun example clarifying that the torchrun flag --nproc_per_node must equal the total parallelism (tensor_model_parallel_size × pipeline_model_parallel_size × context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 → --nproc_per_node=2) so users know how to choose the correct value for custom configs; mention the three settings by name (tensor_model_parallel_size, pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP shorthand for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/training/distillation.md`:
- Around line 52-53: The docs examples use --nproc_per_node=2 but the
distillation script distill_llama32_3b-1b.py documents and expects
--nproc_per_node=8; update every example invocation in
docs/training/distillation.md (all occurrences of --nproc_per_node=2 shown in
the examples) to --nproc_per_node=8 so the documentation matches the script's
specification and avoids GPU allocation errors.
---
Nitpick comments:
In `@docs/training/distillation.md`:
- Around line 60-62: Add a brief explanatory note in
docs/training/distillation.md near the torchrun example clarifying that the
torchrun flag --nproc_per_node must equal the total parallelism
(tensor_model_parallel_size × pipeline_model_parallel_size ×
context_parallel_size), and give a short example (e.g., TP=2, PP=1, CP=1 →
--nproc_per_node=2) so users know how to choose the correct value for custom
configs; mention the three settings by name (tensor_model_parallel_size,
pipeline_model_parallel_size, context_parallel_size) and reference TP/PP/CP
shorthand for clarity.
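The rule repeated in these comments — that `--nproc_per_node` must equal the product of the three parallelism degrees — can be sketched as a small shell check. The variable names below are illustrative shorthand (TP/PP/CP, as used in the review comments), not settings read from the actual training config:

```shell
#!/bin/sh
# Illustrative parallelism settings (not taken from the real config file)
TP=2   # tensor_model_parallel_size
PP=1   # pipeline_model_parallel_size
CP=1   # context_parallel_size

# torchrun's --nproc_per_node must equal TP * PP * CP
NPROC=$((TP * PP * CP))
echo "--nproc_per_node=$NPROC"
```

With these values the example invocation would use `--nproc_per_node=2`, which matches the value this PR puts into the documentation.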
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: af4a387b-6e24-4929-9324-7d18dabdaac5
📒 Files selected for processing (1)
docs/training/distillation.md
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
```diff
-torchrun --nproc_per_node=1 examples/distillation/llama/distill_llama32_3b-1b.py
+torchrun --nproc_per_node=2 examples/distillation/llama/distill_llama32_3b-1b.py
```
@AAnoosheh can we update all to `uv run -m torch.distributed.run --nproc_per_node=2`
What does this PR do?

Distillation docs accidentally mentioned `--nproc-per-node=1` instead of `2`.

Changelog

- Changed `--nproc-per-node=1` to `--nproc-per-node=2`

GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit