docs: add release process developer guide #3152
Open. ko3n1g wants to merge 1 commit into main from ko3n1g/docs/release-process-guide
+87
−0
@@ -0,0 +1,85 @@

# Release Developer Guide

## Overview

Our release cycle spans **2 months**. During this window, we develop and land features through a series of Release Candidates (RCs), before entering a code-freeze period for stabilization and a final release.

-----

## Release Candidate Cadence

New RCs are cut every **Saturday**, when the weekly pipeline runs.

| RC  | Approximate Timing | Key Activity                       |
|-----|--------------------|------------------------------------|
| RC0 | Week 1 (7th–10th)  | Major dependency bump: NGC PyTorch |
| RC1 | Week 2             | Dependency bump: TransformerEngine |
| RC2 | Week 3             | Feature development continues      |
| RC3 | Week 4             | **Code-freeze begins**             |
|     | Week 5             | Bug fixes, small improvements      |
|     | Week 6             | Bug fixes, small improvements      |
|     | Week 7             | QA exit, release                   |

RC0 through RC2 are a **feature development phase**: new features are actively being landed. Stabilization begins at RC3 with code-freeze.

From RC3 onward, RCs are cut **more frequently and as needed**, rather than strictly on Saturdays.

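The weekly part of the cadence (RC0 through RC3, one per Saturday) can be sketched as a small date calculation. This is only an illustration: the function names and the `cycle_start` input are hypothetical and not part of any release tooling described here, and RCs cut "as needed" after RC3 are deliberately not modeled.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbering: Monday is 0, Saturday is 5


def first_saturday_on_or_after(day: date) -> date:
    """Return the first Saturday that falls on or after `day`."""
    return day + timedelta(days=(SATURDAY - day.weekday()) % 7)


def weekly_rc_dates(cycle_start: date, count: int = 4) -> dict[str, date]:
    """Map RC0..RC{count-1} to consecutive Saturdays.

    Assumption: RC0 is cut on the first Saturday on or after the
    (hypothetical) cycle start date, and each later RC follows one week on.
    """
    rc0 = first_saturday_on_or_after(cycle_start)
    return {f"RC{i}": rc0 + timedelta(weeks=i) for i in range(count)}


if __name__ == "__main__":
    # Example with an arbitrary start date chosen for illustration.
    for name, cut in weekly_rc_dates(date(2025, 6, 5)).items():
        print(name, cut.isoformat())
```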

-----

## Golden Values

Golden values are reference outputs used to validate model behavior in CI.

### During the RC Phase (before code-freeze)

Golden values are updated **selectively**, in either of two cases:

- The new values represent an **improvement**.
- The team **collectively decides** that a regression is acceptable.

This means golden values are not automatically updated with every run; any regression requires a deliberate decision.

### On the Release Branch (during code-freeze)

When the release branch is created at code-freeze, all golden values are updated **unconditionally**. Whatever the current output is becomes the new reference baseline for the release.

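The two update policies can be summarized as a single decision function. This is a minimal sketch, not the project's actual tooling: `Phase`, `should_update_golden`, and both boolean inputs are hypothetical names, and how an "improvement" is measured is left to the caller.

```python
from enum import Enum


class Phase(Enum):
    """Where in the cycle a golden-value update is being considered (hypothetical)."""

    RC = "rc"                                  # feature development, before code-freeze
    RELEASE_BRANCH_CUT = "release_branch_cut"  # release branch created at code-freeze


def should_update_golden(
    phase: Phase,
    is_improvement: bool,
    team_accepts_regression: bool = False,
) -> bool:
    """Return True if the golden values should be replaced by the new outputs."""
    if phase is Phase.RELEASE_BRANCH_CUT:
        # Unconditional: the current output becomes the release baseline.
        return True
    # RC phase: update only on improvement or on an explicit team decision.
    return is_improvement or team_accepts_regression


if __name__ == "__main__":
    print(should_update_golden(Phase.RC, is_improvement=False))
    print(should_update_golden(Phase.RELEASE_BRANCH_CUT, is_improvement=False))
```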

-----

## Code-Freeze

Code-freeze lasts **two weeks** and begins when RC3 is cut. This is the **stabilization phase**: no new features are landed.

### First Half

- **Release branches are created.**
- All golden values on the release branch are updated unconditionally (see Golden Values above).
- The **last bulk CI run** occurs one week into the code-freeze period.
- RCs continue to be cut as needed.

### Second Half

- **Engineers are responsible for updating golden values** on the release branch: they review any remaining discrepancies and ensure the suite is in a clean state ahead of release.
- RCs continue to be cut as needed.

### Release Day

The release goes out on the **first Wednesday after the code-freeze window ends**.

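As a worked example of the release-day rule, the sketch below derives the release date from the RC3 cut date. It rests on two assumptions that the guide does not spell out: the code-freeze window ends exactly 14 days after RC3 is cut, and "after" means strictly after that end date. The function name `release_day` is hypothetical.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() numbering: Monday is 0, Wednesday is 2


def release_day(rc3_cut: date, freeze_length: timedelta = timedelta(weeks=2)) -> date:
    """Return the first Wednesday strictly after the code-freeze window ends.

    Assumption: the window ends `freeze_length` after the RC3 cut date.
    """
    freeze_end = rc3_cut + freeze_length
    # Days until the next Wednesday, in the range 1..7 (never 0), so a
    # freeze that ends on a Wednesday releases the following Wednesday.
    days_ahead = (WEDNESDAY - freeze_end.weekday() - 1) % 7 + 1
    return freeze_end + timedelta(days=days_ahead)


if __name__ == "__main__":
    # RC3 cut on a Saturday chosen for illustration.
    print(release_day(date(2025, 6, 28)).isoformat())
```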

-----

## CI and Known Failures

### Ticket-Annotated Tests

Failing CI tests can be linked to a tracking ticket. When a test fails with the **same error code** as the one recorded on its linked ticket, CI reports it as **"passing, with known error"** rather than a hard failure.

This means **a green CI result does not guarantee a fully healthy test suite**; it only means there are no *unexpected* failures.

### Important: Keeping Annotations Up to Date

Ticket annotations must be actively maintained in **both directions**:

- **Add** a ticket annotation when a test starts failing with a known, accepted error.
- **Remove** the ticket annotation when the test heals.

If a test recovers but its ticket annotation is not removed, CI will report it as **failing**, because the actual error code no longer matches the one on record. A healthy test is not enough; the annotation must be removed for CI to go green again.

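The annotation behavior described above can be illustrated with a small classifier. This is a sketch under stated assumptions rather than the real CI logic: error codes are modeled as integer exit codes with 0 meaning success, and `AnnotatedResult` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

PASS_EXIT_CODE = 0  # assumption: an exit code of 0 means the test passed


@dataclass
class AnnotatedResult:
    """One test outcome plus the error code recorded on its ticket, if any."""

    name: str
    exit_code: int
    ticket_error_code: Optional[int] = None  # None means no ticket annotation


def classify(result: AnnotatedResult) -> str:
    """Return the status CI would report for this test in the sketch."""
    if result.ticket_error_code is not None:
        if result.exit_code == result.ticket_error_code:
            return "passing, with known error"
        # Covers a different failure and also a healed test whose
        # annotation was never removed: the codes no longer match.
        return "failing"
    return "passing" if result.exit_code == PASS_EXIT_CODE else "failing"


if __name__ == "__main__":
    print(classify(AnnotatedResult("known_failure", exit_code=1, ticket_error_code=1)))
    print(classify(AnnotatedResult("healed_but_annotated", exit_code=0, ticket_error_code=1)))
    print(classify(AnnotatedResult("healthy", exit_code=0)))
```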
Clarify Saturday cadence vs date-range example to avoid ambiguity.
Line 11 states RCs are cut every Saturday, but Line 15 shows “Week 1 (7th–10th)”, which reads as a multi-day window. Consider making the table strictly week-based (or explicitly marking date ranges as approximate windows).