-
Notifications
You must be signed in to change notification settings - Fork 28
feat: manifest schema and generation #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Using CUE, we now provide a schema for manifest validation and generation based on GPU type. This can be utilised via the `nccl.nu` script that supports running a `sync` against a kubeconfig to create or verify GPU information from a remote cluster. You can also use `generate` with flags to get a dynamic Kubernetes manifest for running the nccl tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces CUE-based schema validation and manifest generation for NCCL tests. It adds a Nushell CLI script (nccl.nu) that can sync GPU configurations from a Kubernetes cluster and generate dynamic Kubernetes manifests for running NCCL tests based on GPU type and scale parameters.
Changes:
- Added CUE schema for GPU configuration validation
- Created Nushell CLI script for cluster synchronization and manifest generation
- Added GPU configuration files for multiple GPU types (A100, H100, A40, GB200, RTX P6000)
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| schema/gpu.cue | Defines the GPU configuration schema with validation constraints |
| utils/new_gpu.cue | Provides template for generating new GPU configurations from cluster data |
| gpus/*.cue | Configuration files for specific GPU types with hardware specifications |
| generate.cue | Main manifest generation logic with conditional feature flags |
| nccl.nu | Nushell CLI script for sync and generate commands |
| cue.mod/module.cue | CUE module definition with Kubernetes schema dependencies |
| README.md | Documentation for the new manifest generation feature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Example of a |
|
I have basically zero experience in this codebase, but is this the first time we're introducing both CUE and nushell? |
Using CUE, we now provide a schema for manifest validation and generation based on GPU type.
This can be utilised via the
nccl.nuscript that supports running asyncagainst a kubeconfig tocreate or verify GPU information from a remote
cluster.
You can also use
generatewith flags to get adynamic Kubernetes manifest for running the nccl tests