docs(examples): add MPI + Argo Workflows integration example #5125
csh0101 wants to merge 1 commit into volcano-sh:master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver for each of the changed files.
Summary of Changes (Gemini Code Assist): This pull request enhances the examples by introducing an integration for running MPI (Message Passing Interface) workloads with Argo Workflows, orchestrated by Volcano. It provides both a comprehensive WorkflowTemplate and a simpler Workflow example, accompanied by detailed documentation, to enable users to manage high-performance computing tasks on Kubernetes. Additionally, a minor but important fix was made in the scheduler cache to prevent errors during concurrent queue creation.
Code Review
This pull request adds valuable examples for running MPI jobs with Argo Workflows and Volcano. The documentation and templates are comprehensive. However, I've found several critical issues in the YAML examples related to the configuration of MPI jobs, specifically concerning process counts, resource allocation (minAvailable), and the usage of Volcano's svc plugin for host discovery. These issues would prevent the examples from running correctly as-is. I've provided detailed comments and suggestions to address these problems. Additionally, there's a minor correction needed in the README file.
```yaml
containers:
- name: mpimaster
  image: mpioperator/mpi-pi:latest
  command: [mpirun, "--allow-run-as-root", "--hostfile", "/etc/volcano/mpi.hostfile", "-np", "2", "./mpi-pi"]
```
The `mpirun` command has two issues:

1. The number of processes (`-np`) is hardcoded to `2`, but there are 3 pods in total (1 master + 2 workers). This will leave one worker pod idle. It should be `3`.
2. The `--hostfile` path `/etc/volcano/mpi.hostfile` is incorrect. The `svc` plugin creates host files for each task under `/etc/volcano/conf/`. The command should be updated to use these files, for example by creating a combined hostfile at runtime.
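One way to apply both fixes, sketched below: assemble a combined hostfile from the plugin-provided files at container start and size `-np` from its contents. The per-task file names under `/etc/volcano/conf/` are assumptions about the `svc` plugin's naming; verify them in a running pod.

```yaml
command: [sh, -c]
args:
- |
  # Combine the per-task host files created by the svc plugin (file names assumed)
  cat /etc/volcano/conf/mpimaster.host /etc/volcano/conf/mpiworker.host > /tmp/hostfile
  # One process per host entry: 1 master + 2 workers = 3
  mpirun --allow-run-as-root --hostfile /tmp/hostfile -np "$(wc -l < /tmp/hostfile)" ./mpi-pi
```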
```yaml
    name: "{{workflow.name}}"
    uid: "{{workflow.uid}}"
spec:
  minAvailable: {{inputs.parameters.mpi-worker-replicas}}
```
```yaml
command:
- mpirun
args:
- "--allow-run-as-root"
- "--hostfile"
- "/etc/volcano/mpi.hostfile"
- "-np"
- "{{inputs.parameters.mpi-worker-replicas}}"
- "./mpi-pi"
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
volumeMounts:
- name: mpi-hostfile
  mountPath: /etc/volcano
volumes:
- name: mpi-hostfile
  emptyDir: {}
```
The MPI master container configuration has issues with process count, hostfile path, and volume mounts.
1. **Incorrect process count**: The `-np` argument for `mpirun` on line 105 should be `{{=asInt(inputs.parameters.mpi-worker-replicas) + 1}}` to include the master process.
2. **Incorrect hostfile path & volume conflict**: The `svc` plugin provides host files under `/etc/volcano/conf`, but this example uses an incorrect path (`/etc/volcano/mpi.hostfile` on line 103) and an `emptyDir` volume (`mpi-hostfile` on lines 111-116) that hides the plugin-provided files.
To fix this, the mpi-hostfile volume and its mounts should be removed from both master and worker pod templates. The mpirun command should then be updated to use the host files from /etc/volcano/conf.
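Applying those fixes, the master container might look like this sketch: no `emptyDir` volume, `-np` derived from the replica parameter, and a glob over `/etc/volcano/conf` that assumes this is where the `svc` plugin places its host files.

```yaml
command: [sh, -c]
args:
- |
  # Use the plugin-provided host files directly; no emptyDir hiding them
  cat /etc/volcano/conf/*.host > /tmp/hostfile
  # workers + 1 master process
  mpirun --allow-run-as-root --hostfile /tmp/hostfile \
    -np {{=asInt(inputs.parameters.mpi-worker-replicas) + 1}} ./mpi-pi
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
```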
```yaml
command: [mpirun]
args:
- "--hostfile"
- "/etc/volcano/mpi.hostfile"
- "-np"
- "{{workflow.parameters.mpi-worker-replicas}}"
- "your-mpi-application"
```
The variable for `mpi-worker-replicas` in the example is incorrect. Within the `submit-volcano-job` template, you should use `inputs.parameters` to reference input parameters, not `workflow.parameters`.
Suggested change:

```diff
 command: [mpirun]
 args:
 - "--hostfile"
 - "/etc/volcano/mpi.hostfile"
 - "-np"
-- "{{workflow.parameters.mpi-worker-replicas}}"
+- "{{inputs.parameters.mpi-worker-replicas}}"
 - "your-mpi-application"
```
```sh
echo "Monitoring MPI job..."
kubectl get jobs -n $(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) -l batch.volcano.sh/job-name
```
The `monitor-mpi` step is not functional. The `kubectl get jobs` command with the label selector `batch.volcano.sh/job-name` will not find the Volcano job, as this label is not set on the job object. Since the `submit-mpi` step already waits for the job to complete, this monitoring step could be simplified to just an informational message or removed:
```sh
echo "MPI job has completed."
```
Pull request overview
This PR adds an MPI (Message Passing Interface) integration example showing how to run Volcano MPI-style jobs via Argo Workflows, including a WorkflowTemplate and accompanying documentation. It also includes an unrelated scheduler cache change affecting queue creation behavior.
Changes:
- Add a full-featured Argo `WorkflowTemplate` example for submitting a Volcano MPI job and (intended) log following.
- Add a simpler Argo `Workflow` example and a README describing setup/usage.
- Adjust scheduler cache queue-creation logic to treat `AlreadyExists` as success during retries.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| pkg/scheduler/cache/cache.go | Changes queue creation retry handling (unrelated to the docs/examples scope). |
| example/integrations/argo/mpi-example/mpi-workflowtemplate.yaml | Adds MPI WorkflowTemplate with job submission + log-follow pattern. |
| example/integrations/argo/mpi-example/mpi-simple.yaml | Adds a minimal MPI Workflow example. |
| example/integrations/argo/mpi-example/README.md | Adds documentation for the MPI Argo integration example. |
```yaml
- name: job-name
activeDeadlineSeconds: 3600
container:
  image: bitnami/kubectl:latest
```
Using `bitnami/kubectl:latest` makes the example non-reproducible and can unexpectedly break as the image changes over time. Pin to a specific kubectl image/tag (ideally matching your documented Kubernetes/Argo compatibility) to make the example stable and safer to copy into production.
Suggested change:

```diff
-image: bitnami/kubectl:latest
+image: bitnami/kubectl:1.29.3
```
```sh
  echo "ERROR: Pod did not become ready"
  kubectl get pod -n $NS $POD -o yaml
  exit 1
}

echo ""
echo "✓ Pod is ready"
```
Waiting on `--for=condition=Ready` can fail for short-lived batch pods: a pod that quickly reaches `Succeeded` is typically not Ready, so this can time out even though the job ran. Prefer waiting on pod phase (Running/Succeeded/Failed) or handle the Succeeded/Failed cases explicitly before attempting to stream logs.
Suggested change:

```sh
  PHASE=$(kubectl get pod -n $NS $POD -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown")
  if [ "$PHASE" = "Succeeded" ]; then
    echo "Pod reached terminal phase 'Succeeded' without becoming Ready; continuing to logs."
  elif [ "$PHASE" = "Failed" ]; then
    echo "ERROR: Pod entered Failed phase before becoming Ready"
    kubectl get pod -n $NS $POD -o yaml
    exit 1
  else
    echo "ERROR: Pod did not become ready (phase: $PHASE)"
    kubectl get pod -n $NS $POD -o yaml
    exit 1
  fi
}
echo ""
echo "✓ Pod is ready or has completed"
```
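An alternative sketch: kubectl v1.23+ can wait on a JSONPath value, which sidesteps the Ready-condition pitfall for short-lived pods (this assumes the pinned kubectl image is recent enough to have `--for=jsonpath`):

```yaml
args:
- |
  # Succeed once the pod is Running, or accept a fast path straight to Succeeded
  kubectl wait pod -n "$NS" "$POD" --for=jsonpath='{.status.phase}'=Running --timeout=300s \
    || [ "$(kubectl get pod -n "$NS" "$POD" -o jsonpath='{.status.phase}')" = "Succeeded" ]
```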
```yaml
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed
```
In this Workflow, `submit-mpi` uses a `successCondition` that waits for the Volcano Job to reach `Completed`, so the subsequent `monitor-mpi` step only runs after the job is already finished. If the goal is to monitor while it runs, split into a create step (no completion condition) plus a separate wait/monitor step (or run monitoring in parallel after creation).
pkg/scheduler/cache/cache.go (outdated)
```diff
@@ -488,6 +488,14 @@ func newDefaultAndRootQueue(vcClient vcclient.Interface, defaultQueue string) {
 		klog.V(2).Infof("Successfully created queue %s", name)
 		return nil
 	})
+
+	// If the queue was created by another scheduler pod concurrently, treat it as success
+	if err != nil && apierrors.IsAlreadyExists(err) {
+		klog.V(2).Infof("Queue %s was created by another scheduler, skip.", name)
+		return nil
+	}
+
 	return err
```
This PR is described as adding Argo/Volcano MPI examples, but it also changes scheduler cache behavior (queue creation retry/AlreadyExists handling). If this change is intentional, it should be split into a separate PR (or the PR description updated) to keep review scope clear and to avoid releasing an unrelated scheduler behavior change with documentation/examples.
```yaml
- name: submit-job
  template: submit-volcano-job
  arguments:
    parameters:
    - name: job-name
      value: "{{workflow.parameters.job-name}}"
    - name: mpi-worker-replicas
      value: "{{workflow.parameters.mpi-worker-replicas}}"
    - name: mpi-image
      value: "{{workflow.parameters.mpi-image}}"

- name: follow-logs
  template: follow-master-logs
  dependencies: [submit-job]
  arguments:
    parameters:
    - name: job-name
      value: "{{workflow.parameters.job-name}}"

- name: submit-volcano-job
  inputs:
    parameters:
    - name: job-name
    - name: mpi-worker-replicas
    - name: mpi-image
  resource:
    action: create
    setOwnerReference: true
    successCondition: status.state.phase == Completed
    failureCondition: status.state.phase == Failed
```
The DAG ordering prevents the "log follower" from working: submit-volcano-job uses a successCondition that only becomes true when the Volcano Job is Completed, and follow-logs depends on submit-job. As a result, log following starts only after the MPI job finishes. Split this into (1) a create step with no completion successCondition (or a separate 'create' + 'wait' resource), then (2) run log-following after creation while (3) another step waits for job completion.
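A sketch of the suggested split, assuming Argo's `resource` template semantics (`action: create` without conditions returns once the object exists, and `action: get` with conditions acts as a wait); the step names and elided manifests are illustrative:

```yaml
dag:
  tasks:
  - name: create-job            # 1. create, no completion condition
    template: create-volcano-job
  - name: follow-logs           # 2. stream logs while the job runs
    template: follow-master-logs
    dependencies: [create-job]
  - name: wait-job              # 3. wait for completion in parallel
    template: wait-volcano-job
    dependencies: [create-job]

- name: wait-volcano-job
  resource:
    action: get
    successCondition: status.state.phase == Completed
    failureCondition: status.state.phase == Failed
    manifest: |
      apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      metadata:
        name: "{{workflow.parameters.job-name}}"
```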
```yaml
- name: monitor-mpi
  container:
    image: bitnami/kubectl:latest
    command: [sh, -c]
    args:
    - |
      echo "Monitoring MPI job..."
      kubectl get jobs -n $(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) -l batch.volcano.sh/job-name
```
`monitor-mpi` runs `kubectl get jobs ...`, which queries Kubernetes `batch/v1` Jobs, not Volcano `batch.volcano.sh/v1alpha1` Jobs. Also the label selector `-l batch.volcano.sh/job-name` doesn't constrain to the created resource. Use `kubectl get job.batch.volcano.sh` (or the Volcano shortname, if available) and select by the specific name/generateName or by a label you set on the Volcano Job metadata.
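A sketch of a corrected `monitor-mpi` step along these lines (it assumes the template receives the `job-name` input parameter used elsewhere in the example):

```yaml
- name: monitor-mpi
  inputs:
    parameters:
    - name: job-name
  container:
    image: bitnami/kubectl:1.29.3
    command: [sh, -c]
    args:
    - |
      NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
      # Query the Volcano CRD explicitly and select the job by name
      kubectl get jobs.batch.volcano.sh -n "$NS" "{{inputs.parameters.job-name}}"
```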
```yaml
- name: monitor-mpi
  container:
    image: bitnami/kubectl:latest
```
Using bitnami/kubectl:latest makes the example non-reproducible and can unexpectedly break as the image changes. Pin to a specific tag/version for stability and safer copy/paste.
Suggested change:

```diff
-    image: bitnami/kubectl:latest
+    image: bitnami/kubectl:1.29.3
```
```markdown
## WorkflowTemplate Details

The workflow consists of two main steps:

1. **submit-volcano-job**: Creates a Volcano Job with MPI master and workers
2. **follow-master-logs**: Waits for the master pod and streams its logs
```
The README describes follow-master-logs as monitoring the master pod logs as the MPI job runs, but in the provided YAML the log follower task depends on submit-job, which (via successCondition) only completes after the Volcano Job finishes. Once the workflow logic is adjusted to start log-following after job creation, update this section if needed to match the actual execution order.
```yaml
parameters:
- name: job-name
  value: mpi-job-example
- name: mpi-worker-replicas
  value: "2"
- name: mpi-image
  value: mpioperator/mpi-pi:latest
```
WorkflowTemplate parameters are not defined under `spec.parameters` in Argo Workflows; they should be under `spec.arguments.parameters`. With the current structure, `{{workflow.parameters.*}}` references may be unset and the template may fail validation on apply/submit.
Suggested change:

```diff
-parameters:
-- name: job-name
-  value: mpi-job-example
-- name: mpi-worker-replicas
-  value: "2"
-- name: mpi-image
-  value: mpioperator/mpi-pi:latest
+arguments:
+  parameters:
+  - name: job-name
+    value: mpi-job-example
+  - name: mpi-worker-replicas
+    value: "2"
+  - name: mpi-image
+    value: mpioperator/mpi-pi:latest
```
```yaml
setOwnerReference: true
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed
manifest: |
  apiVersion: batch.volcano.sh/v1alpha1
  kind: Job
  metadata:
    name: {{inputs.parameters.job-name}}
    ownerReferences:
    - apiVersion: argoproj.io/v1alpha1
      blockOwnerDeletion: true
      kind: Workflow
      name: "{{workflow.name}}"
      uid: "{{workflow.uid}}"
  spec:
```
The manifest sets an explicit `ownerReferences` block while also using `setOwnerReference: true`. This is redundant and can produce duplicate `ownerReferences` entries depending on Argo version/behavior. Prefer a single mechanism (either rely on `setOwnerReference` or keep the explicit `ownerReferences`) to avoid confusion and improve compatibility with the existing examples in `example/integrations/argo/*.yaml`, which set `ownerReferences` directly.
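A minimal sketch of the single-mechanism approach, relying on `setOwnerReference: true` alone and dropping the manual block (the `spec` body is elided):

```yaml
resource:
  action: create
  setOwnerReference: true   # Argo injects the ownerReference; no manual block needed
  manifest: |
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: {{inputs.parameters.job-name}}
    spec:
      # ...
```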
Add comprehensive MPI job examples for Argo Workflows integration:

- mpi-workflowtemplate.yaml: Full-featured WorkflowTemplate with parameterization, log following, and production-ready configuration
- mpi-simple.yaml: Simple Workflow example for quick start
- README.md: Documentation covering architecture, usage, and customization

Features:

- MPI master-worker architecture using Volcano plugins (ssh, svc)
- Argo DAG workflow for job submission and log monitoring
- Log follower pattern for Argo UI visibility
- Parameterized worker replica count and container image

This helps users running HPC and scientific computing workloads on Kubernetes using Volcano and Argo together.

Related-to: 5114
Signed-off-by: csh0101 <csh0101@example.com>
Maybe some GPT-style inspiration 😂 but there are some nice ideas in this PR.
Summary
Add comprehensive MPI (Message Passing Interface) job examples for Argo Workflows integration.
What this PR does
This PR addresses #5114 by providing production-ready examples for running MPI workloads using Volcano and Argo Workflows together.
Files Added
- mpi-workflowtemplate.yaml - Full-featured WorkflowTemplate with:
- mpi-simple.yaml - Simple Workflow example for quick start
- README.md - Comprehensive documentation including:

Key Features

- `ssh` and `svc` plugins for inter-pod communication

Example Usage
Testing
Related Issues
Fixes #5114
Notes

This example uses `mpioperator/mpi-pi:latest` as the default image (a simple MPI Pi calculation example). Users can customize the image and command for their specific HPC workloads.