
docs(examples): add MPI + Argo Workflows integration example #5125

Open

csh0101 wants to merge 1 commit into volcano-sh:master from csh0101:add-mpi-argo-example

Conversation


@csh0101 csh0101 commented Mar 21, 2026

Summary

Add comprehensive MPI (Message Passing Interface) job examples for Argo Workflows integration.

What this PR does

This PR addresses #5114 by providing production-ready examples for running MPI workloads using Volcano and Argo Workflows together.

Files Added

  1. mpi-workflowtemplate.yaml - Full-featured WorkflowTemplate with:

    • Parameterized MPI worker replica count
    • Configurable container image
    • Log follower pattern for Argo UI visibility
    • Proper resource limits and health checks
    • DAG workflow with job submission and log monitoring
  2. mpi-simple.yaml - Simple Workflow example for quick start

  3. README.md - Comprehensive documentation including:

    • Architecture overview
    • Prerequisites and setup
    • Usage instructions
    • Customization guide
    • Best practices

Key Features

  • MPI Master-Worker Architecture: Uses Volcano's ssh and svc plugins for inter-pod communication
  • Argo Integration: Full WorkflowTemplate with owner references for proper lifecycle management
  • Log Visibility: "Log follower" pattern allows monitoring distributed jobs from Argo UI
  • Production Ready: Includes health checks, restart policies, and resource management
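The master-worker layout described above follows the standard Volcano MPI pattern. As a rough sketch only (task names, replica counts, and image are illustrative, not the exact manifest shipped in this PR), the Job spec combines the two plugins with one master task and N worker tasks:

```yaml
# Sketch only — not the manifest from this PR; names and counts are illustrative.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mpi-example
spec:
  minAvailable: 3            # 1 master + 2 workers
  schedulerName: volcano
  plugins:
    ssh: []                  # passwordless SSH between pods
    svc: []                  # headless service + host discovery for mpirun
  tasks:
    - replicas: 1
      name: mpimaster
      template:
        spec:
          containers:
            - name: mpimaster
              image: mpioperator/mpi-pi:latest
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - name: mpiworker
              image: mpioperator/mpi-pi:latest
```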

Example Usage

# Apply the WorkflowTemplate
kubectl apply -f example/integrations/argo/mpi-example/mpi-workflowtemplate.yaml

# Submit a workflow instance
argo submit --from workflowtemplate/mpi-volcano \
  -p job-name=my-mpi-job \
  -p mpi-worker-replicas=4

Testing

  • WorkflowTemplate applies successfully
  • MPI job creates correct pod topology (1 master + N workers)
  • Log follower correctly waits for and streams master pod logs
  • Argo Workflow completes successfully when MPI job finishes

Related Issues

Fixes #5114

Notes

This example uses mpioperator/mpi-pi:latest as the default image (a simple MPI Pi calculation example). Users can customize the image and command for their specific HPC workloads.

Copilot AI review requested due to automatic review settings March 21, 2026 03:17
@volcano-sh-bot volcano-sh-bot requested a review from hajnalmt March 21, 2026 03:17
@volcano-sh-bot volcano-sh-bot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Mar 21, 2026
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wangyang0616 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the examples by introducing a robust integration for running MPI (Message Passing Interface) workloads with Argo Workflows, orchestrated by Volcano. It provides both a comprehensive WorkflowTemplate and a simpler workflow example, accompanied by detailed documentation, to enable users to efficiently manage high-performance computing tasks on Kubernetes. Additionally, a minor but important fix was implemented in the scheduler cache to prevent errors during concurrent queue creation.

Highlights

  • New MPI + Argo Workflows Example: Introduced a comprehensive example demonstrating how to run MPI (Message Passing Interface) jobs using Volcano and Argo Workflows.
  • Production-Ready WorkflowTemplate: Added mpi-workflowtemplate.yaml, a full-featured WorkflowTemplate supporting parameterized MPI worker replica counts, configurable container images, log following, resource limits, and health checks.
  • Simplified MPI Workflow: Included mpi-simple.yaml as a basic Argo Workflow example for quick setup of MPI jobs.
  • Extensive Documentation: Provided a detailed README.md covering architecture, prerequisites, usage, customization, and best practices for the MPI integration.
  • Scheduler Cache Improvement: Updated pkg/scheduler/cache/cache.go to gracefully handle AlreadyExists errors when creating queues, improving robustness in concurrent environments.



@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 21, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds valuable examples for running MPI jobs with Argo Workflows and Volcano. The documentation and templates are comprehensive. However, I've found several critical issues in the YAML examples related to the configuration of MPI jobs, specifically concerning process counts, resource allocation (minAvailable), and the usage of Volcano's svc plugin for host discovery. These issues would prevent the examples from running correctly as-is. I've provided detailed comments and suggestions to address these problems. Additionally, there's a minor correction needed in the README file.

containers:
  - name: mpimaster
    image: mpioperator/mpi-pi:latest
    command: [mpirun, "--allow-run-as-root", "--hostfile", "/etc/volcano/mpi.hostfile", "-np", "2", "./mpi-pi"]

critical

The mpirun command has two issues:

  1. The number of processes (-np) is hardcoded to 2, but there are 3 pods in total (1 master + 2 workers). This will leave one worker pod idle. It should be 3.
  2. The --hostfile path /etc/volcano/mpi.hostfile is incorrect. The svc plugin creates host files for each task under /etc/volcano/conf/. The command should be updated to use these files, for example by creating a combined hostfile at runtime.
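One way to apply the suggested fix, sketched under the reviewer's assumption that the svc plugin places per-task host files under /etc/volcano/conf/ (the glob pattern and file names are illustrative, not verified against the plugin):

```yaml
# Hypothetical fix sketch — file layout under /etc/volcano/conf/ is an assumption
# taken from the review comment; adjust to the files the svc plugin actually writes.
command: ["/bin/sh", "-c"]
args:
  - |
    cat /etc/volcano/conf/*.host > /tmp/hostfile
    mpirun --allow-run-as-root --hostfile /tmp/hostfile -np 3 ./mpi-pi
```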

name: "{{workflow.name}}"
uid: "{{workflow.uid}}"
spec:
minAvailable: {{inputs.parameters.mpi-worker-replicas}}

critical

The minAvailable value should be the total number of pods for the job, which is the number of workers plus one master. The current value only accounts for the workers.

          minAvailable: {{=asInt(inputs.parameters.mpi-worker-replicas) + 1}}

Comment on lines +98 to +116
command:
  - mpirun
args:
  - "--allow-run-as-root"
  - "--hostfile"
  - "/etc/volcano/mpi.hostfile"
  - "-np"
  - "{{inputs.parameters.mpi-worker-replicas}}"
  - "./mpi-pi"
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
volumeMounts:
  - name: mpi-hostfile
    mountPath: /etc/volcano
volumes:
  - name: mpi-hostfile
    emptyDir: {}

critical

The MPI master container configuration has issues with process count, hostfile path, and volume mounts.

  1. Incorrect Process Count: The -np argument for mpirun on line 105 should be {{=asInt(inputs.parameters.mpi-worker-replicas) + 1}} to include the master process.
  2. Incorrect Hostfile Path & Volume Conflict: The svc plugin provides host files under /etc/volcano/conf, but this example uses an incorrect path (/etc/volcano/mpi.hostfile on line 103) and an emptyDir volume (mpi-hostfile on lines 111-116) that hides the plugin-provided files.

To fix this, the mpi-hostfile volume and its mounts should be removed from both master and worker pod templates. The mpirun command should then be updated to use the host files from /etc/volcano/conf.

Comment on lines +82 to +89
command: [mpirun]
args:
  - "--hostfile"
  - "/etc/volcano/mpi.hostfile"
  - "-np"
  - "{{workflow.parameters.mpi-worker-replicas}}"
  - "your-mpi-application"

medium

The variable for mpi-worker-replicas in the example is incorrect. Within the submit-volcano-job template, you should use inputs.parameters to reference input parameters, not workflow.parameters.

Suggested change
command: [mpirun]
args:
  - "--hostfile"
  - "/etc/volcano/mpi.hostfile"
  - "-np"
  - "{{workflow.parameters.mpi-worker-replicas}}"
  - "your-mpi-application"

command: [mpirun]
args:
  - "--hostfile"
  - "/etc/volcano/mpi.hostfile"
  - "-np"
  - "{{inputs.parameters.mpi-worker-replicas}}"
  - "your-mpi-application"

Comment on lines +78 to +79
echo "Monitoring MPI job..."
kubectl get jobs -n $(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) -l batch.volcano.sh/job-name

medium

The monitor-mpi step is not functional. The kubectl get jobs command with the label selector batch.volcano.sh/job-name will not find the Volcano job, as this label is not set on the job object. Since the submit-mpi step already waits for the job to complete, this monitoring step could be simplified to just an informational message or removed.

          echo "MPI job has completed."

Contributor

Copilot AI left a comment


Pull request overview

This PR adds an MPI (Message Passing Interface) integration example showing how to run Volcano MPI-style jobs via Argo Workflows, including a WorkflowTemplate and accompanying documentation. It also includes an unrelated scheduler cache change affecting queue creation behavior.

Changes:

  • Add a full-featured Argo WorkflowTemplate example for submitting a Volcano MPI job and (intended) log following.
  • Add a simpler Argo Workflow example and a README describing setup/usage.
  • Adjust scheduler cache queue-creation logic to treat AlreadyExists as success during retries.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.

File Description
pkg/scheduler/cache/cache.go Changes queue creation retry handling (unrelated to the docs/examples scope).
example/integrations/argo/mpi-example/mpi-workflowtemplate.yaml Adds MPI WorkflowTemplate with job submission + log-follow pattern.
example/integrations/argo/mpi-example/mpi-simple.yaml Adds a minimal MPI Workflow example.
example/integrations/argo/mpi-example/README.md Adds documentation for the MPI Argo integration example.


- name: job-name
activeDeadlineSeconds: 3600
container:
image: bitnami/kubectl:latest

Copilot AI Mar 21, 2026


Using bitnami/kubectl:latest makes the example non-reproducible and can unexpectedly break as the image changes over time. Pin to a specific kubectl image/tag (ideally matching your documented Kubernetes/Argo compatibility) to make the example stable and safer to copy into production.

Suggested change
image: bitnami/kubectl:latest
image: bitnami/kubectl:1.29.3

Comment on lines +203 to +209
echo "ERROR: Pod did not become ready"
kubectl get pod -n $NS $POD -o yaml
exit 1
}

echo ""
echo "✓ Pod is ready"

Copilot AI Mar 21, 2026


Waiting on --for=condition=Ready can fail for short-lived batch pods: a pod that quickly reaches Succeeded is typically not Ready, so this can time out even though the job ran. Prefer waiting on pod phase (Running/Succeeded/Failed) or handle the Succeeded/Failed cases explicitly before attempting to stream logs.

Suggested change
echo "ERROR: Pod did not become ready"
kubectl get pod -n $NS $POD -o yaml
exit 1
}
echo ""
echo "✓ Pod is ready"
PHASE=$(kubectl get pod -n $NS $POD -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown")
if [ "$PHASE" = "Succeeded" ]; then
echo "Pod reached terminal phase 'Succeeded' without becoming Ready; continuing to logs."
elif [ "$PHASE" = "Failed" ]; then
echo "ERROR: Pod entered Failed phase before becoming Ready"
kubectl get pod -n $NS $POD -o yaml
exit 1
else
echo "ERROR: Pod did not become ready (phase: $PHASE)"
kubectl get pod -n $NS $POD -o yaml
exit 1
fi
}
echo ""
echo "✓ Pod is ready or has completed"

Comment on lines +24 to +25
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed

Copilot AI Mar 21, 2026


In this Workflow, submit-mpi uses a successCondition that waits for the Volcano Job to reach Completed, so the subsequent monitor-mpi step only runs after the job is already finished. If the goal is to monitor while it runs, split into a create step (no completion condition) plus a separate wait/monitor step (or run monitoring in parallel after creation).

Suggested change
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed

Comment on lines +473 to +498
@@ -488,6 +488,14 @@ func newDefaultAndRootQueue(vcClient vcclient.Interface, defaultQueue string) {
klog.V(2).Infof("Successfully created queue %s", name)
return nil
})

// If the queue was created by another scheduler pod concurrently, treat it as success
if err != nil && apierrors.IsAlreadyExists(err) {
klog.V(2).Infof("Queue %s was created by another scheduler, skip.", name)
return nil
}

return err

Copilot AI Mar 21, 2026


This PR is described as adding Argo/Volcano MPI examples, but it also changes scheduler cache behavior (queue creation retry/AlreadyExists handling). If this change is intentional, it should be split into a separate PR (or the PR description updated) to keep review scope clear and to avoid releasing an unrelated scheduler behavior change with documentation/examples.

Comment on lines +26 to +55
- name: submit-job
template: submit-volcano-job
arguments:
parameters:
- name: job-name
value: "{{workflow.parameters.job-name}}"
- name: mpi-worker-replicas
value: "{{workflow.parameters.mpi-worker-replicas}}"
- name: mpi-image
value: "{{workflow.parameters.mpi-image}}"

- name: follow-logs
template: follow-master-logs
dependencies: [submit-job]
arguments:
parameters:
- name: job-name
value: "{{workflow.parameters.job-name}}"

- name: submit-volcano-job
inputs:
parameters:
- name: job-name
- name: mpi-worker-replicas
- name: mpi-image
resource:
action: create
setOwnerReference: true
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed

Copilot AI Mar 21, 2026


The DAG ordering prevents the "log follower" from working: submit-volcano-job uses a successCondition that only becomes true when the Volcano Job is Completed, and follow-logs depends on submit-job. As a result, log following starts only after the MPI job finishes. Split this into (1) a create step with no completion successCondition (or a separate 'create' + 'wait' resource), then (2) run log-following after creation while (3) another step waits for job completion.
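A sketch of the suggested split, using Argo's resource template actions (template names and the elided manifest body are illustrative, not taken from the PR):

```yaml
# Sketch only — template names are illustrative; "..." elides the Job spec.
- name: create-job
  resource:
    action: create          # no successCondition: returns as soon as the Job exists
    manifest: |
      apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      ...

- name: wait-job            # runs in parallel with the log follower
  resource:
    action: get
    successCondition: status.state.phase == Completed
    failureCondition: status.state.phase == Failed
    manifest: |
      apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      metadata:
        name: "{{inputs.parameters.job-name}}"
```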

Comment on lines +72 to +79
- name: monitor-mpi
container:
image: bitnami/kubectl:latest
command: [sh, -c]
args:
- |
echo "Monitoring MPI job..."
kubectl get jobs -n $(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) -l batch.volcano.sh/job-name

Copilot AI Mar 21, 2026


monitor-mpi runs kubectl get jobs ..., which queries Kubernetes batch/v1 Jobs, not Volcano batch.volcano.sh/v1alpha1 Jobs. Also the label selector -l batch.volcano.sh/job-name doesn't constrain to the created resource. Use kubectl get job.batch.volcano.sh (or the Volcano shortname, if available) and select by the specific name/generateName or by a label you set on the Volcano Job metadata.
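A sketch of the corrected monitor step along those lines, querying the Volcano resource by its exact name rather than a batch/v1 Job by label (the parameter reference is an assumption; the template would need a matching input):

```yaml
# Sketch only — assumes the step receives the job name as an input parameter.
args:
  - |
    NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
    kubectl get jobs.batch.volcano.sh -n "$NS" "{{inputs.parameters.job-name}}"
```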


- name: monitor-mpi
container:
image: bitnami/kubectl:latest

Copilot AI Mar 21, 2026


Using bitnami/kubectl:latest makes the example non-reproducible and can unexpectedly break as the image changes. Pin to a specific tag/version for stability and safer copy/paste.

Suggested change
image: bitnami/kubectl:latest
image: bitnami/kubectl:1.29.3

Comment on lines +54 to +60
## WorkflowTemplate Details

The workflow consists of two main steps:

1. **submit-volcano-job**: Creates a Volcano Job with MPI master and workers
2. **follow-master-logs**: Waits for the master pod and streams its logs


Copilot AI Mar 21, 2026


The README describes follow-master-logs as monitoring the master pod logs as the MPI job runs, but in the provided YAML the log follower task depends on submit-job, which (via successCondition) only completes after the Volcano Job finishes. Once the workflow logic is adjusted to start log-following after job creation, update this section if needed to match the actual execution order.

Comment on lines +12 to +18
parameters:
- name: job-name
value: mpi-job-example
- name: mpi-worker-replicas
value: "2"
- name: mpi-image
value: mpioperator/mpi-pi:latest

Copilot AI Mar 21, 2026


WorkflowTemplate parameters are not defined under spec.parameters in Argo Workflows; they should be under spec.arguments.parameters. With the current structure, {{workflow.parameters.*}} references may be unset and the template may fail validation on apply/submit.

Suggested change
parameters:
- name: job-name
value: mpi-job-example
- name: mpi-worker-replicas
value: "2"
- name: mpi-image
value: mpioperator/mpi-pi:latest
arguments:
parameters:
- name: job-name
value: mpi-job-example
- name: mpi-worker-replicas
value: "2"
- name: mpi-image
value: mpioperator/mpi-pi:latest

Comment on lines +53 to +67
setOwnerReference: true
successCondition: status.state.phase == Completed
failureCondition: status.state.phase == Failed
manifest: |
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
name: {{inputs.parameters.job-name}}
ownerReferences:
- apiVersion: argoproj.io/v1alpha1
blockOwnerDeletion: true
kind: Workflow
name: "{{workflow.name}}"
uid: "{{workflow.uid}}"
spec:

Copilot AI Mar 21, 2026


The manifest sets an explicit ownerReferences block while also using setOwnerReference: true. This is redundant and can produce duplicate ownerReferences entries depending on Argo version/behavior. Prefer a single mechanism (either rely on setOwnerReference or keep the explicit ownerReferences) to avoid confusion and improve compatibility with existing examples in example/integrations/argo/*.yaml which set ownerReferences directly.

Member

@hajnalmt hajnalmt left a comment


Hello,
Thanks for the PR, but what was the issue with #5117? Jeremy tries to address this there.

@csh0101 csh0101 force-pushed the add-mpi-argo-example branch 3 times, most recently from d549077 to 7c89e3e on March 21, 2026 13:45
Add comprehensive MPI job examples for Argo Workflows integration:

- mpi-workflowtemplate.yaml: Full-featured WorkflowTemplate with parameterization,
  log following, and production-ready configuration
- mpi-simple.yaml: Simple Workflow example for quick start
- README.md: Documentation covering architecture, usage, and customization

Features:
- MPI master-worker architecture using Volcano plugins (ssh, svc)
- Argo DAG workflow for job submission and log monitoring
- Log follower pattern for Argo UI visibility
- Parameterized worker replica count and container image

This helps users running HPC and scientific computing workloads
on Kubernetes using Volcano and Argo together.

Related-to: 5114

Signed-off-by: csh0101 <csh0101@example.com>
@csh0101 csh0101 force-pushed the add-mpi-argo-example branch from 7c89e3e to 6a64115 on March 21, 2026 14:06
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Mar 21, 2026
@jrbe228
Contributor

jrbe228 commented Mar 23, 2026

> Hello, Thanks for the PR, but what was the issue with: #5117 Jeremy tries to address this there.

Maybe some GPT-style inspiration 😂 but there are some nice ideas in this PR, such as mpi-simple.yaml which would help newer users. I have no objection to these templates being added to master branch.


Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MPI example for Argo Workflows integration

5 participants