[CI] Add Docker build and push workflows with cross-platform support #279

Merged
hzxuzhonghu merged 18 commits into volcano-sh:main from huntersman:hunter/dev/CI on Sep 1, 2025

@huntersman (Contributor) commented Aug 26, 2025

What type of PR is this?
/kind enhancement

What this PR does / why we need it:
This pull request introduces significant improvements to the Docker build and deployment process by adding GitHub Actions workflows for building and pushing Docker images, optimizing Dockerfile build steps for better caching, and refactoring the Makefile for more robust cross-platform builds.

CI/CD Automation:

  • Added .github/workflows/docker-build.yml to automate Docker image builds on pull requests and manual triggers using GitHub Actions.
  • Added .github/workflows/docker-push.yml to build and push Docker images to GitHub Container Registry on pushes to main, with proper authentication and permissions.
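
As a rough illustration of the push workflow described in the bullets above, a minimal sketch might look like the following. This is not the file from the PR: the workflow name, job name, and step layout are assumptions, and the `make docker-buildx` call only reuses the target name mentioned in this description without knowing which variables it expects.

```yaml
# Hypothetical sketch of .github/workflows/docker-push.yml (not the actual file from this PR).
name: docker-push            # assumed name
on:
  push:
    branches: [main]         # run after a PR is merged into main
  workflow_dispatch:         # allow manual runs from the Actions tab

permissions:
  contents: read
  packages: write            # needed to push to ghcr.io with GITHUB_TOKEN

jobs:
  push:
    # Only run in the upstream repository, not in forks.
    if: github.repository == 'matrixinfer-ai/matrixinfer'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # QEMU and Buildx are required for cross-platform builds on an amd64 runner.
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push all service images
        # Assumption: the Makefile target builds and pushes without extra arguments.
        run: make docker-buildx
```

The `if:` guard is one way to express "only run in matrixinfer-ai/matrixinfer"; the real workflow may use a different condition.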

Docker Build Improvements:

  • Updated all service Dockerfiles to cache Go module dependencies before building, improving build performance and reliability.

Makefile Refactoring:

  • Added a docker-build-all target to the Makefile to streamline building all service images at once.
  • Refactored the docker-buildx target to use temporary Dockerfiles with the --platform=${BUILDPLATFORM} directive for better cross-platform support, and to clean up after builds.

Which issue(s) this PR fixes:
Fixes #226 #227

Special notes for your reviewer:
Docker build will run when a PR is created to check if the Docker image can be built successfully.

Docker push will only run in matrixinfer-ai/matrixinfer, and only when a PR is merged.

Docker build takes about 16 minutes and docker push about 30 minutes. Docker push uses docker buildx, which takes longer than a plain docker build.

Self test -> https://github.com/huntersman/matrixinfer/actions/runs/17232176380/job/48888528367

All images are currently tagged latest. I think we could use a specific tag for release branches and keep latest for the main branch.
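
One possible way to get branch-dependent tags is `docker/metadata-action`, which is not part of this PR; the sketch below is only an assumption about how the idea could be wired up. The image name is a placeholder, and the step ids are hypothetical.

```yaml
# Hypothetical steps for tagging; the image name is a placeholder.
# On main this yields the tags "latest" and "main"; on any other branch
# (for example a release branch) it yields the sanitized branch name.
- name: Compute image tags
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/example-org/example-service
    tags: |
      type=raw,value=latest,enable={{is_default_branch}}
      type=ref,event=branch
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
```

For this to produce release tags, the workflow's `on.push.branches` filter would also need to include the release branches.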

Both docker build and docker push can be manually run in actions.

Does this PR introduce a user-facing change?:

NONE

@gemini-code-assist gemini-code-assist Bot left a comment

Summary of Changes

Hello @huntersman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the project's Docker build and deployment infrastructure. It introduces new GitHub Actions workflows to automate the building and pushing of Docker images, ensuring a more streamlined and consistent CI/CD pipeline. Furthermore, the changes optimize Dockerfile build steps by implementing Go module dependency caching, leading to faster and more efficient builds. The Makefile has also been refactored to enhance cross-platform build support and simplify the overall build process for all services.

Highlights

  • CI/CD Automation: New GitHub Actions workflows (docker-build.yml and docker-push.yml) have been added to automate the Docker image build process on pull requests and push to the main branch, including proper authentication for GitHub Container Registry.
  • Docker Build Optimization: All service Dockerfiles have been updated to include a RUN go mod download step before the main build, which significantly improves build performance and reliability by caching Go module dependencies.
  • Makefile Enhancements: The Makefile has been refactored to include a new docker-build-all target for building all service images at once. Additionally, the docker-buildx target now dynamically creates temporary Dockerfiles with --platform=${BUILDPLATFORM} for robust cross-platform builds and includes cleanup steps.

Code flows, builds take flight,
Docker hums, cross-platform dreams,
New images rise.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces CI/CD workflows for Docker builds and improves Dockerfiles for better caching. My review focuses on the Makefile and Dockerfile changes.
I've identified a critical issue in the Makefile's docker-buildx target where the logic for creating cross-platform Dockerfiles is flawed and will not work as intended. I also pointed out that build errors are being suppressed, which could hide problems in CI.
Furthermore, the Dockerfile changes intended to improve caching are not implemented optimally. For several services, the entire project is copied before downloading dependencies, which negates caching benefits. For others, the order of COPY operations is incorrect, leading to the same problem. I've provided suggestions to fix these caching issues to align with best practices and the PR's goal.

Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread docker/Dockerfile.autoscaler
Comment thread docker/Dockerfile.modelcontroller
Comment thread .github/workflows/docker-build.yml Outdated
@huntersman
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly improves the Docker build process by introducing CI/CD workflows and optimizing the Dockerfiles for better layer caching. The changes to the Dockerfiles to cache Go module dependencies are excellent and follow best practices. My main feedback is on the Makefile's docker-buildx target, which currently suppresses build errors. I've provided a critical comment with a suggestion to make the build process more robust by ensuring errors are propagated and cleanup is handled reliably.

Comment thread Makefile Outdated
@huntersman

This comment was marked as resolved.

@huntersman

This comment was marked as outdated.

@huntersman huntersman marked this pull request as draft August 28, 2025 02:08
Comment thread .github/workflows/docker-build-and-push-python.yml
Comment thread .github/workflows/docker-build-and-push-python.yml
Comment thread .github/workflows/docker-push.yml Outdated
Comment thread Makefile
@hzxuzhonghu
Member

I am wondering whether we can use https://github.com/docker/build-push-action, so that make docker-buildx is no longer needed.

      # Set up a Buildx builder for the build-and-push steps below
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      # One build-and-push step per service Dockerfile (tags are placeholders)
      -
        name: Build and push xx
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: user/app:latest
          file: dockerfile.gateway
      -
        name: Build and push yy
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: user/app:latest
          file: dockerfile.controller

@huntersman

This comment was marked as resolved.

@huntersman huntersman marked this pull request as ready for review August 28, 2025 03:45
@hzxuzhonghu
Member

They do not conflict; docker/build-push-action just seems simpler. For local environments, we can keep docker-buildx in case we want to push manually.

@huntersman

This comment was marked as resolved.

@huntersman

This comment was marked as off-topic.

        with:
          context: ./python
          platforms: ${{ steps.platforms.outputs.platforms }}
          push: ${{ github.event_name != 'pull_request' }}

Member

💯 Curious why the job is separated.

Contributor Author

If someone only modified Python, we don't expect to build the Go Docker image, right?

Contributor Author

Building an amd64 image is fast, but building an arm64 image is very slow. Separating the jobs saves time.
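
To make the reasoning in this thread concrete, here is a hedged sketch of a separate Python image workflow. Only the `context`, `platforms`, and `push` inputs come from the snippet under review; the path filters, the logic of the `platforms` step (amd64 only on pull requests, both architectures otherwise), and the image tag are assumptions for illustration.

```yaml
# Hypothetical sketch; path patterns, platform logic, and the tag are assumptions.
name: docker-build-and-push-python
on:
  pull_request:
    paths:
      - "python/**"          # skip this workflow when only Go code changes
  push:
    branches: [main]
    paths:
      - "python/**"

permissions:
  contents: read
  packages: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - name: Select target platforms
        id: platforms
        run: |
          # arm64 builds are slow under emulation, so pull requests build amd64 only.
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo "platforms=linux/amd64" >> "$GITHUB_OUTPUT"
          else
            echo "platforms=linux/amd64,linux/arm64" >> "$GITHUB_OUTPUT"
          fi
      - name: Log in to GitHub Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: ./python
          platforms: ${{ steps.platforms.outputs.platforms }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ghcr.io/example-org/example-python:latest   # placeholder
```

Keeping the Python image in its own workflow means a Python-only change never waits on the Go service builds, and vice versa.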

Contributor Author

Check the result

Comment thread Makefile Outdated
@hzxuzhonghu
Member

By the way, local builds will run into some network issues :)

Prepare your ladder, lol

Member

@hzxuzhonghu hzxuzhonghu left a comment

Later, we can use the docker buildx cache to speed up image builds: https://docs.docker.com/build/cache/
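
As a sketch of what that follow-up could look like with `docker/build-push-action`, the step below uses the GitHub Actions cache backend. It is an assumption, not something this PR implements; the image tag is a placeholder, and the step presumes that `docker/setup-buildx-action` has already run in the same job.

```yaml
# Hypothetical step; assumes docker/setup-buildx-action ran earlier in the job.
- name: Build and push with layer cache
  uses: docker/build-push-action@v6
  with:
    context: .
    file: docker/Dockerfile.autoscaler
    push: true
    tags: ghcr.io/example-org/autoscaler:latest   # placeholder
    cache-from: type=gha            # reuse layers cached by earlier workflow runs
    cache-to: type=gha,mode=max     # also export intermediate layers to the cache
```

With `mode=max`, intermediate stages such as a Go module download layer are cached too, which is where most of the time savings would be expected.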

@hzxuzhonghu hzxuzhonghu merged commit 55d73de into volcano-sh:main Sep 1, 2025
1 check failed
@huntersman huntersman deleted the hunter/dev/CI branch September 1, 2025 07:29
Development

Successfully merging this pull request may close these issues.

CI for Docker build and push

3 participants