ci: simulate pre-baking CI dependencies to measure timing improvement #3835

Draft
Copilot wants to merge 2 commits into feat/parallel-deploy-orchestrator from copilot/simulate-pre-install-deps-early-ci

Copilot AI commented Apr 15, 2026

Simulates moving expensive CI setup work into the Nodepool diskimage by front-loading all dependency installs into a new pre-run playbook. Structured as two commits so CI timing can be compared before/after without actually modifying the diskimage.

Commit 1 — pre-install dependencies early in CI

New playbook test-playbooks/molecule/pre-install-deps.yml runs before all other CI work via a new pre-run entry in .zuul.yaml:

pre-run:
  - test-playbooks/molecule/pre-install-deps.yml
  - test-playbooks/molecule/pre.yml

Installs 4 dependency categories up front:

  • System packages: open-iscsi, multipath-tools, sg3-utils, openssh-server, dirmngr, jq, gpg, gpg-agent, socat, conntrack, ipset, ethtool, ebtables, python3-venv, python3-pip; enables iscsid and multipathd via ansible.builtin.systemd
  • Helm 3.14.4: extracted to /usr/local/bin/helm
  • Go 1.24.2: extracted to /usr/local
  • Python venv: /opt/atmosphere-venv with openstacksdk and python-openstackclient
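
To confirm that the binaries from the list above actually landed on a node, a small check like the following could be run after the pre-run playbook. This is only a sketch: `missing_executables` is a hypothetical helper name, and the example paths assume that the Go tarball unpacks to /usr/local/go and that python-openstackclient provides an `openstack` entry point inside the venv.

```shell
# Print every path from the argument list that is not an executable file.
# Hypothetical helper, not part of this PR.
missing_executables() {
  for path in "$@"; do
    [ -x "$path" ] || printf '%s\n' "$path"
  done
}

# Example call (paths are assumptions based on the install locations above):
#   missing_executables /usr/local/bin/helm /usr/local/go/bin/go \
#     /opt/atmosphere-venv/bin/openstack
```

An empty result would mean every listed tool is in place; any printed path is one the later CI steps would still have to install.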

Commit 2 — remove redundant installs

  • test-playbooks/molecule/pre.yml — tasks emptied (jq now pre-installed)
  • molecule/aio/prepare.yml — removes dirmngr install task
  • .github/actions/molecule/action.yaml — two changes:
    • openssh-server install skipped if already present (dpkg -s check)
    • Helm removal is now conditional: only removes if installed version is not v3.14.x
Original prompt

Goal

Simulate the effect of pre-baking CI dependencies into the Nodepool diskimage, so we can measure the time improvement before actually modifying the diskimage. This PR should have two commits:

Commit 1: "Simulate PR1 — pre-install dependencies early in CI"

Add a new pre-run playbook test-playbooks/molecule/pre-install-deps.yml that installs all 4 categories of dependencies before any other CI work happens. Wire it into .zuul.yaml as an additional pre-run step that runs before the existing test-playbooks/molecule/pre.yml.

The new playbook test-playbooks/molecule/pre-install-deps.yml should:

  1. System packages — install all of these in a single apt-get install call with update_cache: true:

    • open-iscsi, multipath-tools, sg3-utils
    • openssh-server, dirmngr, jq
    • gpg, gpg-agent, socat, conntrack, ipset, ethtool, ebtables
    • Then enable iscsid and multipathd services via systemctl enable
  2. Helm binary — download and install Helm 3.14.4 to /usr/local/bin/helm:

    curl -fsSL https://get.helm.sh/helm-v3.14.4-linux-amd64.tar.gz | tar -xz -C /usr/local/bin --strip-components=1 linux-amd64/helm
    
  3. Go toolchain — download and install Go 1.24.2 to /usr/local:

    curl -fsSL https://go.dev/dl/go1.24.2.linux-amd64.tar.gz | tar -xz -C /usr/local
    
  4. Python venv with openstacksdk — create a venv at /opt/atmosphere-venv and install openstacksdk and python-openstackclient:

    python3 -m venv /opt/atmosphere-venv
    /opt/atmosphere-venv/bin/pip install --no-cache-dir openstacksdk python-openstackclient
    

    (Make sure python3-venv and python3-pip are installed via apt first — add them to the system packages list above.)

The playbook should target hosts: all with become: true. Use ansible.builtin.apt for packages, ansible.builtin.shell or ansible.builtin.command for the binary downloads, and ansible.builtin.pip for the Python packages. Add the SPDX-License-Identifier: Apache-2.0 copyright header matching the style of the existing test-playbooks/molecule/pre.yml.

In .zuul.yaml, add the new playbook as a pre-run before the existing pre-run: test-playbooks/molecule/pre.yml in the atmosphere-molecule job definition. The .zuul.yaml job currently has:

pre-run: test-playbooks/molecule/pre.yml

Change it to a list:

pre-run:
  - test-playbooks/molecule/pre-install-deps.yml
  - test-playbooks/molecule/pre.yml

Commit 2: "Simulate PR2 — remove redundant installs"

Now that the dependencies are pre-installed, remove the redundant install steps:

  1. test-playbooks/molecule/pre.yml — Remove the "Install jq for log collection" task entirely (jq is now pre-installed). The file should still exist as a valid playbook but with an empty tasks list:

    # Copyright (c) 2025 VEXXHOST, Inc.
    # SPDX-License-Identifier: Apache-2.0
    
    - hosts: all
      become: true
      tasks: []
  2. molecule/aio/prepare.yml — Remove only the "Install dirmngr for GPG keyserver operations" task (lines 19-22). Keep everything else (hostname workaround, snapd purge, networking, ceph, neutron) exactly as-is.

  3. .github/actions/molecule/action.yaml — Make two changes:

    • Replace the openssh-server install step (lines 25-27) with a conditional that skips if already installed:
      - shell: bash
        run: |
          if ! dpkg -s openssh-server &>/dev/null; then
            sudo apt-get update
            sudo apt-get install -y openssh-server
          fi
    • Replace the unconditional Helm removal (line 61) with a version check:
      - shell: bash
        run: |
          if ! helm version --short 2>/dev/null | grep -q "v3.14"; then
            rm -rfv /usr/local/bin/helm || true
          fi
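
Note that the pattern "v3.14" in the check above is unanchored and treats the dot as a wildcard, so it would also accept a hypothetical v3.140.0. A stricter match could look like the sketch below. It assumes `helm version --short` prints a string that starts with the release tag (for example v3.14.4+g81c902a); `helm_is_pinned` is a hypothetical name, not something defined in the repository.

```shell
# Succeed only when the given version string is a 3.14.x release.
# Hypothetical helper; pass it the output of `helm version --short`.
helm_is_pinned() {
  printf '%s\n' "$1" | grep -Eq '^v3\.14\.[0-9]+'
}

# Demo with a literal string so the sketch runs without Helm installed.
helm_is_pinned "v3.14.4+g81c902a" && echo "keep existing helm"
```

In the action step, the argument would be `"$(helm version --short 2>/dev/null)"`, with the removal running only when the function fails.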

Important Notes

  • Do NOT change any Ansible roles, playbooks under roles/, playbooks/, or the molecule/aio/converge.yml
  • Do NOT change any Helm charts or values files
  • The purpose is purely to measure timing improvement in CI — this simulates what a diskimage pre-bake would achieve

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: what's my Remaining reqs

Assistant:

I don't have access to check your remaining API requests, rate limits, or usage quotas. That information isn't available through any of the tools I have.

If you're asking about GitHub API rate limits, you can check them yourself:

curl -H "Authorization: Bearer YOUR_TOKEN" https://api.github.com/rate_limit

Or if you're asking about Copilot Chat usage limits, you can check your plan details at github.com/settings/copilot.

What specifically are you trying to check?

User: analysis time in https://zuul.oss.vexxhost.dev/stream/19c9e1d14aa648ae8e0322607a286ec4 like https://github.com//pull/3818#issuecomment-4245094888
...

This pull request was created from Copilot chat.

Copilot AI changed the title [WIP] Simulate PR1 — pre-install dependencies early in CI ci: simulate pre-baking CI dependencies to measure timing improvement Apr 15, 2026
@ricolin Rico Lin (ricolin) force-pushed the copilot/simulate-pre-install-deps-early-ci branch from 5617670 to c179c6c Compare April 15, 2026 12:05
@ricolin Rico Lin (ricolin) changed the base branch from main to feat/parallel-deploy-orchestrator April 15, 2026 12:05
@ricolin Rico Lin (ricolin) changed the base branch from feat/parallel-deploy-orchestrator to main April 15, 2026 12:14

Rico Lin (ricolin) commented Apr 16, 2026

A quick comparison:

PR #3835 vs PR #3818 (excluding pre-install-deps time)

┌───────────┬──────────────┬──────────────┬──────────────────────┐
│ Job       │ PR#3818      │ PR#3835*     │ Diff                 │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ aio-ovn   │ 59m 46s      │ 57m 21s      │ -2m 25s (-4.0%)      │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ aio-ovs   │ 61m 18s      │ 60m 39s      │ -0m 39s (-1.1%)      │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ csi-lpp   │ 6m 28s       │ 6m 32s       │ +0m 04s (+1.0%)      │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ csi-rbd   │ 13m 02s      │ 11m 32s      │ -1m 30s (-11.5%)     │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ keycloak  │ 18m 13s      │ 17m 43s      │ -0m 30s (-2.7%)      │
├───────────┼──────────────┼──────────────┼──────────────────────┤
│ Total     │ 158m 47s     │ 153m 47s     │ -5m 00s (-3.1%)      │
└───────────┴──────────────┴──────────────┴──────────────────────┘

The deploy phase improved significantly: AIO-OVN converge went from 46m 15s to 38m 54s (-15.9%), likely because commit 293c200 removed the apt resource constraint.

The pre-install-deps overhead is only about 51s per job, and it would drop to zero once the dependencies are baked into a diskimage. Combined with PR #3818, the total saving against the serialized baseline is 84m 30s (35.5%).
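
The Diff column in the table can be reproduced from the two duration columns. The sketch below is one way to do it; `to_seconds` and `pct_change` are hypothetical helper names, and it assumes durations are written as "<minutes>m <seconds>s" as in the table.

```shell
# Convert a "13m 02s" style duration to seconds.
# awk's numeric coercion reads the leading digits and ignores the "m"/"s" suffix.
to_seconds() {
  echo "$1" | awk '{ print $1 * 60 + $2 }'
}

# Print the signed percentage change from a baseline duration to a new one.
pct_change() {
  before=$(to_seconds "$1")
  after=$(to_seconds "$2")
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%+.1f%%\n", (a - b) / b * 100 }'
}

pct_change "59m 46s" "57m 21s"   # aio-ovn row: prints -4.0%
```

The same call with "46m 15s" and "38m 54s" gives the -15.9% quoted for the AIO-OVN converge step.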

Rico Lin (ricolin) and others added 2 commits April 24, 2026 00:43
Simulates moving expensive CI setup work into the Nodepool diskimage by
front-loading all dependency installs into a new pre-run playbook.

A new playbook test-playbooks/molecule/pre-install-deps.yml runs before
all other CI work via a new pre-run entry in .zuul.yaml. It installs:

- System packages for iSCSI, multipath, signing, and Python tooling;
  enables iscsid and multipathd via ansible.builtin.systemd
- Helm 3.14.4 to /usr/local/bin/helm
- Go 1.24.2 to /usr/local
- A Python venv at /opt/atmosphere-venv with openstacksdk and
  python-openstackclient

Remove redundant installs now that dependencies are pre-installed:
- test-playbooks/molecule/pre.yml: remove the jq install task, leaving an empty tasks list
- molecule/aio/prepare.yml: drop dirmngr install task
- .github/actions/molecule/action.yaml: skip openssh-server install if
  already present, and only remove Helm if the installed version is not
  v3.14.x

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>

Add a prepull phase to the parallel deploy orchestrator that pulls all
container images declared by the resolved component set before any
wave begins executing. This eliminates image-pull time from the
critical path of the first wave and keeps Kubernetes nodes from
serializing on registry bandwidth mid-deploy.

Scope the prepull by host group so each image is fetched only on nodes
where it runs, and bound concurrency so registry throttling and disk
pressure stay predictable. Defaults for concurrency and per-image
timeout live in roles/defaults/defaults/main.yml.

Wire the prepull playbook into the orchestrator via a new dedicated
phase in pkg/dag, and include it from every molecule scenario's
converge.yml so CI exercises the same path as real deploys.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>
@ricolin Rico Lin (ricolin) force-pushed the copilot/simulate-pre-install-deps-early-ci branch from 7cdf113 to ac38b57 Compare April 23, 2026 16:52