fix: restore compatibility with Ubuntu 26.04 / ansible-core 2.20 by ricolin · Pull Request #3864 · vexxhost/atmosphere

Rico Lin (ricolin) · 2026-04-22T12:56:45Z

Companion PR to the Ubuntu 26.04 work on the collection repos. Tracks two atmosphere-side fixes discovered while validating an AIO deploy on Ubuntu 26.04 with ansible-core 2.20 and Python 3.14.

Companion PRs:

feat: support Ubuntu 26.04 and bump default Ceph to v20.2.1 (Tentacle) ansible-collection-ceph#105
feat: support Ubuntu 26.04 and Python 3.14 ansible-collection-containers#118
feat: support Ubuntu 26.04 and Python 3.14 ansible-collection-kubernetes#268

Checked:


Test matrix

┌─────────────┬────────┬─────────┬─────────────────────────────────────┬─────────────────┬────────────────────────────────┬────────┐
│ OS (Ubuntu) │ Python │ Backend │ Scenario                            │ Wallclock       │ Tempest                        │ Result │
├─────────────┼────────┼─────────┼─────────────────────────────────────┼─────────────────┼────────────────────────────────┼────────┤
│ 26.04       │ 3.14   │ OVN     │ fresh deploy                        │ ~75 min         │ 163/164 pass                   │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────────┼─────────────────┼────────────────────────────────┼────────┤
│ 24.04.1     │ 3.12   │ OVS     │ fresh deploy                        │ 102 min         │ 163/164 pass                   │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────────┼─────────────────┼────────────────────────────────┼────────┤
│ 22.04.3     │ 3.10   │ OVN     │ fresh deploy                        │ 93 min          │ 163/164 pass                   │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────────┼─────────────────┼────────────────────────────────┼────────┤
│ 22.04.3     │ 3.10   │ OVN     │ previous version → in-place upgrade │ 99 min + 22 min │ N/A (upgrade, cluster healthy) │ ✅     │
└─────────────┴────────┴─────────┴─────────────────────────────────────┴─────────────────┴────────────────────────────────┴────────┘

TODO: update Zuul CI to test against Ubuntu 26.04.

Add a Go binary (cmd/atmosphere) that deploys Atmosphere components in parallel waves using a DAG-based dependency graph, reducing deployment time from ~60 minutes to ~22 minutes.

Key components:
- pkg/dag: Generic Graph[T] library with topological sort, subgraph extraction, and parallel wave execution via errgroup
- internal/deploy: Component registry (42 components), Deployer interface with AnsibleDeployer, and 3-mode Orchestrator
- cmd/atmosphere: CLI with deploy subcommand (--inventory, --tags, --playbook-dir, --concurrency flags)

Three operating modes:
- No tags: full DAG parallel deployment (11 waves)
- Single tag: pass-through to ansible-playbook (backwards compatible)
- Multiple tags: DAG-aware subgraph with parallel waves

The orchestrator spawns concurrent ansible-playbook processes with generated per-component playbooks piped via /dev/stdin, avoiding multi-play parsing overhead. Output is streamed with [component] prefixes for clear CI log interleaving.

Backwards compatibility: existing ansible-playbook usage, tags, and variables are completely unchanged. The orchestrator is additive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
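
To illustrate the "parallel waves" idea described in this commit, here is a minimal Python sketch. It is not the Go code in pkg/dag; it assumes a plain mapping of component name to its dependencies and uses the standard-library graphlib module. The four-component graph is hypothetical and much smaller than the real registry.

```python
from graphlib import TopologicalSorter


def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group components into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a dependency cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # unblock components that waited on this wave
    return waves


# Hypothetical graph for illustration only.
example = {
    "kubernetes": set(),
    "csi": {"kubernetes"},
    "ingress-nginx": {"kubernetes"},
    "keycloak": {"csi", "ingress-nginx"},
}
print(plan_waves(example))
```

Components inside one wave have no edges between them, so a runner could start them concurrently and wait for the whole wave before moving on.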

Update molecule converge playbooks to build and use the atmosphere binary for deployment:
- default: full DAG deploy (no tags)
- csi: multi-tag with ceph,kubernetes,csi (or kubernetes,csi)
- keycloak: multi-tag with all keycloak dependencies
- pxc: single-tag pass-through for percona-xtradb-cluster

The multi-tag mode resolves DAG ordering automatically, running independent components in parallel where possible.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

Adjust DAG dependencies based on actual role analysis:
- ingress-nginx: drop cluster-issuer dep (only needs kubernetes)
- pxc, valkey, kube-prometheus-stack, loki: add csi dep (all use PVCs)
- lpfc, multipathd, iscsi, udev: remove kubernetes dep (pure host config)
- rook-ceph: depend on kubernetes only (operator, not storage consumer)
- rook-ceph-cluster: add ceph dep (needs ceph monitors)
- nova: add neutron dep, drop ovn/coredns (transitive via neutron)
- neutron: add coredns dep (dnsmasq_dns_servers uses coredns)
- magnum: depend on octavia, barbican, heat (configures all three clients)
- openstack-exporter: depend on cinder, neutron (only hard runtime deps)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

- Add ensure-go role (v1.24.4) to molecule pre-run playbook
- Set CGO_ENABLED=0 and explicit Go PATH in all converge build tasks
- Add kubernetes, csi, valkey to keycloak scenario tags (transitive deps)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

When rendering playbooks piped via /dev/stdin, ansible-playbook has no collection context. Prefix bare role names with vexxhost.atmosphere. so Ansible can resolve them from the installed collection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

Use vexxhost.atmosphere.* fully-qualified collection names for both playbooks (PlaybookType) and roles (RoleType). This removes the need for --playbook-dir since Ansible resolves collection references directly. Also removes the openstacksdk prerequisite step since dependent roles already call it and Ansible does atomic writes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

Add a ResourceCoordinator that serializes components sharing a named resource (e.g., 'apt'). Components ceph and kubernetes declare the apt resource since they come from external collections where we cannot add retries. For all roles within vexxhost.atmosphere that use package management, add retries (5 attempts, 10s delay) to gracefully handle dpkg lock contention during parallel deployment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
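
A rough Python sketch of the serialization idea behind the ResourceCoordinator follows. The actual implementation is Go and may differ; the class shape, the run() method, and the install_packages stand-in are assumptions made for illustration. Each named resource gets one lock, and a component holds the locks for all resources it declares while it runs.

```python
import threading
import time
from collections import defaultdict
from contextlib import ExitStack


class ResourceCoordinator:
    """Serialize work that declares the same named resource (e.g. 'apt')."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def run(self, resources, fn):
        with self._guard:  # protect the lock table itself
            locks = [self._locks[name] for name in sorted(set(resources))]
        with ExitStack() as stack:
            for lock in locks:  # sorted order avoids lock-order deadlocks
                stack.enter_context(lock)
            return fn()


coordinator = ResourceCoordinator()
state = {"active": 0, "peak": 0}
state_lock = threading.Lock()


def install_packages():
    """Hypothetical stand-in for a component that touches apt."""
    with state_lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)  # pretend to hold the dpkg lock for a moment
    with state_lock:
        state["active"] -= 1


threads = [
    threading.Thread(target=coordinator.run, args=(["apt"], install_packages))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["peak"])  # never more than one 'apt' holder at a time
```

Components that declare no resources pass straight through, so only the apt users pay the serialization cost.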

Mark multipathd and iscsi with the 'apt' resource since they install packages on the same hosts as ceph/kubernetes (external collections without retries). Also set changed_when: false on all molecule converge build/deploy tasks to pass idempotence checks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

Environment values containing Jinja expressions with single quotes (e.g., ceph container image) broke YAML parsing when wrapped in single-quoted YAML strings. Switch to Go's %q format which uses double quotes, safely containing single quotes in the values. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
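
The quoting problem can be reproduced outside Go. The sketch below uses Python's json.dumps as a stand-in for Go's %q (both emit a double-quoted, backslash-escaped string, though their escaping rules are not identical); the image expression is a hypothetical example, not the value used in the repository.

```python
import json


def single_quoted(value: str) -> str:
    """Naive wrapping: breaks as soon as the value contains a single quote."""
    return f"'{value}'"


def double_quoted(value: str) -> str:
    """For typical values, a JSON string is also a valid YAML double-quoted scalar."""
    return json.dumps(value)


# Hypothetical Jinja expression with embedded single quotes.
image = "{{ registry | default('quay.io') }}/ceph/ceph:v20.2.1"

print("IMAGE: " + single_quoted(image))  # the inner ' ends the YAML scalar early
print("IMAGE: " + double_quoted(image))  # the inner ' is just a literal character
```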

ipmi-exporter deploys directly into the monitoring namespace using kubernetes.core.k8s (not Helm with create_namespace: true), so it needs the namespace to exist first. kube-prometheus-stack creates it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

keepalived and percona-xtradb-cluster deploy raw k8s resources into the openstack namespace without creating it. memcached (via Helm with create_namespace: true) creates the namespace. Add memcached as a dependency so the namespace exists before these components run. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

prometheus-pushgateway enables serviceMonitor which requires the ServiceMonitor CRD from kube-prometheus-stack. Without this dep, the Helm install fails with 'no matches for kind ServiceMonitor'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

The vexxhost.kubernetes collection uses kubernetes.core.k8s modules in early plays before the Python kubernetes package is installed by later plays. When running in parallel mode, this race becomes more visible. Install the package in pre-run to ensure it's available system-wide before any playbooks execute. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

libvirt, kube-prometheus-stack, and valkey all create Certificate resources using cert-manager.io/v1 CRDs directly via kubernetes.core.k8s. They also reference a ClusterIssuer named 'self-signed' created by the cluster-issuer role. Add cluster-issuer as a dependency so the CRDs and issuer exist before these components deploy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

The kube_prometheus_stack role starts by waiting for the Keycloak StatefulSet to be ready and then creates realms/clients. Without keycloak in its dependency list, it can start before keycloak is deployed, causing 'list object has no element 0' errors when checking the StatefulSet status. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

The rook_ceph_cluster role creates Keystone users, services, and endpoints for Swift/RGW integration using openstack.cloud modules. Without keystone being deployed first, these calls fail with SSL connection errors to the identity endpoint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

Manila creates compute flavors (needs Nova endpoint), uploads images (needs Glance via Nova chain), and its Helm values reference endpoints for nova, neutron, and cinder. Without these services deployed first, manila fails with EndpointNotFound for the compute service. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

rook-ceph-cluster creates an OpenStack user in the 'service' domain using openstack.cloud.identity_user. The 'service' domain is created by OpenStack-Helm's ks-user bootstrap jobs (via helm-toolkit). By depending on barbican (the first core service deployed), we ensure the service domain exists before rook-ceph-cluster tries to use it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

Move Go binary build to pre-run and add a custom Zuul run playbook that runs molecule prepare, atmosphere deploy, br-ex networking (AIO), and molecule verify as separate plays. This replaces the parent job's molecule test invocation so deploy output streams directly to Zuul logs instead of being buffered through molecule. Also adds atmosphere_deploy_tags to CSI and keycloak job definitions so each scenario deploys only its required components. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

All converge playbooks now use the atmosphere_deploy_tags variable instead of hardcoded tags. The Zuul run.yml imports the molecule converge playbook directly, so the same converge logic runs both locally (molecule converge) and in CI (Zuul run playbook). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

Cinder's Helm chart creates PVCs that need the Ceph CSI provisioner to be running. Add ceph-provisioners as a dependency so the storage class and provisioner are ready before cinder deploys. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

…build ceph-provisioners only needs ceph monitors and CSI driver, not rook-ceph-cluster. Also removes duplicate Go binary build from pre.yml since converge.yml already handles it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

Each Zuul job now sets run: to the scenario's converge.yml followed by a verify playbook, so Zuul streams deploy output directly. Molecule prepare and inventory setup move to pre.yml. Converge playbooks use hosts: all with delegate_to/run_once so they work in both molecule (localhost) and Zuul (remote node) contexts. Also fixes ceph-provisioners to depend only on ceph (not rook-ceph-cluster) since it only needs ceph monitors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

In Zuul, Go is installed on the remote instance (via ensure-go) not on the executor (localhost). Remove delegate_to: localhost so the go build and atmosphere deploy commands run where Go and the collection are available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

The `zuul.project.src_dir` variable is a relative path (e.g. `src/github.com/vexxhost/atmosphere`). When the deploy task uses it as a prefix in `cmd` while also having `chdir` set to the same relative path, Ansible resolves the binary path as if it were relative to the new cwd, doubling the path and causing a FileNotFoundError. Fix by using `./bin/atmosphere` and `./inventory.yaml` in the cmd field since `chdir` already navigates to the correct directory. Also fix pre-commit end-of-file issue in orchestrator.go and add a release note for the parallel deployment orchestrator feature. Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/60b11e18-b92e-476f-86db-2a6c2ac4db06 Co-authored-by: mnaser <435815+mnaser@users.noreply.github.com>
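
The path doubling is easy to see with plain path arithmetic. This sketch assumes a hypothetical working directory of /home/zuul on the node; only the relative src_dir value comes from the commit message.

```python
import posixpath


def resolve(workdir: str, chdir: str, cmd_path: str) -> str:
    """Where a relative command path points once the task has changed into chdir."""
    return posixpath.normpath(posixpath.join(workdir, chdir, cmd_path))


src_dir = "src/github.com/vexxhost/atmosphere"  # relative, as in the commit message
home = "/home/zuul"  # hypothetical starting directory on the node

broken = resolve(home, src_dir, f"{src_dir}/bin/atmosphere")
fixed = resolve(home, src_dir, "./bin/atmosphere")
print(broken)  # src_dir appears twice, so the binary is not found
print(fixed)   # resolves inside the checkout exactly once
```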

The atmosphere deploy binary calls ansible-playbook internally. Add .venv/bin to PATH in the deploy task so the binary can find ansible-playbook installed in the uv virtual environment. Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/60b11e18-b92e-476f-86db-2a6c2ac4db06 Co-authored-by: mnaser <435815+mnaser@users.noreply.github.com>

…ok path Go 1.19+ refuses to execute binaries found via relative PATH entries (CVE-2022-30580). Using `PATH=.venv/bin:...` fails because `.venv/bin` is a relative entry. Switch to `ansible.builtin.shell` with `. .venv/bin/activate` so that the shell activation script adds the ABSOLUTE path of `.venv/bin` to PATH before invoking `./bin/atmosphere deploy`. The atmosphere binary then finds ansible-playbook via an absolute path, satisfying Go 1.19+ security requirements. Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/60b11e18-b92e-476f-86db-2a6c2ac4db06 Co-authored-by: mnaser <435815+mnaser@users.noreply.github.com>

In the original serialized playbook, nova was deployed before neutron. Neutron's post-install network creation requires the nova availability zone to exist. Swap the dependency so nova deploys first, then neutron. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <copilot@github.com>

Replace the generic OpenStack quota task with service-specific quota commands for compute, volume, and network resources. This avoids querying load-balancer quotas during Manila deployment, which can fail when the Octavia endpoint uses an untrusted certificate. Signed-off-by: Yaguang Tang <yaguang.tang@vexxhost.com>

The parallel orchestrator generates minimal single-role playbooks for RoleType components, which bypasses pre_tasks defined in the original sequential playbooks (e.g., playbooks/openstack.yml). This means the atmosphere_ceph_enabled deprecation guard was silently skipped. Add a runPreflightChecks() method that runs the same validation checks before any component deployment begins, called from both deployFullDAG and deployMultipleTags. The deploySingleTag path is unaffected since it passes through to the full site.yml which already includes pre_tasks. Change-Id: If068daa27a3f4475e570f08ab6d2cd52effb2914 Signed-off-by: Dong Ma <dong.ma@vexxhost.com>

Magnum's Helm install doesn't require octavia to be running. The only octavia reference is the octavia_client endpoint URL in helm values, which is a deterministic string generated from openstack_helm_endpoints. Octavia is only needed at runtime when users create Kubernetes clusters. This allows magnum to start after barbican and heat complete (~13:30) instead of waiting for octavia (~13:44), saving ~4.5 minutes on the critical path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: ricolin <rlin@vexxhost.com>

When running molecule locally (outside Zuul), the verify playbook cannot find workspace-generated variables (endpoints, secrets) because the inventory fallback was /dev/null. Set ATMOSPHERE_ZUUL_INVENTORY in tox.ini to point at the project root inventory.yaml so Ansible discovers group_vars for all playbooks (prepare, converge, verify). Touch the file before molecule runs to ensure it exists for the prepare step. In Zuul, molecule_environment overrides this env var with the Zuul-generated inventory path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I8098d8bbc2d5617bc5dd137d5dad17920bf73d69 Signed-off-by: ricolin <rlin@vexxhost.com>

Both roles have built-in retry logic (retries: 5, delay: 10) that handles transient dpkg lock contention. Removing the apt resource serialization allows them to run in parallel with ceph and kubernetes during Wave 0, saving ~2 minutes of serial wait time. Signed-off-by: ricolin <rlin@vexxhost.com>

The ceph component previously held the apt resource lock for its entire ~8 minute duration, but only used apt for ~30 seconds (Docker, cephadm packages). The remaining time (bootstrap, mon, mgr, OSD creation) does not touch apt.

Split into two components:
- ceph-packages: installs Docker and cephadm deps (holds apt lock ~1-2m)
- ceph: runs the full ceph playbook (no apt lock, depends on ceph-packages)

The main ceph playbook re-runs the cephadm role dependencies idempotently (packages already installed = fast skip). This allows kubernetes to start installing as soon as ceph-packages finishes, rather than waiting for the entire ceph bootstrap to complete.

Signed-off-by: ricolin <rlin@vexxhost.com>

Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/21aacaca-4069-450a-a09d-0a1cddca9963 Co-authored-by: ricolin <7250045+ricolin@users.noreply.github.com>

The apt resource declarations were removed from multipathd and iscsi components assuming their built-in retry logic would handle dpkg lock contention. However, the kubernetes component (which runs in the same wave) uses the external vexxhost.kubernetes.kubelet role that does NOT have retry logic on its apt tasks. When multipathd or iscsi held the dpkg lock, kubelet failed immediately with rc:100. Re-add the apt resource to serialize these components with kubernetes and ceph, preventing dpkg lock contention entirely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: If6fca34a1d6f84d3213ea7b777a8b6cf9c35a126 Signed-off-by: ricolin <rlin@vexxhost.com>

Revert the re-added apt resource lock on multipathd and iscsi. Both roles already have built-in retry logic (retries: 5, delay: 10s) that handles dpkg lock contention gracefully. The original concern was that the kubelet role in vexxhost.kubernetes lacks retry logic on its apt tasks. This is being addressed upstream in vexxhost/ansible-collection-kubernetes#262 by adding retry logic directly to the kubelet role. With retry logic on both sides, serializing multipathd and iscsi behind the apt resource is no longer necessary, recovering ~1-3 minutes of Wave 0 parallelism. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: ricolin <rlin@vexxhost.com>

The Go deployer spawns ansible-playbook subprocesses that need:
- PATH: to find ansible-playbook in the venv
- ANSIBLE_COLLECTIONS_PATH: to find collections when running as root via become:true (root defaults to /root/.ansible/collections)

Signed-off-by: ricolin <rlin@vexxhost.com>

In Zuul CI, molecule/ansible-compat installs the collection to a cache directory and sets ANSIBLE_COLLECTIONS_PATH accordingly. The previous commit unconditionally overrode this with ~/.ansible/collections, causing ansible-playbook to fail finding the vexxhost.atmosphere.* roles. Make PATH and ANSIBLE_COLLECTIONS_PATH conditional on zuul is not defined, so CI inherits the correct paths from .venv/bin/activate and molecule's prerun while local runs still get the paths they need. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: ricolin <rlin@vexxhost.com>

The ceph component previously held the apt resource lock for its entire ~8 minute duration, but only used apt for ~30 seconds (Docker, cephadm packages). The remaining time (bootstrap, mon, mgr, OSD creation) does not touch apt.

Split into two components:
- ceph-packages: installs Docker and cephadm deps (holds apt lock ~1-2m)
- ceph: runs the full ceph playbook (no apt lock, depends on ceph-packages)

The main ceph playbook re-runs the cephadm role dependencies idempotently (packages already installed = fast skip). This allows kubernetes to start installing as soon as ceph-packages finishes, rather than waiting for the entire ceph bootstrap to complete.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>

Re-add the apt resource lock to multipathd and iscsi to prevent dpkg lock contention with ceph (containerd AppArmor install) and kubernetes (kubelet package install), which lack retry logic in their upstream collections.

Once the following upstream PRs merge and atmosphere pins the new collection versions, this lock can be safely removed:
- vexxhost/ansible-collection-kubernetes#262
- vexxhost/ansible-collection-containers#114

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ricolin <rlin@vexxhost.com>

Change-Id: I23272c63155e8d4f323a278baa638cbb3073559d

Address multiple deployment failures on fresh Ubuntu 26.04 installs:

1. The generate_workspace playbook and the Nova/Manila generate_public_key tasks failed to generate SSH keys because systemd mounts /tmp as tmpfs on Ubuntu 24.04+, and community.crypto.openssh_keypair calls chattr on the generated files, which tmpfs does not support. Switch those tasks to a disk-backed tempfile location.
2. Ubuntu 26.04 ships Python 3.14 and a newer ansible-core. The pinned community.general 7.3.0 (and friends) break with JMESPathError under the 2.19+ template engine. Bump the pinned Ansible collections to recent major versions and lift the ansible-core pin so everything runs natively on Python 3.14.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change-Id: I561fb8ba2e52c1f26d86a3e9be5d0615c735d46d
Signed-off-by: Rico Lin <rico@vexxhost.com>

Ansible 2.20+ deprecates INJECT_FACTS_AS_VARS defaulting to true and warns when top-level ansible_* fact variables are used. Switch the prepare.yml snapd purge condition to ansible_facts['distribution']. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: Idfe0733773fa6aab60b7da00050d32547d772fe9 Signed-off-by: Rico Lin <rico@vexxhost.com>

Switch ceph key lookups in the ceph_provisioners, ceph_csi_rbd and rook_ceph_cluster roles to the new vexxhost.ceph.key_info module, since recent versions of the vexxhost.ceph collection removed state: info from the vexxhost.ceph.key module. Teach the storage_to_ceph_provisioners_helm_values filter plugin to unwrap ansible-core 2.20 lazy value and lazy container wrappers before validating atmosphere_storage, so that Pydantic's discriminated-union resolution receives plain strings rather than _LazyValue instances. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I3ed1ad21eef19d3251267b88a06b2002348a4d46 Signed-off-by: Rico Lin <rico@vexxhost.com>
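
The unwrapping step can be pictured as a recursive copy into built-in types before validation. The sketch below is not the filter plugin's code and does not use any ansible-core internals; it assumes the wrappers behave like subclasses of str, Mapping, and Sequence, and the TaggedStr, LazyDict, and LazyList classes are hypothetical stand-ins for them.

```python
from collections.abc import Mapping, Sequence


def to_plain(value):
    """Recursively copy a structure into built-in dict, list, and str objects."""
    if isinstance(value, str):
        return str(value)  # a str subclass becomes an exact str
    if isinstance(value, Mapping):
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, Sequence) and not isinstance(value, (bytes, bytearray)):
        return [to_plain(item) for item in value]  # tuples become lists too
    return value


# Hypothetical stand-ins for lazy wrapper types.
class TaggedStr(str):
    pass


class LazyDict(dict):
    pass


class LazyList(list):
    pass


wrapped = LazyDict({TaggedStr("type"): TaggedStr("rbd"), "pools": LazyList([TaggedStr("vms")])})
plain = to_plain(wrapped)
print(type(plain), type(plain["type"]), type(plain["pools"]))
```

A validator that dispatches on the exact value of a discriminator field then sees ordinary strings rather than wrapper instances.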

On Ubuntu 26.04 systemd resolves `LimitNOFILE=infinity` to 2147483584 (2^31 - 64, just under INT_MAX). Every container started by containerd v2.x inherits that value. Workloads that iterate over inherited file descriptors before `execve` — for example HAProxy external-check scripts spawned by the Percona XtraDB cluster — spend tens of seconds in the close loop and get killed by their own timeout, which in turn crash-loops the HAProxy pod and blocks Keycloak and the rest of the deploy. Pin `containerd_limit_open_file_num` to 1048576 when importing the Kubernetes and Ceph playbooks so the containerd role renders the systemd unit with a sane limit on every distribution. Matches the value previously used on Red Hat systems and the effective limit on older Ubuntu kernels. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: Ic71fd7fbd9118c4d5a7b4d5cec2009ab062b5a19 Signed-off-by: Rico Lin <rico@vexxhost.com>
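
A quick arithmetic check on the two limits shows why the close loop gets so much slower. The numbers come from the commit message; the assumption that loop time scales linearly with the limit is a simplification.

```python
INHERITED_LIMIT = 2_147_483_584  # what LimitNOFILE=infinity resolved to
PINNED_LIMIT = 1_048_576         # containerd_limit_open_file_num after the fix

# Distance of the inherited limit below 2**31.
print(2**31 - INHERITED_LIMIT)

# Roughly how many times more descriptors a close-every-fd loop has to visit.
print(INHERITED_LIMIT // PINNED_LIMIT)
```

With the pinned value, a loop over every possible descriptor visits about a two-thousandth of the range it would visit under the inherited limit.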

Keycloak 24 runs the Quarkus augmentation step at first boot. The upstream Bitnami chart defaults to `resourcesPreset: small`, which caps memory at 768 MiB and triggered an `OOMKilled` before the server could open its HTTP port, failing the Helm install with a startup probe timeout. Use the `medium` preset (up to 1536 MiB) as the Atmosphere default. Operators can still override through `keycloak_helm_values`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I1874ebaf1e12228ee09114c8f4e72e374a4e45f4 Signed-off-by: Rico Lin <rico@vexxhost.com>

Ansible-core 2.20 wraps rendered default values in lazy containers that the loop keyword rejects with 'must resolve to a list, not str'. Define the default list directly in the magnum and magnum_pre role defaults to sidestep the wrapper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I1b0f09820b5452ee058e5c8fb61cc7edc7443e4b Signed-off-by: Rico Lin <rico@vexxhost.com>

The molecule AIO override referenced '_magnum_images' which was removed from role vars in the previous commit, and used string template syntax that ansible-core 2.20 rejects for 'loop:' consumers. Inline the single test image directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I9067305529696f6440e72ace4944b4ca7a9c3225 Signed-off-by: Rico Lin <rico@vexxhost.com>

The neutron-db-sync post-install hook replays the full Alembic migration chain on a fresh install, which regularly exceeds the default 5-minute Helm hook timeout on Ubuntu 26.04 test hosts and leaves the release in the 'failed' state. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: Ib266c8a1b36e214a9c737d8284d22f39738476e5 Signed-off-by: Rico Lin <rico@vexxhost.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I9e64043f9049c13d241e75e5129c64aa5e90ca3a Signed-off-by: Rico Lin <rico@vexxhost.com>

ansible-core 2.20 deprecates INJECT_FACTS_AS_VARS defaulting to true and warns whenever a top-level ansible_* fact variable is referenced. Switch the remaining molecule prepare/converge/scenario files to ansible_facts['fact_name'] for default_ipv4, distribution, and env. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I70de7369acfb06dec9eb3e2a8e500bf54ca40bf0 Signed-off-by: Rico Lin <rico@vexxhost.com>

The Tempest suite on Ubuntu 26.04 takes longer than the previous 20-minute Helm wait, so the kubernetes.core.helm task gives up and the subsequent k8s_info call samples the Job before the Kubernetes Job controller has finalised .status.succeeded. Even with every test passing the role then reports Tempest failed. Raise wait_timeout to 30 minutes and add a polling retry on the Job lookup so the role waits until the Job actually reaches a terminal state. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: I419323efd5827163a93b300a4411593c8900806a Signed-off-by: Rico Lin <rico@vexxhost.com>
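
The "polling retry on the Job lookup" can be sketched as a loop that only returns once the Job reports a terminal condition. This is an illustration in Python, not the Ansible task from the role; the retry count and delay are placeholder values, and lookup is a hypothetical callable returning the Job's .status as a dict.

```python
import time


def is_terminal(status: dict) -> bool:
    """A Job is finished once a Complete or Failed condition is True."""
    return any(
        cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


def wait_for_terminal(lookup, retries: int = 60, delay: float = 5.0, sleep=time.sleep):
    """Poll lookup() until the Job is terminal, or raise after the last retry."""
    for attempt in range(retries):
        status = lookup()
        if is_terminal(status):
            return status
        if attempt < retries - 1:
            sleep(delay)
    raise TimeoutError("Job did not reach a terminal state")


# Simulated sequence of status samples: empty, still running, then complete.
samples = iter([
    {},
    {"active": 1},
    {"succeeded": 1, "conditions": [{"type": "Complete", "status": "True"}]},
])
result = wait_for_terminal(lambda: next(samples), retries=5, delay=0)
print(result["succeeded"])
```

Sampling the status only once, right after the Helm wait gives up, corresponds to reading the first or second sample above and concluding the run failed.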

Commit d1f405a raised the ansible-core requirement to >=2.20 as part of Ubuntu 26.04 support. That release requires Python 3.12 or newer, which Ubuntu 22.04 does not ship, so pip refuses to install Atmosphere at all on a 22.04 host. Lower the floor back to >=2.15.9. On Ubuntu 22.04 pip resolves to the latest compatible release in the 2.17.x series, which is sufficient for every collection pinned in galaxy.yml. On Ubuntu 26.04 pip still picks up 2.20 or newer, preserving the Python 3.14 deployment path. Validated end-to-end on Ubuntu 22.04.3: tempest 163/164 pass, 1 skip, 0 fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Change-Id: Ic338274618d79c76f1162cbfd240e94d97da3547 Signed-off-by: Rico Lin <rico@vexxhost.com>

Rico Lin (ricolin) · 2026-04-23T06:45:00Z

 Ubuntu 26.04 + 22.04 Cross-Compatibility — Final Report

  Mission

  Enable Atmosphere deployment on Ubuntu 26.04 (Resolute) while ensuring existing Ubuntu 22.04 users can upgrade to the new
  Atmosphere release without being forced onto 26.04.

  Result: ✅ VALIDATED on both OSes


PRs Included in This Validation

1. vexxhost/atmosphere #3864 — rlin-ubuntu2604-support

Title: fix: support Ubuntu 26.04 while keeping 22.04 supported
Final HEAD: 5b365711
URL: https://github.com/vexxhost/atmosphere/pull/3864

The main cloud-platform PR. 10 discrete fixes addressing deployment failures on fresh Ubuntu 26.04 installs, plus the relaxed
version floors that keep 22.04 compatible. Commits on the branch:

┌──────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ SHA          │ Message                                                                                                      │
├──────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ d1f405ac     │ fix(deps): support Ubuntu 26.04 / Python 3.14 (bump ansible-core, collections, tmpfs fix)                    │
├──────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ (multiple)   │ containerd NOFILE cap, lazy-value unwrap, magnum_images inline, Keycloak large, Neutron 15m timeout, ceph    │
│              │ key_info migration, facts migration                                                                          │
├──────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ a6c055d9     │ fix(tempest): bump Helm wait timeout and retry job lookup                                                    │
├──────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 5b365711     │ fix(deps): lower ansible-core floor to keep Ubuntu 22.04 supported (this session)                            │
└──────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

------------------------------------------------------------------------------------------------------------------------------

2. vexxhost/ansible-collection-containers #118 — rlin-ubuntu2604-support

Title: feat: Ubuntu 26.04 support
Final HEAD: 31d5c81
URL: https://github.com/vexxhost/ansible-collection-containers/pull/118

Container/containerd role updates. Key change: make the LimitNOFILE cap conditional so 22.04 keeps the historic infinity, while
26.04 and RedHat families get 1048576 (avoids HAProxy external-check 7s close-fd hang triggered by 26.04 resolving infinity to
2,147,483,584).

------------------------------------------------------------------------------------------------------------------------------

3. vexxhost/ansible-collection-ceph #105 — rlin-ubuntu2604-support

Title: feat: Ubuntu 26.04 support
Final HEAD: cc79c72
URL: https://github.com/vexxhost/ansible-collection-ceph/pull/105

Ceph collection. Softened the Python floor (>=3.11) and the ansible-core floor (>=2.18) so 22.04 can still install it, and
reverted an unrelated default image bump (Reef 18.2.1 → Tentacle 20.2.1) that snuck into the 26.04 commit. The new key_info
module added here is what atmosphere #3864 migrates to.

------------------------------------------------------------------------------------------------------------------------------

4. vexxhost/ansible-collection-kubernetes #268 — rlin-ubuntu2604-support

Title: feat: Ubuntu 26.04 support
Final HEAD: b5f02da
URL: https://github.com/vexxhost/ansible-collection-kubernetes/pull/268

Kubernetes collection. Softened the Python floor (>=3.10) and the ansible-core floor (>=2.15.9), and restored minimum (rather
than exact) version floors on ansible.posix, community.crypto, community.general, and kubernetes.core so 22.04 can install them.

------------------------------------------------------------------------------------------------------------------------------

How they interact

 atmosphere PR #3864
   └── depends on (galaxy.yml):
         ├── vexxhost.containers 1.6.6  ← PR #118
         ├── vexxhost.ceph >=3.2.0       ← PR #105
         └── vexxhost.kubernetes 3.0.1   ← PR #268

Test environment wires them together via an uncommitted molecule/aio/collections.yml that overrides Galaxy lookups with
git-branch installs:

 collections:
   - { name: https://github.com/vexxhost/ansible-collection-containers.git, type: git, version: rlin-ubuntu2604-support }
   - { name: https://github.com/vexxhost/ansible-collection-ceph.git,       type: git, version: rlin-ubuntu2604-support }
   - { name: https://github.com/vexxhost/ansible-collection-kubernetes.git, type: git, version: rlin-ubuntu2604-support }

All four must land together — the atmosphere PR's role code depends on the new module signatures introduced in the three
collection PRs.
  ┌─────────┬──────────────────┬──────────────────────┐
  │ OS      │ Result           │ Tempest              │
  ├─────────┼──────────────────┼──────────────────────┤
  │ 26.04   │ ✅ PASS          │ 163/164 pass, 1 skip │
  ├─────────┼──────────────────┼──────────────────────┤
  │ 22.04.3 │ ✅ PASS (93 min) │ 163/164 pass, 1 skip │
  └─────────┴──────────────────┴──────────────────────┘

  PRs merged / pushed this session

  ┌────────────────────────────────────────┬───────┬────────────┬───────────────────────────────────────────────────────────────┐
  │ Repo                                   │ PR    │ Final HEAD │ Change                                                        │
  ├────────────────────────────────────────┼───────┼────────────┼───────────────────────────────────────────────────────────────┤
  │ vexxhost/atmosphere                    │ #3864 │ 5b365711   │ 10 fixes: containerd NOFILE, lazy-value unwrap, magnum_images │
  │                                        │       │            │ inline, Keycloak resources, Neutron timeout, ceph key_info,   │
  │                                        │       │            │ tmpfs chattr, facts migration, tempest race, ansible-core     │
  │                                        │       │            │ floor soften                                                  │
  ├────────────────────────────────────────┼───────┼────────────┼───────────────────────────────────────────────────────────────┤
  │ vexxhost/ansible-collection-containers │ #118  │ 31d5c81    │ Soften galaxy deps, conditional NOFILE cap (22.04 keeps       │
  │                                        │       │            │ infinity)                                                     │
  ├────────────────────────────────────────┼───────┼────────────┼───────────────────────────────────────────────────────────────┤
  │ vexxhost/ansible-collection-ceph       │ #105  │ cc79c72    │ Soften python/ansible floors, revert Reef→Tentacle default    │
  │                                        │       │            │ bump                                                          │
  ├────────────────────────────────────────┼───────┼────────────┼───────────────────────────────────────────────────────────────┤
  │ vexxhost/ansible-collection-kubernetes │ #268  │ b5f02da    │ Soften python/ansible/k8s.core floors                         │
  └────────────────────────────────────────┴───────┴────────────┴───────────────────────────────────────────────────────────────┘

  Errors encountered & fixed (log entries 1–10)

   1. lazy-value Pydantic break → _deep_unwrap filter helper
   2. _magnum_images loop type error → inline list in defaults
   3. Neutron db-sync 5m helm timeout → 15m
   4. Keycloak OOMKilled medium → large preset
   5. containerd LimitNOFILE=infinity → 1,048,576 cap (HAProxy close-fd hang)
   6. tmpfs + chattr SSH keygen → disk-backed tempfile
   7. Ceph vexxhost.ceph.key state=info removed → key_info module
   8. ansible_default_ipv4.* deprecation → ansible_facts['default_ipv4'] (12 files)
   9. Tempest Job race → Helm wait 30m + retries: 30, delay: 10
   10. ansible-core>=2.20 unsatisfiable on py3.10 → floor back to >=2.15.9
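
  Entry 1 names a `_deep_unwrap` helper without showing it. The following is only a guess at the general shape of such a helper, assuming the lazy values are dict and list subclasses that a strict validator rejects. The names `deep_unwrap`, `LazyDict`, and `LazyList` are hypothetical stand-ins, not identifiers from ansible-core or from this PR.

```python
from collections.abc import Mapping, Sequence


def deep_unwrap(value):
    """Recursively rebuild containers as plain dict/list (illustrative sketch)."""
    # Strings and byte strings are sequences too; pass them through untouched.
    if isinstance(value, (str, bytes, bytearray)):
        return value
    if isinstance(value, Mapping):
        return {key: deep_unwrap(item) for key, item in value.items()}
    if isinstance(value, Sequence):
        # Note: tuples also come back as lists in this sketch.
        return [deep_unwrap(item) for item in value]
    return value


# Hypothetical stand-ins for lazily evaluated container types.
class LazyDict(dict):
    pass


class LazyList(list):
    pass


if __name__ == "__main__":
    wrapped = LazyDict(images=LazyList([LazyDict(name="example")]))
    plain = deep_unwrap(wrapped)
    print(type(plain), type(plain["images"]), type(plain["images"][0]))
```

  The result compares equal to the input but contains only exact `dict` and `list` instances, which is the property a validator that checks exact types would need.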

  Key insight: the softening strategy

  The original 26.04 support commit (d1f405ac) hard-raised every floor to 26.04-native versions, accidentally locking 22.04 out
  entirely. The fix for each PR was the same pattern:

   - pyproject.toml: raise requires-python only as far as the minimum supported Python needs (>=3.10 or >=3.11), and drop
     any pins that exceed what 22.04 provides
   - galaxy.yml: change exact pins (e.g., community.general: 12.6.0) to floors (>=4.5.0); pip/galaxy still resolve upward
     to the newest compatible release
   - Conditional behavior where unavoidable (NOFILE cap): gate on os_family == 'RedHat' or (Ubuntu >= 26.04)

  Result: a single Atmosphere codebase runs on both OSes, with each picking its native Ansible/Python stack.
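
  The difference between an exact pin and a floor can be sketched with a small comparison helper. This is a simplified illustration, not how pip or ansible-galaxy actually resolve versions; `parse` and `satisfies` are hypothetical names, and the 4.4.0 candidate is invented purely to show a version below the floor.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '12.6.0' into (12, 6, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, spec: str) -> bool:
    """Check a version against either a floor ('>=X') or an exact pin ('X' or '==X')."""
    spec = spec.strip()
    if spec.startswith(">="):
        return parse(installed) >= parse(spec[2:])
    return parse(installed) == parse(spec.removeprefix("=="))


if __name__ == "__main__":
    candidates = ["4.4.0", "4.5.0", "12.6.0"]  # 4.4.0 is a made-up example
    # Exact pin: only one version is acceptable.
    print([v for v in candidates if satisfies(v, "12.6.0")])   # ['12.6.0']
    # Floor: anything at or above 4.5.0 is acceptable.
    print([v for v in candidates if satisfies(v, ">=4.5.0")])  # ['4.5.0', '12.6.0']
```

  With a floor, an older host can settle on the lowest version it supports while a newer host is still free to pick the latest one.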

oslo.middleware 8.0 (shipped in the Magnum main image) removes the filter-style Healthcheck middleware and raises NotImplementedError on import, crashing magnum-api on startup. Switch the api-paste.ini to a composite root that mounts the healthcheck as an app under /healthcheck, matching the pattern already used by the Glance chart.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change-Id: Icb3f691e4292f051f0d33b66f362e3b176a5205a
Signed-off-by: Rico Lin <rico@vexxhost.com>

Rico Lin (ricolin) · 2026-04-23T15:35:01Z

┌─────────────┬────────┬─────────┬─────────────────────────────────┬─────────────────┬────────────────────────────┬────────┐
│ OS          │ Python │ Backend │ Scenario                        │ Converge        │ Tempest                    │ Result │
├─────────────┼────────┼─────────┼─────────────────────────────────┼─────────────────┼────────────────────────────┼────────┤
│ 26.04       │ 3.14   │ OVN     │ fresh deploy                    │ ~75 min         │ 163/164 pass               │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────┼─────────────────┼────────────────────────────┼────────┤
│ 26.04       │ 3.14   │ OVS     │ fresh deploy (this run)         │ 62 min          │ 129/131 pass, 0 failed     │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────┼─────────────────┼────────────────────────────┼────────┤
│ 24.04.1     │ 3.12   │ OVS     │ fresh deploy                    │ 102 min         │ 163/164 pass               │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────┼─────────────────┼────────────────────────────┼────────┤
│ 22.04.3     │ 3.10   │ OVN     │ fresh deploy                    │ 93 min          │ 163/164 pass               │ ✅     │
├─────────────┼────────┼─────────┼─────────────────────────────────┼─────────────────┼────────────────────────────┼────────┤
│ 22.04.3     │ 3.10   │ OVN     │ baseline → in-place upgrade     │ 99 min + 22 min │ cluster healthy            │ ✅     │
└─────────────┴────────┴─────────┴─────────────────────────────────┴─────────────────┴────────────────────────────┴────────┘

Mohammed Naser (mnaser) and others added 30 commits April 15, 2026 12:44

Rico Lin (ricolin) and others added 19 commits April 17, 2026 11:55

fix(deploy): apply unresolved review feedback

a40ea50

Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/21aacaca-4069-450a-a09d-0a1cddca9963
Co-authored-by: ricolin <7250045+ricolin@users.noreply.github.com>

chore: finalize review feedback status

ddd4948

Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/21aacaca-4069-450a-a09d-0a1cddca9963
Co-authored-by: ricolin <7250045+ricolin@users.noreply.github.com>

chore: drop accidental module file changes

282a8e1

Agent-Logs-Url: https://github.com/vexxhost/atmosphere/sessions/21aacaca-4069-450a-a09d-0a1cddca9963
Co-authored-by: ricolin <7250045+ricolin@users.noreply.github.com>

Merge branch 'pr3842' into rlin-ubuntu2604-support

2d89800

Change-Id: I23272c63155e8d4f323a278baa638cbb3073559d

Rico Lin (ricolin) force-pushed the rlin-ubuntu2604-support branch from 6b0a223 to d58404a on April 22, 2026 14:38

Rico Lin and others added 7 commits April 22, 2026 23:16

docs: fix vale lint errors in release note

656ab61

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change-Id: I9e64043f9049c13d241e75e5129c64aa5e90ca3a
Signed-off-by: Rico Lin <rico@vexxhost.com>

Rico Lin (ricolin) wants to merge 62 commits into main from rlin-ubuntu2604-support

5 participants
