Add ability to run project updates on mesh nodes#16288
Add ability to run project updates on mesh nodes#16288dhageman wants to merge 4 commits intoansible:develfrom
Conversation
e2ecb27 to
0cbdbbc
Compare
📝 WalkthroughWalkthroughAdds project instance-groups API endpoints and permission checks; enables selecting remote execution nodes for project updates, tracks project/roles/collections artifact paths in callbacks, copies artifacts back after remote updates, and runs update playbooks in an isolated project_update workspace. Changes
Sequence DiagramsequenceDiagram
participant Client
participant API
participant Scheduler
participant Project
participant InstanceGroup
participant ExecNode
participant Controller
participant Callback
participant Artifacts
Client->>API: GET /projects/{id}/instance_groups/
API->>Project: load related instance_groups
Project->>InstanceGroup: query list
InstanceGroup-->>API: return serialized list
API-->>Client: respond
Client->>API: POST /projects/{id}/update/
API->>Scheduler: spawn_project_sync(project)
Scheduler->>InstanceGroup: list candidate nodes
InstanceGroup-->>Scheduler: candidates
Scheduler->>ExecNode: select_execution_node_by_capacity(candidates)
Scheduler->>Controller: dispatch update to ExecNode
Controller->>ExecNode: run playbook in isolated project_update_path
ExecNode->>Artifacts: produce project/roles/collections artifacts
ExecNode->>Callback: call artifacts_handler
Callback->>Artifacts: detect & record artifact paths
Controller->>Callback: post_run_hook copies artifacts back (if remote)
Controller-->>Client: update complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@awx/main/tasks/jobs.py`:
- Around line 1619-1638: In _copy_project_from_artifacts replace the non-atomic
remove-then-copy sequence: avoid calling shutil.rmtree(project_path) before
shutil.copytree; instead copy artifacts_project_path into a temporary location
(e.g., project_path + ".tmp" or use tempfile.mkdtemp), on successful
shutil.copytree to that temp path use os.replace/os.rename to atomically move
the temp into project_path (or rename the existing project_path to a backup name
and only rmtree the backup after a successful replace), and only then remove the
old/backup directory; this ensures failures in shutil.copytree leave the
original project_path intact and only removes it after a verified successful
replacement.
- Around line 842-864: In spawn_project_sync, when
project.instance_groups.exists() is true but
select_execution_node_by_capacity(candidate_instances) returns None (i.e., no
eligible execution nodes), add a warning log that includes the project
identifier and the project's instance_group identifier to indicate no eligible
execution nodes were found and that code will fall back to
self.instance.instance_group; reference select_execution_node_by_capacity,
project.instance_groups, project_ig, pu_ig, pu_en and
self.instance.instance_group so the warning is emitted in the branch where
select_execution_node_by_capacity yields None before assigning pu_ig/pu_en.
In `@awx/playbooks/project_update.yml`:
- Around line 50-69: The task sets project_update_path, ansible_local_temp,
ansible_roles_path, and ansible_collections_path by assuming
AWX_ISOLATED_DATA_DIR exists; add a guard that validates AWX_ISOLATED_DATA_DIR
before computing these facts (e.g., a preliminary task using
ansible.builtin.set_fact to fetch the env var and an ansible.builtin.fail if
it's undefined), and if the variable is shared across jobs namespace the
computed paths with a unique job identifier (e.g., use an existing job_id fact
or the AWX job env var) so project_update_path, ansible_local_temp,
ansible_roles_path, and ansible_collections_path are either rejected when
AWX_ISOLATED_DATA_DIR is missing or deterministically namespaced to avoid
cross-job clobbering.
0cbdbbc to
e6e022d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@awx/main/tasks/jobs.py`:
- Around line 1635-1687: In both _copy_project_from_artifacts and
_copy_requirements_from_artifacts guard against None artifact paths before
calling os.path.exists: for _copy_project_from_artifacts check that
self.runner_callback.project_artifact_path is not None (and a string/path) and
raise the same PostRunError if it is None or does not exist; in
_copy_requirements_from_artifacts ensure artifact_path from
getattr(self.runner_callback, attr, None) is not None before os.path.exists
(log/continue when None) so you never call os.path.exists(None) for
roles_artifact_path or collections_artifact_path.
- Around line 861-880: The code only checks project.instance_groups.first() so
it misses eligible nodes in other IGs; change the logic to consider all
project.instance_groups: build a candidate set by iterating
project.instance_groups and collecting enabled instances with node_type in
['execution','hybrid'] (or use a combined queryset), keep a mapping from each
instance to its instance group, call select_execution_node_by_capacity over that
combined candidate list to get execution_instance, and if found set pu_ig to the
mapped instance group and pu_en to execution_instance.hostname; otherwise keep
the existing fallback to self.instance.instance_group and cn. Ensure you still
handle the case where execution_instance.hostname == cn the same as today.
---
Duplicate comments:
In `@awx/main/tasks/jobs.py`:
- Around line 1647-1652: Replace the remove-then-copy pattern around
project_path/artifacts_project_path with an atomic swap: copy
artifacts_project_path into a temporary sibling directory (use tempfile.mkdtemp
or project_path + ".tmp") via shutil.copytree(..., symlinks=True), if successful
rename existing project_path to a backup name (e.g. project_path + ".bak") using
os.rename, rename the temp dir into project_path using os.rename, then remove
the backup with shutil.rmtree; ensure all operations are wrapped to clean up the
temp on failure and restore the backup if swapping fails, and keep the same
symbols (project_path, artifacts_project_path, shutil.copytree, os.path.exists,
shutil.rmtree) to locate the change.
In `@awx/playbooks/project_update.yml`:
- Around line 55-60: Validate that AWX_ISOLATED_DATA_DIR is set and non-empty
before composing the derived paths (project_update_path, ansible_local_temp,
ansible_roles_path, ansible_collections_path): either fail the play with
ansible.builtin.fail if the env var is missing, or derive a safe per-job
fallback (e.g. incorporate a unique job identifier like ansible_job_id) instead
of allowing bare "/project_update/" and "/tmp/"; update the set_fact that sets
project_update_path, ansible_local_temp, ansible_roles_path, and
ansible_collections_path to first check
lookup('ansible.builtin.env','AWX_ISOLATED_DATA_DIR') and use a validated value
or the safe fallback.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
awx/main/tasks/jobs.py (2)
1505-1506:self.private_data_diris stored but never read
self.private_data_dir = private_data_diris set inpre_run_hookbut is not referenced anywhere inRunProjectUpdate(neither inbuild_args, nor in the copy helpers). Consider removing it unless there are plans to use it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@awx/main/tasks/jobs.py` around lines 1505 - 1506, The attribute self.private_data_dir is assigned in RunProjectUpdate.pre_run_hook but never used elsewhere (not referenced by build_args or any copy helper); remove the unused assignment to avoid dead state by deleting the line "self.private_data_dir = private_data_dir" from pre_run_hook (or alternatively, if the intent was to use it, wire it into build_args/copy helper logic where the private data directory is needed); update any tests or callers expecting that attribute accordingly and ensure no other code references private_data_dir on RunProjectUpdate.
1644-1649: Exception not chained inPostRunErrorraise (B904)The same pattern applies at line 1674 in
_copy_requirements_from_artifacts. Sincetraceback.format_exc()is already captured intb, usingraise ... from None(orfrom exc) prevents a confusing implicit chained traceback without losing the formatted string.🔧 Suggested fix (both locations)
# _copy_project_from_artifacts - except Exception: + except Exception as exc: raise PostRunError( f'{instance.log_format} failed to copy project artifacts to {project_path}', status='error', tb=traceback.format_exc(), - ) + ) from exc # _copy_requirements_from_artifacts - except Exception: + except Exception as exc: raise PostRunError( f'{instance.log_format} failed to copy {dest_subdir} from remote execution artifacts', status='error', tb=traceback.format_exc(), - ) + ) from exc🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@awx/main/tasks/jobs.py` around lines 1644 - 1649, The except blocks that currently do "except Exception:" then raise PostRunError(..., tb=traceback.format_exc()) should capture the original exception and explicitly chain it; change to "except Exception as exc:" and raise PostRunError(..., tb=traceback.format_exc()) from exc (apply this change in the block shown and the similar block in _copy_requirements_from_artifacts) so the original exception is preserved and the implicit confusing chained traceback is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@awx/main/tasks/jobs.py`:
- Around line 858-872: spawn_project_sync currently silently falls back to
self.instance.instance_group when project.instance_groups.exists() is true but
select_execution_node_by_capacity returns no eligible remote instance (or only
instances with hostname == cn); add a warning log inside the branch that
iterates project.instance_groups to record that no remote execution node was
selected and that execution will fall back to self.instance.instance_group,
including context such as project identifier (project.pk or project.name), the
controller hostname variable cn, the instance groups inspected
(project.instance_groups.all() or their ids), and the fact that
select_execution_node_by_capacity returned no remote candidate; place this log
where pu_ig and pu_en are determined (alongside or after the loop that sets
pu_ig/pu_en) so operators can distinguish intentional local execution from
misconfiguration.
---
Nitpick comments:
In `@awx/main/tasks/jobs.py`:
- Around line 1505-1506: The attribute self.private_data_dir is assigned in
RunProjectUpdate.pre_run_hook but never used elsewhere (not referenced by
build_args or any copy helper); remove the unused assignment to avoid dead state
by deleting the line "self.private_data_dir = private_data_dir" from
pre_run_hook (or alternatively, if the intent was to use it, wire it into
build_args/copy helper logic where the private data directory is needed); update
any tests or callers expecting that attribute accordingly and ensure no other
code references private_data_dir on RunProjectUpdate.
- Around line 1644-1649: The except blocks that currently do "except Exception:"
then raise PostRunError(..., tb=traceback.format_exc()) should capture the
original exception and explicitly chain it; change to "except Exception as exc:"
and raise PostRunError(..., tb=traceback.format_exc()) from exc (apply this
change in the block shown and the similar block in
_copy_requirements_from_artifacts) so the original exception is preserved and
the implicit confusing chained traceback is avoided.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
awx/playbooks/project_update.yml (1)
66-70: Missingmodeon remote workspace directory creation.Other
ansible.builtin.filedirectory tasks in this playbook (lines 142, 171) explicitly setmode: '0755'. The remote workspace directory inherits the runner's umask instead.🔧 Proposed fix
- name: Create the directory in case we have a delete operation to avoid errors ansible.builtin.file: path: "{{ project_update_path }}" state: directory + mode: '0755' tags: - remote_update🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@awx/playbooks/project_update.yml` around lines 66 - 70, The directory creation task named "Create the directory in case we have a delete operation to avoid errors" uses ansible.builtin.file with path "{{ project_update_path }}" but omits an explicit mode so the remote directory inherits the runner's umask; update that task to include mode: '0755' (matching other file tasks in this playbook) to ensure consistent permissions for project_update_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@awx/playbooks/project_update.yml`:
- Around line 52-71: This is a duplicate of the prior discussion about guarding
for AWX_ISOLATED_DATA_DIR; remove the redundant review comment and ensure the
playbook's set_fact usage (project_update_path, ansible_local_temp,
ansible_roles_path, ansible_collections_path in the "Update the default working
path for automation mesh" task) remains unchanged — if any guard is needed it
should be the one already agreed upon in the earlier review, so either reference
that existing resolution in the PR thread or delete this duplicate note.
---
Nitpick comments:
In `@awx/playbooks/project_update.yml`:
- Around line 66-70: The directory creation task named "Create the directory in
case we have a delete operation to avoid errors" uses ansible.builtin.file with
path "{{ project_update_path }}" but omits an explicit mode so the remote
directory inherits the runner's umask; update that task to include mode: '0755'
(matching other file tasks in this playbook) to ensure consistent permissions
for project_update_path.
|


SUMMARY
This pull request adds support to run project updates on automation mesh nodes. This aligns the functionality of project updates to other aspects of the platform.
Assisted-by: Claude Code (Claude Opus 4.6)
WHY
Automation mesh is the solution to handle complex networking and security controls. The current requirement of project updates running on the control plane often means jumping through additional hoops to meet business requirements. This feature can allow users to work with their current mesh networking to gain greater control over networking and security.
HOW
Project updates are done with a simple playbook. They are an Ansible automation job!
The goal of this work was to ensure the current workflow was left untouched as possible. As such, this builds on the current workflow to identify when a project is associated with an instance group. In that scenario, the instance group will be used to run the project update playbook. The artifacts of the playbook update are returned to the control plane and post hook processing of the project update job moves the project data into the appropriate place. Projects without an association with an instance group will continue to use the control plane for update operations.
CAVEATS
Performing project updates on the automation mesh is slower than a control update. You can't beat physics! The data is being transferred over the automation mesh and eventually needs to be processed on the control plane.
ISSUE TYPE
COMPONENT NAME
ADDITIONAL INFORMATION
Start by associating an instance group with a project. This is done using an API call that aligns with current API patterns.
POST /api/v2/projects/{project_id}/instance_groups/The payload should be use the same format for managing instance groups with inventories, organizations, etc.
Project updates whether directly initiated or triggered by a playbook should then run on the instance group.
Summary by CodeRabbit
New Features
Chore