Ansible Playbook Performance Profiling and Optimization #409

@parithosh

Background

During some OOO digging, I discovered that Ansible's built-in profiling callbacks provide valuable insight into playbook performance bottlenecks.

Current Findings

Running with ANSIBLE_CALLBACKS_ENABLED="profile_tasks,profile_roles" against a single bootnode reveals both role-level and task-level time breakdowns.

For example:
Role Execution Times:

  • lighthouse: 14.00s

Task-Level Breakdown:

  • ethpandaops.general.lighthouse : Run lighthouse container - 10.71s
  • ethpandaops.general.ethereum_node_fact_discovery : Get consensus node identity - 9.84s
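To compare runs, the task-level lines above could be turned into sortable numbers. This is a minimal sketch that assumes the `<role> : <task> - <seconds>s` shape exactly as pasted in this issue, not the raw callback output (which pads names with dashes); `parse_task_timings` is a hypothetical helper name.

```python
import re

# Assumed line shape, copied from the excerpt in this issue:
#   "<role> : <task> - <seconds>s", optionally prefixed by a bullet.
LINE_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<name>.+?)\s+-\s+(?P<secs>\d+(?:\.\d+)?)s\s*$"
)


def parse_task_timings(text):
    """Return (task name, seconds) pairs, slowest first."""
    timings = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            timings.append((match.group("name"), float(match.group("secs"))))
    return sorted(timings, key=lambda item: item[1], reverse=True)


# The two task lines quoted in this issue.
sample = """
  • ethpandaops.general.lighthouse : Run lighthouse container - 10.71s
  • ethpandaops.general.ethereum_node_fact_discovery : Get consensus node identity - 9.84s
"""

for name, secs in parse_task_timings(sample):
    print(f"{secs:8.2f}s  {name}")
```

Lines that do not match the assumed shape (such as the role summary) are skipped rather than guessed at.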

Proposal

  1. Basic Profiling Campaign

    • Run profiling against 50+ nodes to establish a statistical baseline (likely on one of the latest devnets, during its first setup run)
    • Identify which tasks are consistently slow vs. variable
    • Distinguish between unfixable tasks (e.g., docker run commands) and optimizable ones
  2. Enhanced Profiling with ansible-runner

    ANSIBLE_CONFIG=./ansible.cfg ansible-runner run . -p playbook.yaml \
      --cmdline "--tags ethereum --limit bootnode" --artifact-dir ./artifacts
    This generates detailed artifacts with start and stop times per task. We need a performance visualization tool that reads the generated files and shows exactly which tasks and roles are the worst offenders.
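As a starting point for such a tool, here is a rough sketch that totals task durations from the runner artifacts. It assumes the per-event JSON files sit under a `job_events/` directory somewhere below the artifact dir, and that finished-task events carry `event_data.duration`, `event_data.role` and `event_data.task`; these should be checked against the files a real run produces. `slowest_tasks` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from pathlib import Path


def slowest_tasks(artifact_dir, top=10):
    """Sum task durations per (role, task) across all hosts, slowest first."""
    totals = defaultdict(float)
    # Assumption: event files live at <artifact_dir>/.../job_events/*.json
    for path in Path(artifact_dir).glob("**/job_events/*.json"):
        try:
            event = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written event files
        # Only per-host task results are expected to carry a duration.
        if event.get("event") not in ("runner_on_ok", "runner_on_failed"):
            continue
        data = event.get("event_data") or {}
        duration = data.get("duration")
        if duration is None:
            continue
        key = (data.get("role") or "(no role)", data.get("task") or "(unnamed)")
        totals[key] += float(duration)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Calling `slowest_tasks("./artifacts")` after the command above would return a list of `((role, task), seconds)` pairs that a plotting or reporting step could consume.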
    

One or both approaches should give us enough information to decide where to spend time optimising our Ansible stack for 1k+ devnets.
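Once per-node timings exist, the "consistently slow vs. variable" split from the proposal could be sketched as below. The thresholds (`slow_secs`, `cv_limit`) and the function name `classify_tasks` are illustrative assumptions, not agreed values; the input is a mapping from task name to the list of durations observed across nodes.

```python
from statistics import mean, median, pstdev


def classify_tasks(samples, slow_secs=5.0, cv_limit=0.25):
    """Label each task as 'variable', 'consistently slow' or 'fast'.

    samples: dict mapping task name -> list of durations in seconds,
             one entry per node.
    slow_secs, cv_limit: hypothetical thresholds, to be tuned on real data.
    """
    report = {}
    for task, durations in samples.items():
        if not durations:
            continue
        avg = mean(durations)
        # Coefficient of variation: spread relative to the mean.
        cv = pstdev(durations) / avg if avg > 0 else 0.0
        mid = median(durations)
        if cv > cv_limit:
            label = "variable"
        elif mid >= slow_secs:
            label = "consistently slow"
        else:
            label = "fast"
        report[task] = {"median": mid, "cv": cv, "label": label}
    return report
```

Tasks labelled "consistently slow" are the ones to inspect for structural fixes, while "variable" tasks point more towards node- or network-dependent behaviour.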
