Race condition on provider installation with parallel_plan/apply enabled (Text file busy) #5984

@uroja97

Description

When parallel_plan: true and parallel_apply: true are enabled in atlantis.yaml, we hit concurrency issues during Terraform provider installation. Multiple parallel executions read from and write to the same shared plugin cache directory at the same time, which results in "text file busy" errors or checksum mismatches.

It seems that concurrent terraform init or terraform plan operations conflict when they access the provider binaries in the shared plugin cache.

Steps to Reproduce

  1. Enable parallel execution in atlantis.yaml:
    parallel_plan: true
    parallel_apply: true
  2. Configure a shared plugin cache (e.g., via TF_PLUGIN_CACHE_DIR env var or .terraformrc).
  3. Trigger a PR that runs multiple Terraform projects simultaneously (e.g., 5-10 projects) using the same providers.
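
The steps above can be approximated outside Atlantis with a small shell sketch. It assumes `terraform` is on the PATH and that each argument is a project directory using the same providers. The function name `parallel_init` and the `init.log` file name are made up for illustration, and this is not how Atlantis itself schedules projects.

```shell
# Hypothetical repro helper: run `terraform init` in several project
# directories at once, all sharing one plugin cache directory.
parallel_init() {
  if [ -z "${TF_PLUGIN_CACHE_DIR:-}" ]; then
    echo "TF_PLUGIN_CACHE_DIR must point at the shared cache" >&2
    return 2
  fi
  mkdir -p "$TF_PLUGIN_CACHE_DIR"

  pids=""
  for dir in "$@"; do
    # Each init runs in its own background subshell; output goes to a
    # per-project log so errors can be inspected afterwards.
    ( cd "$dir" && terraform init -input=false ) >"$dir/init.log" 2>&1 &
    pids="$pids $!"
  done

  # Collect the exit status of every background init.
  failures=0
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done

  echo "$failures of $# init runs failed"
  [ "$failures" -eq 0 ]
}

# Example call (project directory names are placeholders):
#   export TF_PLUGIN_CACHE_DIR=/atlantis-data/plugin-cache
#   parallel_init project-a project-b project-c
```

Because the failure is a race, a single run may succeed; repeating the call with a freshly emptied cache directory is more likely to show it.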

Logs

│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/azuread v3.7.0: open
│ /atlantis-data/plugin-cache/registry.terraform.io/hashicorp/azuread/3.7.0/linux_amd64/terraform-provider-azuread_v3.7.0_x5:
│ text file busy

Sometimes we also get checksum errors:

│ Error: Required plugins are not installed
│ 
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│   - registry.terraform.io/hashicorp/azurerm: the cached package for registry.terraform.io/hashicorp/azurerm 4.54.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file

Environment details

  • Atlantis version: v0.37.1
  • Terraform version: v1.13.5
  • Atlantis server side config:
    • TF_PLUGIN_CACHE_DIR is set to a shared directory.

Workaround attempted

We had to implement a workaround in our atlantis.yaml to serialize the init phase and force a local download of providers (bypassing the cache) to avoid conflicts:

workflows:
  default:
    plan:
      steps:
        # Use flock to serialize init and disable cache to avoid symlink conflicts
        - run: flock /tmp/terraform_init.lock bash -c "rm -rf .terraform/providers && env -u TF_PLUGIN_CACHE_DIR TF_CLI_CONFIG_FILE=/dev/null terraform init -upgrade"
        - plan
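
For readability, the one-line `run` step above can be unpacked into a shell function. This is a sketch that is meant to behave the same way, assuming `flock` from util-linux is available. The function name `serialized_init` is made up, and it uses the file-descriptor form of `flock` instead of the command form used in the workflow.

```shell
# Hypothetical unpacked version of the workaround's run step.
# The body runs in a subshell, so the unset/export below do not leak
# into the caller, and the lock is released when the subshell exits.
serialized_init() (
  lock_file="${1:-/tmp/terraform_init.lock}"

  # Open the lock file on fd 9 and block until the exclusive lock is held.
  exec 9>"$lock_file"
  flock 9 || exit 1

  # Drop providers left over from a previous init in this project.
  rm -rf .terraform/providers

  # Bypass the shared cache: no cache dir, no CLI config file.
  unset TF_PLUGIN_CACHE_DIR
  export TF_CLI_CONFIG_FILE=/dev/null

  terraform init -upgrade
)
```

Only one project at a time holds the lock, so every init downloads its providers into its own `.terraform` directory without touching the shared cache. The cost is that provider downloads are repeated for each project.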

Proposed Solution / Feature Request

It would be great if Atlantis could handle the locking mechanism for the provider cache internally when parallel mode is enabled, or provide a native way to serialize the init step while keeping plan/apply parallel.
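
To illustrate what "serialize init, keep plan/apply parallel" could look like, here is a minimal sketch of a generic lock wrapper. It is not an existing Atlantis feature. The name `with_lock` and the `LOCK_TIMEOUT` variable are hypothetical, and `flock` from util-linux is assumed to be installed.

```shell
# Hypothetical helper: run a command while holding an exclusive lock,
# giving up after LOCK_TIMEOUT seconds (default 600).
with_lock() (
  lock_file="$1"
  shift
  timeout="${LOCK_TIMEOUT:-600}"

  # Hold the lock on fd 9 for the lifetime of this subshell.
  exec 9>"$lock_file"
  if ! flock -w "$timeout" 9; then
    echo "timed out waiting for $lock_file" >&2
    exit 1
  fi

  # Run the wrapped command; the lock is released when the subshell exits.
  "$@"
)

# Example: only the init step takes the lock, the plan that follows does not.
#   with_lock /tmp/terraform_init.lock terraform init -input=false
```

Whether serializing only init is enough while the shared cache stays enabled is not established in this issue: a plan running in parallel may still be executing a cached provider binary while another init writes to the cache.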

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)
