Description
With parallel_plan: true and parallel_apply: true enabled in atlantis.yaml, we are experiencing concurrency issues with Terraform provider installation. Multiple parallel executions read from and write to the same shared plugin cache directory at the same time, resulting in "text file busy" errors or checksum mismatches.
It seems that even when using a shared plugin cache, concurrent terraform init or terraform plan operations conflict when accessing the provider binaries.
Steps to Reproduce
- Enable parallel execution in atlantis.yaml:
    parallel_plan: true
    parallel_apply: true
- Configure a shared plugin cache (e.g., via the TF_PLUGIN_CACHE_DIR env var or .terraformrc).
- Trigger a PR that runs multiple Terraform projects simultaneously (e.g., 5-10 projects) using the same providers.
Logs
│ Error: Failed to install provider
│
│ Error while installing hashicorp/azuread v3.7.0: open
│ /atlantis-data/plugin-cache/registry.terraform.io/hashicorp/azuread/3.7.0/linux_amd64/terraform-provider-azuread_v3.7.0_x5:
│ text file busy
And sometimes checksum errors:
│ Error: Required plugins are not installed
│
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│ - registry.terraform.io/hashicorp/azurerm: the cached package for registry.terraform.io/hashicorp/azurerm 4.54.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
Environment details
- Atlantis version: v0.37.1
- Terraform version: v1.13.5
- Atlantis server side config: TF_PLUGIN_CACHE_DIR is set to a shared directory.
Workaround attempted
We had to implement a workaround in our atlantis.yaml to serialize the init phase and force a local download of providers (bypassing the cache) to avoid conflicts:
    workflows:
      default:
        plan:
          steps:
            # Use flock to serialize init and disable cache to avoid symlink conflicts
            - run: flock /tmp/terraform_init.lock bash -c "rm -rf .terraform/providers && env -u TF_PLUGIN_CACHE_DIR TF_CLI_CONFIG_FILE=/dev/null terraform init -upgrade"
            - plan
Proposed Solution / Feature Request
It would be great if Atlantis could handle the locking mechanism for the provider cache internally when parallel mode is enabled, or provide a native way to serialize the init step while keeping plan/apply parallel.
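To illustrate the kind of locking being requested, here is a minimal sketch (in Python, for brevity) of serializing only the init step behind an exclusive file lock while leaving everything else free to run in parallel. It is not how Atlantis works today and not a proposed implementation. The helper names `exclusive_lock` and `run_serialized` are hypothetical, and the lock path and `terraform init` arguments in the usage comment are assumptions for illustration. It relies on `fcntl.flock`, so it is Unix-only.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Hypothetical helper: each caller opens its own file descriptor, so the
    lock serializes callers across threads and across processes.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the current holder releases
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_serialized(lock_path, command, cwd=None):
    """Run command while holding the lock, so only one such command runs at a time."""
    with exclusive_lock(lock_path):
        return subprocess.run(command, cwd=cwd, check=True)


# Example usage (not executed here; paths and arguments are assumptions):
#   run_serialized("/tmp/terraform_init.lock",
#                  ["terraform", "init", "-input=false"],
#                  cwd="/path/to/project")
# The plan step would then run outside the lock, in parallel with other projects.
```

The lock here covers only the step that populates the provider cache, which is the same idea as the flock workaround above but without disabling the cache. Whether serializing init alone is enough to avoid the checksum mismatch has not been verified.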