
ci: persist build cache across runs, parallelize release upload#28561

Open
alii wants to merge 1 commit into main from claude/ci-cache-and-upload-parallelization

Conversation


@alii alii commented Mar 25, 2026

Summary

Two independent CI speed fixes:

1. Persist build cache outside the git checkout

The build cache (ccache, zig cache, WebKit/dep tarballs) defaults to ${buildDir}/cache, which is inside the git checkout. Buildkite runs git clean -ffxdq between builds, wiping it every time — so the caching infrastructure exists but has a 0% hit rate in CI.

  • scripts/build/config.ts — read BUN_BUILD_CACHE_PATH env var as the cacheDir default (falls back to ${buildDir}/cache when unset, so local builds are unchanged)
  • scripts/build/ci.ts — skip the post-build rmSync(cacheDir) disk cleanup when cacheDir is outside buildDir
  • .buildkite/Dockerfile + scripts/bootstrap.sh — export BUN_BUILD_CACHE_PATH=/var/lib/buildkite-agent/cache/build in the buildkite environment hook and create the directory

This should turn cold ccache/zig-cache into warm on persistent agents. The --cache-dir CLI flag still takes precedence for manual overrides.
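The environment-hook wiring can be sketched as follows (a sketch only: the hook filename is an assumption; the PR itself places the export in scripts/bootstrap.sh and the .buildkite/Dockerfile):

```shell
#!/usr/bin/env bash
# Sketch of a Buildkite agent environment hook, conventionally installed at
# /etc/buildkite-agent/hooks/environment (filename is an assumption).
# The path below is the one named in the PR; it lives outside the git
# checkout, so `git clean -ffxdq` between builds never touches it.
export BUN_BUILD_CACHE_PATH=/var/lib/buildkite-agent/cache/build
mkdir -p "$BUN_BUILD_CACHE_PATH"
```

Because config.ts only reads the variable as a default, the --cache-dir CLI flag and local builds without the variable behave exactly as before.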

2. Parallelize release artifact upload

upload-release.sh processed 22 zips sequentially — each iteration downloads the artifact, then runs three parallel uploads (S3 commit path, S3 tag path, GitHub release), then waits before moving to the next. The inner uploads were already parallel; the outer loop wasn't.

Now runs up to 6 artifacts concurrently with a bounded job pool (wait -n semaphore). Bounded to avoid hammering the GitHub API and keep peak disk usage reasonable. wait -n + set -e propagates failures correctly — if any upload fails, the step fails.
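The bounded-pool pattern described above can be sketched like this (a sketch only: process_one is a stand-in for upload_artifact, and the item list is illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

max_jobs=6

# Stand-in for upload_artifact: in the real script this downloads the
# artifact and runs the three inner uploads in parallel.
process_one() {
  sleep 0.1
  echo "done: $1"
}

items=(a b c d e f g h)
for item in "${items[@]}"; do
  # Block while all slots are busy. `wait -n` (bash 4.3+) returns as soon
  # as any background job finishes; with `set -e`, a failed job aborts here.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
  process_one "$item" &
done
wait  # drain the remaining jobs; failures propagate here too
```

The pool size caps both concurrent GitHub API calls and peak disk usage, since at most six artifacts are downloaded at once.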

Not in this PR

Left CCACHE_SLOPPINESS alone — the !cfg.ci guard exists because Ubuntu 20.04 ships ccache 3.7.7 and some sloppiness values (gcno_cwd) need 4.5+. Fixing that needs version detection or a base-image bump, so it belongs in a separate PR.

Test plan

  • bash -n on both shell scripts
  • TypeScript imports verified (bun -e 'await import(...)')
  • Verified BUN_BUILD_CACHE_PATH=/tmp/test-cache is picked up by resolveConfig and the startsWith(buildDir + sep) check correctly returns false
  • Verified unset env var preserves existing behavior (cacheDir = ${buildDir}/cache)
  • Verified wait -n + set -e propagates background job failures
  • CI: observe ccache hit rate on second build of same branch
  • CI: observe release step duration on main

The build cache (ccache, zig cache, WebKit/dep tarballs) lives at
${buildDir}/cache by default, which is inside the git checkout. Buildkite
runs `git clean -ffxdq` between builds, so every build starts cold —
the caching infrastructure exists but has a 0% hit rate in CI.

- config.ts: read BUN_BUILD_CACHE_PATH env var as cacheDir default
- ci.ts: don't rm cacheDir during disk cleanup when it's outside buildDir
- Dockerfile + bootstrap.sh: set BUN_BUILD_CACHE_PATH in the buildkite
  environment hook, pointing at /var/lib/buildkite-agent/cache/build

Separately, upload-release.sh processed 22 artifacts sequentially
(download + S3×2 + GitHub upload per artifact). The inner uploads were
already parallel; the outer loop wasn't. Now runs up to 6 artifacts
concurrently with a bounded job pool. `wait -n` + `set -e` propagates
failures correctly.

robobun commented Mar 25, 2026

Updated 9:05 AM PT - Mar 25th, 2026

@alii, your commit 7fbae0e has 2 failures in Build #41935 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28561

That installs a local version of the PR into your bun-28561 executable, so you can run:

bun-28561 --bun


coderabbitai bot commented Mar 25, 2026

Walkthrough

These changes introduce Bun build cache support by establishing persistent cache directories in Buildkite infrastructure, configuring cache paths through environment variables, optimizing artifact uploads with bounded parallelism, and conditionally preserving external cache directories during cleanup.

Changes

  • Buildkite Infrastructure Setup (.buildkite/Dockerfile, scripts/bootstrap.sh): Creates persistent cache directories (/var/lib/buildkite-agent/cache/build and ${home}/cache/build) and exports the BUN_BUILD_CACHE_PATH environment variable to enable Bun build caching in the Buildkite agent environment.
  • Build Cache Configuration (scripts/build/config.ts, scripts/build/ci.ts): Adds environment-driven cache directory resolution with BUN_BUILD_CACHE_PATH override support, and implements conditional cache cleanup that preserves external cache directories located outside the build directory.
  • Release Upload Optimization (.buildkite/scripts/upload-release.sh): Introduces concurrent artifact uploads with bounded parallelism (max 6 concurrent jobs) using background processes and wait -n, replacing sequential upload behavior.
🚥 Pre-merge checks: 2 passed

  • Title check: ✅ Passed. The title accurately summarizes the two main changes: persisting build cache and parallelizing release uploads.
  • Description check: ✅ Passed. The description comprehensively covers both features with context, implementation details, testing, and known limitations.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/build/config.ts`:
- Around line 416-418: The env-driven cacheDir resolution currently treats empty
or relative BUN_BUILD_CACHE_PATH as cwd which diverges from partial.cacheDir;
update the logic that computes cacheDir in scripts/build/config.ts to (1) treat
an empty string as "not set", (2) if BUN_BUILD_CACHE_PATH is set and absolute
use it, otherwise resolve it relative to the repo build root (the same base used
for partial.cacheDir, e.g. buildDir) and (3) fallback to resolve(buildDir,
"cache") when unset — adjust the branch that references
process.env.BUN_BUILD_CACHE_PATH to implement these checks so cacheDir behavior
matches partial.cacheDir semantics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfb3fccb-50c8-4a8b-94ac-3d1d1a3d504b

📥 Commits

Reviewing files that changed from the base of the PR and between 1879e9e and 7fbae0e.

📒 Files selected for processing (5)
  • .buildkite/Dockerfile
  • .buildkite/scripts/upload-release.sh
  • scripts/bootstrap.sh
  • scripts/build/ci.ts
  • scripts/build/config.ts

Comment on lines +416 to +418

    : process.env.BUN_BUILD_CACHE_PATH !== undefined
      ? resolve(process.env.BUN_BUILD_CACHE_PATH)
      : resolve(buildDir, "cache");

⚠️ Potential issue | 🟡 Minor

BUN_BUILD_CACHE_PATH resolution is inconsistent for relative/empty values.

At Line 416, env-driven cacheDir can resolve relative to process cwd (not repo root), and an empty value resolves to cwd. This diverges from partial.cacheDir semantics and can place cache unexpectedly.

🔧 Proposed fix
+  const envCacheDir = process.env.BUN_BUILD_CACHE_PATH?.trim();
   const cacheDir =
     partial.cacheDir !== undefined
       ? isAbsolute(partial.cacheDir)
         ? partial.cacheDir
         : resolve(cwd, partial.cacheDir)
-      : process.env.BUN_BUILD_CACHE_PATH !== undefined
-        ? resolve(process.env.BUN_BUILD_CACHE_PATH)
+      : envCacheDir
+        ? isAbsolute(envCacheDir)
+          ? envCacheDir
+          : resolve(cwd, envCacheDir)
         : resolve(buildDir, "cache");

@github-actions
Copy link
Contributor

Found 5 issues this PR may fix:

  1. bun i with cache is much slower than with no cache #19936 - Build cache performance regression where caching makes installs 60x slower than no-cache
  2. bun install --frozen-lockfile --production unusually slow on Docker build #28278 - Docker build performance issue showing 20x slowdown (107s vs 3.3s) that could benefit from persistent caching
  3. Bun install gets hanging after caching dependencies #20886 - Cache-related hanging during dependency installation that persistent cache architecture could resolve
  4. canary archives are failing checksum verification #27830 - Canary release artifacts failing checksum verification that parallel upload optimization might improve
  5. bun install --frozen-lockfile is exceptionally slow in docker [bun 1.1.4] #10371 - Docker frozen-lockfile installs being exceptionally slow, addressable through persistent cache improvements

If this is helpful, consider adding 'Fixes #' to the PR description to auto-close the issue on merge.

🤖 Generated with Claude Code

Comment on lines +246 to 254
    local max_jobs=6
    for artifact in "${artifacts[@]}"; do
      while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n
      done
      upload_artifact "$artifact" &
    done
    wait


🔴 When any upload_artifact background job fails, wait -n returns non-zero and set -eo pipefail immediately exits the script, leaving up to 5 other in-flight upload_artifact background processes orphaned (bash does not kill background jobs on script exit). As a result, the outer wait and the subsequent update_github_release/create_sentry_release/send_discord_announcement calls are never reached, leaving the release in an unpredictable partial state. Fix by adding an EXIT trap that kills the remaining background jobs.

Extended reasoning...

What the bug is and how it manifests

The script has set -eo pipefail at line 3. The new semaphore loop (lines 246-254) launches upload_artifact as background jobs and uses wait -n inside the throttle loop to block when max_jobs=6 slots are all occupied. If any in-flight background job exits with a non-zero status (e.g. a transient AWS/GitHub network error), wait -n returns that non-zero exit code. Because set -e is active, the shell immediately exits at that point.

The specific code path that triggers it

The semaphore while-loop body calls wait -n to get a job slot. On failure, wait -n returns the failed job exit code. set -e immediately exits the script. Up to 5 other upload_artifact background jobs are still in-flight at this point. In non-interactive bash (the mode used by CI scripts), background jobs are NOT sent SIGHUP when the parent exits - they are reparented to PID 1 and continue running as orphans. The outer wait at line 253 is never reached, and neither are update_github_release/create_sentry_release/send_discord_announcement.

Why existing code does not prevent it

The PR description notes "wait -n + set -e propagates failures correctly" - and it does propagate the failure (the step fails). But it does not clean up. There is no trap on EXIT to kill remaining jobs. The orphaned processes continue uploading to S3/GitHub without the build script tracking them. The old sequential code did not have this problem because upload_artifact ran synchronously - the inner wait inside upload_artifact completed all inner jobs before the outer loop advanced. No background outer jobs existed to orphan.

Impact

After a single upload failure: (1) up to 5 other in-flight upload_artifact jobs keep writing artifacts to S3 and GitHub Release with no parent tracking their success or failure; (2) update_github_release is never called so canary release notes are not updated; (3) create_sentry_release is never called; (4) send_discord_announcement is never called. The release may end up partially uploaded with no build-step signal about which artifacts are missing. A re-run may double-upload some artifacts and miss others.

Step-by-step proof

  1. The loop starts; artifacts 1-6 are launched as background jobs, filling max_jobs=6 slots.
  2. Artifact 3 fails (aws s3 cp returns exit code 1 inside upload_artifact).
  3. The while condition re-evaluates: jobs -rp | wc -l is still >= 6, so wait -n is called.
  4. wait -n returns 1 (the exit code from artifact 3 job).
  5. set -e triggers: the script exits immediately.
  6. Artifacts 1, 2, 4, 5, 6 are still running their three parallel inner uploads (S3 commit path, S3 tag path, GitHub release) - now orphaned, reparented to PID 1.
  7. The outer wait, update_github_release, create_sentry_release, and send_discord_announcement are never reached.

How to fix it

Add a trap before the loop: `trap 'kill $(jobs -rp) 2>/dev/null; wait' EXIT`. This ensures that on any exit all in-flight background jobs are killed and waited on, preventing orphans. Alternatively, collect failures manually with an error-flag variable rather than relying on set -e to propagate them from wait -n.
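A minimal sketch of that trap-based cleanup (names are illustrative; the trap would normally fire on exit, and cleanup is called directly here only to demonstrate the effect):

```shell
#!/usr/bin/env bash
set -eo pipefail

# Sketch of the suggested fix: an EXIT trap that kills any still-running
# background jobs and reaps them, so a failed `wait -n` cannot leave
# orphaned uploads behind.
cleanup() {
  local pids
  pids=$(jobs -rp)
  if [ -n "$pids" ]; then
    kill $pids 2>/dev/null || true
  fi
  wait 2>/dev/null || true  # reap whatever was killed
}
trap cleanup EXIT

sleep 30 &   # stand-in for an in-flight upload_artifact job
cleanup      # normally fired by the trap; called directly to demonstrate
```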

Comment on lines +416 to +418
    : process.env.BUN_BUILD_CACHE_PATH !== undefined
      ? resolve(process.env.BUN_BUILD_CACHE_PATH)
      : resolve(buildDir, "cache");

🟡 In scripts/build/config.ts, the BUN_BUILD_CACHE_PATH check guards against undefined but not an empty string — if someone runs export BUN_BUILD_CACHE_PATH= in their shell, resolve("") returns process.cwd() (the repo root), silently writing all build caches there. Because the repo root is not inside buildDir, the startsWith(buildDir + sep) guard in ci.ts skips cleanup, leaving the cache permanently in the repo root. Fix: use a truthy check (process.env.BUN_BUILD_CACHE_PATH) instead of !== undefined.

Extended reasoning...

The condition at scripts/build/config.ts:416 reads:

process.env.BUN_BUILD_CACHE_PATH !== undefined
  ? resolve(process.env.BUN_BUILD_CACHE_PATH)
  : resolve(buildDir, "cache");

In POSIX shells, export BUN_BUILD_CACHE_PATH= sets the variable to an empty string. process.env.BUN_BUILD_CACHE_PATH !== undefined evaluates to true for an empty string, so the branch is taken. Node.js/Bun's path.resolve("") is equivalent to path.resolve() — it returns process.cwd(), which in this repository is the repo root (e.g., /home/claude/bun).

The consequence is that all build cache directories — ccache, zig-cache, and WebKit prebuilt tarballs — are created directly under the repository root instead of the intended subdirectory. The build succeeds, but artifacts end up in unexpected locations that are hard to notice.

A secondary effect compounds the issue. In scripts/build/ci.ts, the post-build cleanup guard is:

if (cfg.cacheDir.startsWith(cfg.buildDir + sep)) {
  rmSync(cfg.cacheDir, { recursive: true, force: true });
}

Since the repo root (resolved from BUN_BUILD_CACHE_PATH="") is a parent of buildDir, not a child, startsWith(buildDir + sep) is false and rmSync is never called. Cache files accumulate in the repo root and are never cleaned up.

Step-by-step proof:

  1. Developer runs export BUN_BUILD_CACHE_PATH= (clearing the variable without unsetting it).
  2. process.env.BUN_BUILD_CACHE_PATH is "" — not undefined, so the guard passes.
  3. resolve("") returns process.cwd() = /home/claude/bun (the repo root).
  4. cfg.cacheDir is set to /home/claude/bun.
  5. ccache, zig-cache, and WebKit tarballs are written into /home/claude/bun.
  6. Cleanup check: /home/claude/bun.startsWith(/home/claude/bun/build/release/) is false — skipped.
  7. Cache files accumulate in the repo root permanently.

In practice this is low-risk: all CI environments (Dockerfile and bootstrap.sh) assign concrete non-empty paths, and no mechanism in the PR's CI setup would result in an empty value. A developer would have to explicitly set the variable to empty to trigger this. The one-word fix is to replace !== undefined with a truthy check.

