feat(executor): use default memory pool in executor if no config provided#1637

Draft
andygrove wants to merge 1 commit into apache:main from andygrove:feat/executor-mempool-default

@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

--memory-pool-size is optional today: when not provided, the executor installs no memory pool and DataFusion falls back to its unbounded default. Operators that respect the pool (aggregates, sorts, joins, sort shuffle writer) therefore never spill and keep growing until the OS starts swapping. On TPC-H Q3 against SF100 with --partitions 2, this caused executor RSS to reach ~24 GB and the query to take ~49s (vs ~13s for hash shuffle).

The fix in #1636 patches the sort shuffle writer specifically, but every other memory-pool-aware operator still benefits from a sensible bounded default. New users on a laptop or in a small VM should not have to read the docs to learn that a flag is effectively required to keep the executor from blowing past physical RAM.

What changes are included in this PR?

  • New constant DEFAULT_MEMORY_POOL_BYTES_PER_TASK = 1 GiB.
  • When --memory-pool-size is not provided, default the total pool to concurrent_tasks * DEFAULT_MEMORY_POOL_BYTES_PER_TASK. With the default 8 task slots that's an 8 GiB total budget, split into 8 × 1 GiB per-task FairSpillPools.
  • The startup log line now distinguishes Memory pool (explicit) vs Memory pool (default) so it's obvious which path is in effect.
  • CLI help text and ExecutorProcessConfig::memory_pool_size doc comment updated to describe the new default.
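The sizing rule in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `PoolBudget` and `resolve_pool_budget` are hypothetical names, and the even split of an explicit `--memory-pool-size` across task slots and the guard against zero task slots are assumptions made for the sketch.

```rust
/// Per-task default named in this PR: 1 GiB.
const DEFAULT_MEMORY_POOL_BYTES_PER_TASK: u64 = 1024 * 1024 * 1024;

/// Hypothetical holder for the resolved budget (not a Ballista type).
#[derive(Debug, PartialEq)]
struct PoolBudget {
    total_bytes: u64,
    per_task_bytes: u64,
    is_default: bool,
}

/// Hypothetical helper: resolve the total and per-task budget from the
/// optional flag value and the number of task slots.
fn resolve_pool_budget(memory_pool_size: Option<u64>, concurrent_tasks: u64) -> PoolBudget {
    // Assumption: treat zero task slots as one to avoid dividing by zero.
    let tasks = concurrent_tasks.max(1);
    let (total_bytes, is_default) = match memory_pool_size {
        // Explicit flag: use the value as the total budget.
        Some(bytes) => (bytes, false),
        // No flag: concurrent_tasks * 1 GiB, as described in this PR.
        None => (tasks.saturating_mul(DEFAULT_MEMORY_POOL_BYTES_PER_TASK), true),
    };
    PoolBudget {
        total_bytes,
        // Assumption: the total is split evenly across task slots.
        per_task_bytes: total_bytes / tasks,
        is_default,
    }
}

fn main() {
    // Eight task slots and no flag: 8 GiB total, 1 GiB per task.
    println!("{:?}", resolve_pool_budget(None, 8));
}
```

With eight task slots and no flag, the sketch yields the same byte counts that appear in the startup log under Verification.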

Are there any user-facing changes?

The executor now always installs a bounded memory pool. Users who relied on the old unbounded default and have queries that run within physical RAM will see the same behaviour; users whose queries previously over-committed memory will start seeing operators spill instead. The new total budget is concurrent_tasks * 1 GiB; pass --memory-pool-size to override.
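To illustrate why a bounded pool turns unbounded growth into spilling, here is a toy model. It is not DataFusion's `FairSpillPool` or any real operator: `BoundedPool` and `buffer_batches` are hypothetical, and the spill step is reduced to releasing the buffered reservation.

```rust
/// Toy bounded pool: tracks reserved bytes against a fixed limit.
struct BoundedPool {
    limit: usize,
    used: usize,
}

impl BoundedPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, or return an error if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        match self.used.checked_add(bytes) {
            Some(total) if total <= self.limit => {
                self.used = total;
                Ok(())
            }
            _ => Err(format!("cannot reserve {bytes} bytes, limit is {}", self.limit)),
        }
    }

    /// Release a previous reservation.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Toy operator: buffers batches in memory and "spills" (releases its
/// reservation) whenever the pool refuses to grow. Returns the spill count.
fn buffer_batches(pool: &mut BoundedPool, batch_sizes: &[usize]) -> usize {
    let mut buffered = 0usize;
    let mut spills = 0usize;
    for &size in batch_sizes {
        if pool.try_grow(size).is_err() {
            // Backpressure: write buffered data out and free the reservation.
            pool.shrink(buffered);
            buffered = 0;
            spills += 1;
            // Retry once; a batch larger than the whole limit is skipped here.
            if pool.try_grow(size).is_err() {
                continue;
            }
        }
        buffered += size;
    }
    spills
}

fn main() {
    let batches = [400, 400, 400, 400];
    let mut bounded = BoundedPool::new(1024);
    let mut unbounded = BoundedPool::new(usize::MAX);
    println!("bounded spills: {}", buffer_batches(&mut bounded, &batches));
    println!("unbounded spills: {}", buffer_batches(&mut unbounded, &batches));
}
```

In the toy, the pool with an effectively infinite limit never refuses a reservation, so the operator never spills and its footprint grows with the input; the bounded pool forces a spill once the third batch would exceed the limit.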

Verification

TPC-H Q3 against /opt/tpch/sf100, executor started with --concurrent-tasks 8 and no --memory-pool-size:

| --partitions | Sort before | Sort after | Hash (unchanged) |
| --- | --- | --- | --- |
| 2 | 49.1s (~24 GB RSS) | 16.9s (~10 GB RSS) | 13.3s |
| 4 | 14.1s | 10.8s | 8.2s |
| 8 | 7.7s | 7.9s | 7.1s |
| 16 | 6.3s | 8.8s | 11.2s |

Startup log confirms the new default is taking effect:

INFO ballista_executor::executor_process: Memory pool (default): total 8589934592 bytes split into 8 tasks (1073741824 bytes each)
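A small sketch of how such a message could be assembled. The function name `memory_pool_log_line` is hypothetical, and the wording of the explicit variant is an assumption that mirrors the default line shown above; only the `(explicit)` and `(default)` labels come from the PR description.

```rust
/// Hypothetical formatter for the startup message. The default wording
/// follows the log line quoted in this PR; reusing the same wording for
/// the explicit case is an assumption.
fn memory_pool_log_line(is_default: bool, total_bytes: u64, tasks: u64) -> String {
    let label = if is_default { "default" } else { "explicit" };
    // Assumption: guard against zero task slots before dividing.
    let per_task = total_bytes / tasks.max(1);
    format!(
        "Memory pool ({label}): total {total_bytes} bytes split into {tasks} tasks ({per_task} bytes each)"
    )
}

fn main() {
    // 8 task slots with the 1 GiB per-task default.
    println!("{}", memory_pool_log_line(true, 8 * 1024 * 1024 * 1024, 8));
}
```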

cargo test --release -p ballista-executor --lib passes (22/22).

The executor's `--memory-pool-size` was optional with `None` meaning "no
pool installed, use DataFusion's unbounded default". On a workload that
holds large amounts of data per task in memory (notably the sort shuffle
writer), an unbounded pool means operators never spill and the executor's
RSS grows until the OS starts swapping. TPC-H Q3 against SF100 with
`--partitions 2` peaked at ~24 GB RSS and ran in ~49s instead of ~13s.

Default the pool to `concurrent_tasks * 1 GiB` when no flag is provided.
Each per-task FairSpillPool gets 1 GiB of headroom — enough for typical
batch materialisation while keeping the total bounded. Operators that
respect the runtime memory pool (DataFusion aggregates, sorts, joins
with spill, etc.) gain backpressure for free; users who want a different
budget pass `--memory-pool-size` explicitly.

Logging now distinguishes the explicit and default paths so it's
obvious which is in effect.

Verified with TPC-H Q3 SF100 `--partitions 2`, no `--memory-pool-size`
flag: sort shuffle drops from ~49s to ~17s (~10 GB RSS); hash shuffle
unchanged at ~13s.
@andygrove changed the title from "feat(executor): default memory pool to concurrent_tasks * 1 GiB" to "feat(executor): default memory pool to concurrent_tasks * 1 GiB [WIP]" on May 1, 2026
@andygrove changed the title from "feat(executor): default memory pool to concurrent_tasks * 1 GiB [WIP]" to "feat(executor): use default memory pool in executor if no config provided" on May 2, 2026