Refactor: standard install/start/check/stop/load/query interface per system #860
Open
alexey-milovidov wants to merge 72 commits into main
Conversation
…/data-size
Each local system now exposes a small set of single-purpose scripts with a
stable contract, so they can be driven by a shared lib/benchmark-common.sh
and reused by external tooling (e.g. an online "run query against system X"
service):
- install    env prep + system install (idempotent)
- start      start daemon (idempotent; empty for stateless tools)
- check      trivial query, exit 0 iff responsive
- stop       stop daemon (idempotent)
- load       runs create.sql + loads data, deletes source files, sync
- query      SQL on stdin; result on stdout; runtime in fractional seconds
             on the last line of stderr; non-zero exit on error
             (a minimal script obeying this contract is sketched below)
- data-size  prints data footprint in bytes (one integer to stdout)
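As a concrete illustration of the query contract, here is a hypothetical query script for an imaginary `some-client` CLI (not code from this PR):

```bash
#!/bin/bash
# Hypothetical query script: SQL on stdin, result rows on stdout,
# fractional-second runtime as the last line of stderr, non-zero exit on error.
set -e
query=$(cat)                                   # read the whole query from stdin
start=$(date +%s.%N)
some-client --query "$query"                   # result rows go to stdout
end=$(date +%s.%N)
awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }' >&2
```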
Each system's old monolithic benchmark.sh is replaced by a 4-line shim that
sets a couple of env vars (BENCH_DOWNLOAD_SCRIPT, BENCH_RESTARTABLE) and
exec's lib/benchmark-common.sh. The shared driver runs the unified flow:
install -> start+check -> download -> load (timed) -> for each query
{flush caches; optionally stop+start to neutralize warm-process effects;
run query 3x} -> data-size -> stop. Output format ([t1,t2,t3], Load time,
Data size) matches the previous benchmark.sh exactly so cloud-init.sh.in's
log POST to play.clickhouse.com keeps working unchanged.
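A plausible shape for such a shim (the variable values shown are illustrative; each system sets its own):

```bash
#!/bin/bash
# 4-line shim: declare how this system plugs into the shared driver, then exec it.
export BENCH_DOWNLOAD_SCRIPT="download-hits-tsv"   # which lib/ download helper to run (illustrative value)
export BENCH_RESTARTABLE="yes"                     # restart the server between queries
exec ../lib/benchmark-common.sh
```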
For dataframe/in-process systems (pandas, polars-dataframe, chdb-dataframe,
daft-parquet*, duckdb-dataframe, sirius), the engine is wrapped in a small
FastAPI server (server.py) so the start/stop/query interface still applies.
BENCH_RESTARTABLE=no for these (and for embedded CLIs like duckdb, sqlite,
datafusion, etc.) since restarting a single Python/CLI process between
queries would dominate query time.
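A minimal sketch of what such a server.py wrapper can look like (the endpoint name, payload shape, and the polars SQLContext wiring are assumptions, not the PR's actual implementation):

```python
import time

import polars as pl
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# In the real scripts the load step would register the hits table here;
# an empty context keeps the sketch self-contained.
ctx = pl.SQLContext()

class Query(BaseModel):
    sql: str

@app.post("/query")
def run_query(q: Query) -> dict:
    start = time.time()
    # eager=True materializes the result so it falls inside the timing window;
    # a real server would also return the result rows.
    result = ctx.execute(q.sql, eager=True)
    elapsed = time.time() - start
    return {"rows": result.height, "elapsed_seconds": elapsed}
```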
Scope: 88 local systems refactored. Cloud/managed systems and a handful of
non-functional ones (csvq, dsq, locustdb, mongodb, polars CLI, exasol,
spark-velox) are intentionally left untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in clickhouse-datalake{,-partitioned}: upstream switched
the datalake variants from filesystem-cache to userspace page-cache (PR #818).
The refactored install/query scripts now adopt the page-cache approach.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mongodb: query takes a MongoDB aggregation pipeline (Extended JSON, one line) on stdin instead of SQL — these are the same canonical 43 ClickBench queries, just expressed as mongo pipelines. queries.txt is generated from queries.js (the source of truth) by replacing JS-only constructors (NumberLong, ISODate, NumberDecimal) with their EJSON canonical form. The shim sets BENCH_QUERIES_FILE=queries.txt to point the driver at it.

polars: wrapped in a FastAPI server analogous to polars-dataframe, but the load step uses pl.scan_parquet (LazyFrame), so the parquet file remains needed at query time — the load script does NOT delete hits.parquet. data-size returns the on-disk parquet size, since a LazyFrame has no materialized in-memory size.

Both systems now expose the standard install/start/check/stop/load/query/data-size scripts and a 4-line benchmark.sh shim, removing the old benchmark.sh / run.js / query.py / formatResult.js paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
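A rough sketch of the polars load approach described above (only pl.scan_parquet and the file staying on disk come from the commit; the table registration and names are assumptions):

```python
import os

import polars as pl

# LazyFrame over the parquet file: nothing is materialized up front,
# so hits.parquet must stay on disk for the query step.
lf = pl.scan_parquet("hits.parquet")
ctx = pl.SQLContext(hits=lf)   # table name "hits" is illustrative

# data-size: a LazyFrame has no in-memory footprint, so report the file size.
print(os.path.getsize("hits.parquet"))
```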
…use in query Per review: clickhouse-local persists table metadata in its --path dir, so the CREATE TABLE only needs to run once during ./load. ./query just runs the query against the persisted table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atively Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… readiness Per review (alexey-milovidov): clickhouse start leaves the system in the desired state (server running) even when it returns non-zero with "already running". Make the shared driver tolerate non-zero from ./start and rely on bench_check_loop as the authoritative readiness signal. This lets per-system start scripts stay simple — they just need to make a best-effort attempt to launch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
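A simplified sketch of the tolerant flow this describes (not the driver's actual code; the ~300 s budget is the lib's default check window mentioned elsewhere in this PR):

```bash
# Best-effort start: "already running" may exit non-zero, and that's acceptable.
./start >/dev/null 2>&1 || true

# The check loop is the authoritative readiness signal.
for _ in $(seq 1 300); do
    ./check >/dev/null 2>&1 && exit 0
    sleep 1
done
echo "system did not become responsive" >&2
exit 1
```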
prmoore77 added a commit to gizmodata/ClickBench that referenced this pull request — May 7, 2026
…ouse#860)

Adopts the per-system 7-script interface from ClickHouse#860 for gizmosql/, and replaces the Java sqlline-based gizmosqlline client with the C++ gizmosql_client shell that ships with gizmosql_server.

Scripts (matching the contract from lib/benchmark-common.sh):
- benchmark.sh - 4-line shim that exec's ../lib/benchmark-common.sh
- install      - apt + curl gizmosql_cli_linux_$ARCH.zip; no openjdk, no separate gizmosqlline download
- start        - idempotent server bring-up (skips if port 31337 is open)
- check        - cheap TCP probe (auth-gated SQL would need credentials)
- stop         - kills tracked PID; pkill belt-and-braces fallback
- load         - rm -f clickbench.db, then create.sql + load.sql via gizmosql_client; deletes hits.parquet and sync's
- query        - reads one query from stdin, runs via gizmosql_client with .timer on + .mode trash; emits fractional seconds as the last stderr line (parsed from "Run Time: X.XXs")
- data-size    - wc -c clickbench.db

Notes:
- BENCH_DOWNLOAD_SCRIPT=download-hits-parquet-single, BENCH_RESTARTABLE=yes (gizmosql is a server, so per-query restart neutralizes warm-process effects, matching the clickhouse/postgres pattern in ClickHouse#860).
- util.sh now exports GIZMOSQL_HOST/PORT/USER/PASSWORD - the env vars gizmosql_client reads natively, so query/load can call gizmosql_client with no flags. The server still receives the username via --username.
- PID_FILE moved to a stable /tmp path (was /tmp/gizmosql_server_$$.pid, which broke across the start/stop process boundary in the new layout).

This PR depends on ClickHouse#860 (which introduces lib/benchmark-common.sh and the contract). Once ClickHouse#860 lands, this PR's diff against main will be only the gizmosql/ files.

Validated locally on macOS with gizmosql v1.22.4: the query script produces the expected fractional-seconds last line on stdout/stderr separation, and exits non-zero on error paths. See https://docs.gizmosql.com/#/client for gizmosql_client docs.
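A plausible shape for the timing extraction the query script performs (the capture variable is illustrative; the real script may differ):

```bash
# ".timer on" makes gizmosql_client print lines like "Run Time: 1.23s";
# keep only the number from the last such line and emit it as the final stderr line.
seconds=$(grep -oE 'Run Time: [0-9]+\.[0-9]+' "$client_output" | tail -n1 | grep -oE '[0-9]+\.[0-9]+')
echo "$seconds" >&2
```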
Resolves merge conflicts:
- Removed cedardb/run.sh, gizmosql/run.sh — superseded by the standard
query interface; the refactor branch already replaced them.
- Restored datafusion{,-partitioned}/make-json.sh, doris{,-parquet}/get-result-json.sh
with main's dated-results version. These are independent post-run JSON
builders, still referenced from the per-system READMEs.
- Kept the thin benchmark.sh shim in gizmosql/, spark-{auron,comet,gluten}/,
trino/. Per-system result-JSON auto-save (added on main while this branch
was in flight) is intentionally not carried over: under the new interface,
result.csv is the single timing artifact and JSON construction belongs in
separate tooling.
- gizmosql/{install,load,query,util.sh}: merge auto-took main's switch from
gizmosqlline (Java) to gizmosql_client (CLI shipped with the server),
but the refactor branch's load/query still referenced GIZMOSQL_SERVER_URI
and GIZMOSQL_USERNAME. Updated install to drop openjdk + gizmosqlline,
load to use gizmosql_client (and stop the server first to release the
database file), and query to drive gizmosql_client with .timer/.mode trash
and parse "Run Time:" instead of "rows selected (... seconds)".
…-system layout

These four entries were added on main while this branch was in flight (the existing trino/ scripts here were a memory-connector stub that never worked end-to-end). Rebuild each one against the new install/start/check/stop/load/query/data-size contract so they share lib/benchmark-common.sh:
- trino, trino-partitioned: Hive connector + file metastore + local Parquet hardlinked into data/hits/ (matches main's working impl from PR #856).
- trino-datalake{,-partitioned}: same, plus the AnonymousAWSCredentials shim to read clickhouse-public-datasets/hits_compatible/athena from anonymous S3 (the published bucket size is reported by data-size since the data is read on demand). BENCH_DOWNLOAD_SCRIPT="" — no local dataset to fetch.
- benchmark.sh in all four becomes a 4-line shim. Old run.sh deleted.
…r-system layout
These four entries were added on main while this branch was in flight.
Adapt them to the install/start/check/stop/load/query/data-size contract:
- presto, presto-partitioned: Hive connector + file metastore + local Parquet
hardlinked into data/hits/.
- presto-datalake{,-partitioned}: same plus the AnonymousAWSCredentials shim
(compiled in a throwaway trinodb/trino container, since the prestodb image
ships only a JRE) so the hive-hadoop2 plugin can read the public bucket
anonymously. BENCH_DOWNLOAD_SCRIPT="" — schema-only load against S3.
Each benchmark.sh becomes a 4-line shim. Old run.sh deleted.
These two entries were added on main while this branch was in flight. Adapt to the install/start/check/stop/load/query/data-size contract:
- BENCH_DOWNLOAD_SCRIPT="" — the vortex bench binary fetches Parquet and converts to .vortex on first invocation.
- BENCH_RESTARTABLE=no — embedded Rust CLI; per-query restart would dominate query time.
- query: stages stdin into a temp queries-file and passes -q 0, since the bench binary addresses queries by index rather than reading SQL on stdin.
- The single variant uses the `clickbench` binary (vortex 0.34.0); the partitioned variant uses `query_bench clickbench` (vortex 0.44.0).
Old run.sh deleted.
Quickwit was added on main while this branch was in flight. Adapt to the install/start/check/stop/load/query/data-size contract:
- BENCH_QUERIES_FILE="queries.json" — Quickwit accepts Elasticsearch-format JSON queries via the /_elastic compat API, not SQL. queries.json holds one ES query per line; queries not expressible in Quickwit are encoded as the literal "null".
- BENCH_DOWNLOAD_SCRIPT="" — the load script fetches hits.json.gz directly (there is no shared download-hits-json helper) and pipes it through `quickwit tool local-ingest`, since v0.9's sharded ingest-v2 endpoint caps single-node throughput at a few MB/s.
- BENCH_RESTARTABLE=yes — relies on the common driver's per-query restart to flush Quickwit's fast_field_cache and split_footer_cache (the result caches are already disabled in node-config.yaml).
- query: returns non-zero for "null" queries so the framework records null in the per-query timing array; otherwise reports .took (ms → seconds).
Old run.sh deleted.
The original used /tmp/gizmosql_server_$$.pid where $$ is the calling process's PID. That worked when benchmark.sh sourced util.sh and called start/stop in the same shell, but under the new per-system layout each of start, stop, load, and query sources util.sh in its own subshell — so stop_gizmosql couldn't find the PID file written by start_gizmosql. Use a fixed path under the system directory instead. Also expose wait_for_gizmosql so callers (like load) can wait for readiness without restarting.
Conflict only in gizmosql/benchmark.sh — kept the thin shim. Main switched gizmosql to the official one-line installer (PR #879); fold that into gizmosql/install so we stop hand-detecting arch and downloading the zip. Other changes auto-merged: quickwit/index_config.yaml gained tag_fields on CounterID + record:basic on text fields (PR #886), and assorted result JSONs for ClickHouse Cloud / Citus / Cratedb / etc.
start/stop scripts may emit progress lines (clickhouse-server prints PID table tracking, sudo's chown invocation, postgres's startup messages, etc.). With BENCH_RESTARTABLE=yes those scripts run before every query, so their output interleaves with the parseable [t1,t2,t3] / Load time / Data size lines and breaks the cloud-init log POST to play.clickhouse.com. Redirect both stdout and stderr from ./start and ./stop to /dev/null at the three call sites in lib/benchmark-common.sh. The check loop is the authoritative readiness signal, so losing start's output costs nothing in steady state; for debugging, run ./start manually outside the driver.
The DuckDB installer at install.duckdb.org drops the binary into ~/.duckdb/cli/latest/duckdb and only suggests adding that directory to PATH. Previously each install attempted a per-user symlink into ~/.local/bin, which silently no-ops when that directory isn't on PATH (default for root in cloud-init). The result was ./check failing for 300s with no useful error. Symlink to /usr/local/bin/duckdb via sudo right after install instead; that's on PATH for every user, and the symlink is itself idempotent.
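Roughly, the fix amounts to (exact command is a sketch of what the commit describes):

```bash
# install.duckdb.org drops the binary under ~/.duckdb; link it somewhere that is
# on every user's PATH. -f keeps the symlink idempotent across re-runs.
sudo ln -sf "$HOME/.duckdb/cli/latest/duckdb" /usr/local/bin/duckdb
```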
Ubuntu's docker.io ships the docker CLI without the v2 compose plugin, so the existing `command -v docker` short-circuit skipped installation on boxes that already had docker but no `docker compose`. ./start then ran `docker compose up -d`, which silently failed, and ./check timed out at 300s. Fall back to docker-compose-v2 for the Ubuntu package name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Throughput variant of ClickBench. N connections (default 10) hold open sessions and each picks a uniformly random query from the standard 43-query set; the run goes for a fixed wall-clock window (default 600s) after a warmup. Reports completed queries, QPS, latency p50/p95/p99, and per-query mean. Backends: ClickHouse over HTTP (stdlib http.client), StarRocks over the MySQL wire protocol (pymysql). Each system's recommended path so neither is paying a wire-format penalty the other isn't. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ned}/query: pass query via temp file

`python3 - <<'PY' ... PY` directs the heredoc into python3's stdin so the interpreter can read its program from there. Once the heredoc is fully consumed, sys.stdin (the same FD) is at EOF — so sys.stdin.read() inside the heredoc returned an empty string, and chdb / hyper / sail dutifully ran the empty query and reported ~0.000s for every try.

Stage stdin into a temp file in bash before invoking the heredoc and pass the path as argv[1]; the python script reads the query from that file.

Also include result materialization in the timing window for chdb/query and chdb-parquet-partitioned/query (move `end = ...` past fetchall / str(res)) — the timer was previously stopped before the result was realized, which would have under-counted query time even when the stdin bug wasn't masking it entirely.
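A minimal reproduction of the pattern and the fix (simplified; the real query scripts also time the query and run it against the engine):

```bash
#!/bin/bash
# Buggy pattern: the heredoc *is* python's stdin, so the program is read from it,
# sys.stdin ends up at EOF, and the caller's SQL on the script's stdin is never read.
python3 - <<'PY'
import sys
query = sys.stdin.read()        # "" — the heredoc was consumed as the program
PY

# Fixed pattern: stage the script's real stdin into a temp file first,
# then pass its path as argv[1] to the heredoc program.
QUERY_FILE=$(mktemp)
cat > "$QUERY_FILE"             # this reads the caller's stdin (the SQL)
python3 - "$QUERY_FILE" <<'PY'
import sys
query = open(sys.argv[1]).read()
PY
```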
Right now ./check stderr is silently dropped while the loop retries for 300s, then we report "did not succeed within 300s" with no clue why. For deterministic failures (missing env var like YT_PROXY for chyt, an install step that didn't run, etc.) the user wastes 5 minutes and still has to dig through the per-system check script to find out what happened. Capture the last attempt's stderr and print it on timeout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream install path assumes RHEL/Rocky/Alma — yum, grubby, SELinux, the wheel group, /data0. On Ubuntu/Debian the prereqs phase silently half-completes (several `|| true` skips), the gpadmin user is sometimes not created, and db-install would later die at `yum install -y go`. Either way ./check times out at 300s with no diagnostic. Bail with a clear "needs yum" message before doing anything destructive, and call out the requirement in the README. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud-init runs scripts as root with HOME unset. Tools that follow
XDG-ish conventions then fall over: the GizmoSQL one-line installer
exits at line 32 with "HOME: parameter not set" (it runs under `sh -u`),
duckdb-vortex's `INSTALL vortex` writes to /.duckdb/extensions/... and
later fails to find it ("Extension /.duckdb/extensions/v1.5.2/..."),
and duckdb-datalake{,-partitioned} queries crash 43 times each with
"Can't find the home directory at ''" while autoloading httpfs.
Each affected install script tried to paper over this locally with
`export HOME=${HOME:=~}`, but the export only lives for that script —
the sibling load/query scripts the lib runs in fresh subprocesses still
see HOME unset. Set it once here so every per-system step inherits it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
apt's monetdb5-sql post-install creates /var/lib/monetdb as the monetdb user's home dir, so the existing `if [ ! -d /var/lib/monetdb ]` guard skipped `monetdbd create` and left the dbfarm uninitialized. ./check then looped 300s on `mclient: cannot connect: control socket does not exist` and the run died. Probe the dbfarm marker file (.merovingian_properties) instead of the directory, and explicitly `monetdbd start` after create — both are idempotent, and a daemon that's already up just no-ops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
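A sketch of the corrected guard (user and paths follow the stock Ubuntu monetdb5-sql layout; the actual install script may differ in detail):

```bash
# Probe the dbfarm marker file, not the directory apt pre-creates as monetdb's home.
if [ ! -f /var/lib/monetdb/.merovingian_properties ]; then
    sudo -u monetdb monetdbd create /var/lib/monetdb
fi
# Idempotent: a daemon that is already up just no-ops.
sudo -u monetdb monetdbd start /var/lib/monetdb || true
```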
paradedb/paradedb:0.10.0 (the prior pin) was rotated out of Docker Hub — docker pull returned "manifest not found" and ./check timed out. The oldest tags still hosted are 0.15.x, so move both directories onto a real Postgres-version-specific tag (latest-pg17) that paradedb still maintains. This unblocks the image pull. NOTE: paradedb dropped its pg_lakehouse / parquet_fdw extension after 0.10.x (the parquet_fdw_handler() function no longer exists), so create.sql still needs to be reworked away from the foreign-table approach for queries to succeed end-to-end. That's a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior URL (qa-build.oss-cn-beijing.aliyuncs.com selectdb-doris-2.1.7-rc01) returned 404 — SelectDB stopped publishing free standalone tarballs once the product moved fully to a managed-cloud offering. VeloDB (the company that now stewards SelectDB) hosts the official Apache Doris release binaries instead, which are functionally what SelectDB ships today. Pin to the current stable (4.0.5) and use the symmetric $dir_name path layout that doris/install already uses, instead of the hardcoded selectdb-doris-2.1.7 segment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 30 s per-bulk-request timeout was tripping `requests.exceptions.ReadTimeout` partway through ingest, once the index had grown enough that ES needed to flush + merge mid-batch. Bump to 300 s so a single bulk doesn't fail the whole load just because the server stalled briefly under merge pressure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tiup playground` prints `TiDB Playground Cluster is started` once tidb
itself binds :4000, but tiflash joins the cluster a beat later. ./load
runs `ALTER TABLE ... SET TIFLASH REPLICA 1` immediately and that
fails with
the tiflash replica count: 1 should be less than the total tiflash server count: 0
when tiflash hasn't registered yet. Poll
information_schema.tikv_store_status until at least one tiflash store
shows up before declaring start done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Even after the 600 s curl waits and 1200 s ./check budget, the 0.1.0-GA image (Aug 2023) was still leaving the `server` container in an exited state — most likely the bootstrap chain mismatches a current docker networking / fdb / hadoop interaction in that image. ByConity's current stable on Docker Hub is 1.0.1-hotfix1 (Nov 2024); upgrade everywhere the compose file pins it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cockroachdb replays its WAL on each restart; after the 60 GB+ IMPORT the lib's default 300 s window timed out before the first SELECT 1 succeeded post-restart. Bump to 900 s. sirius's server.py initializes CUDA / cuDF on startup which can take several minutes on a cold instance. Same bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
duckdb-vortex-partitioned/install fails during the cmake configure step with `unable to read $HOME` from vcpkg, which then cascades into the "CMake unable to find Ninja" / "CMAKE_C_COMPILER not set" errors that look like missing build deps but really aren't. Same root cause as the duckdb-vortex `Extension /.duckdb/extensions/...` errors: tools that follow XDG conventions need HOME, and cloud-init runs as root with HOME unset on operator checkouts that predate c288eab.

Pin `export HOME="${HOME:-/root}"` in three places so the chain works regardless of how the script is reached:
- duckdb-vortex-partitioned/install before vcpkg/cmake runs.
- duckdb-vortex/install at the top (the previous `export HOME` was inside the `install-duckdb` branch and got skipped on re-runs where duckdb was already on PATH, so `duckdb -c "INSTALL vortex"` still wrote to /.duckdb).
- lib/benchmark-common.sh, so every system's load/query inherits a real HOME even when the operator's cloud-init.sh.in is stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a "Discontinued or Inaccessible Systems" section listing five directories whose install paths can no longer be reproduced as of May 2026, along with the specific failure observed for each so future contributors don't burn time chasing them: - vertica: `docker pull vertica/vertica-ce` -> access denied - oxla: `public.ecr.aws/oxla/release` -> not found (Redpanda acquired Oxla in Oct 2025) - kinetica: GitHub release for the `kisql` CLI was deleted upstream - heavyai: GPG key URL 403's, apt repo gone, no public Docker image - infobright: company defunct since 2017, community image hangs mid-load Directories are kept so historical results remain on the website. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gone away" This reverts commit 3b14156.
…unreproducible
Each system's directory now carries a "Status (as of May 2026):
unreproducible" section quoting the specific install failure observed,
so a reader who lands on the directory directly knows the script can't
be made to work without a fresh upstream path:
- vertica: `docker pull vertica/vertica-ce` -> pull access denied
- oxla: `public.ecr.aws/oxla/release:1.53.0-beta` -> not found
(Oxla was acquired by Redpanda in Oct 2025)
- kinetica: `kisql` v7.1.7.2 GitHub release was deleted; newer tags
ship no compiled artifacts
- heavyai: `releases.heavy.ai` apt repo + GPG key both 403 (S3
AccessDenied); no public Docker image
- infobright: company defunct since 2017; community flolas/infobright
image hangs silently mid-`LOAD DATA`
Directories and historical results are kept for reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the soft "Status: unreproducible" framing with a blunt "Dead (May 2026)" header. The body still quotes the specific failure observed for each, but the ending now reads "nothing here runs anymore" instead of "new submissions aren't expected without a working install path". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v7.1.7.2 GitHub release was deleted upstream, and newer release pages ship no compiled artifacts — but the `kisql` file (a 14 MB self-extracting bash+jar launcher) is committed directly to the repo root and is reachable through raw.githubusercontent.com. Pull it from there at v7.2.3.17, which matches the 7.2.x server the kinetica.sh installer brings up. Drop the "Dead" section from kinetica/README.md and replace it with a short note explaining the new source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HEAVY.AI's apt repo and tarball CDN (releases.heavy.ai/...) both started returning S3 AccessDenied, so the previous native install (curl GPG key | apt-key add; apt-get install heavyai) can't proceed. The source repo (github.com/heavyai/heavydb) is alive — v9.0.0 was just released 2025-10-20, not archived — but its GitHub releases ship no binaries, and a full C++ build is too heavy to fit inside cloud-init. omnisci/core-os-cpu:v5.10.2 (Feb 2022) is the last public Docker image — OmniSciDB, the predecessor of HeavyDB before the v6.0.0 rename. The schema and queries this benchmark uses are vanilla enough to run unchanged. Replace install/start/check/stop/load/query/data-size with versions that pull and drive the container via `docker exec /omnisci/bin/omnisql` (the OmniSci CLI; it became `heavysql` after the rename, hence the binary path change). Storage is bind-mounted into ./heavyai-storage so the data-size step still has something to du. Update README accordingly: replace the "Dead (May 2026)" section with a "Sourcing the binary" note explaining what's actually going on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without an explicit ORDER BY, `LIMIT 1 BY system, machine` returned an arbitrary row per (system, machine). For systems with several runs in the past week it tended to pick an older row, so the directory name generated by `formatDateTime(time, '%Y%m%d', 'UTC')` used that older date — while the inner `ORDER BY time DESC LIMIT 1` still wrote the latest output content. The file ended up in `<system>/results/<old-date>/<machine>.json` (overwriting an existing file there), and `generate-results.sh` never saw a directory for today.

Concretely: for (clickhouse, c6a.4xlarge) with rows on 2026-05-05 / -07 / -08 / -09, `LIMIT 1 BY` picked 2026-05-07 21:31:58. Today's run ends up written to `clickhouse/results/20260507/...` and nothing appears under `20260509/`, so the website still shows 2026-05-08 as the latest.

Add `ORDER BY time DESC` before `LIMIT 1 BY` so the latest row is selected — directory name and content date now agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
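The shape of the fix in ClickHouse SQL (table and column names simplified for illustration):

```sql
-- Without ORDER BY, LIMIT 1 BY keeps an arbitrary row per (system, machine).
-- Sorting by time first keeps the newest row, so the dated directory name
-- and the result content agree.
SELECT system, machine, time, output
FROM benchmark_runs
ORDER BY time DESC
LIMIT 1 BY system, machine
```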
It belongs in a separate effort — the QPS benchmark with N persistent connections that landed in e2669c4 doesn't fit the per-system-script interface this branch is converging, and keeping it here muddies the diff against main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloud.tembo.io no longer resolves; the OLAP stack on Tembo Cloud has been discontinued. Add the "historical" tag to the existing result (in line with the same tag used by other dead-cloud and end-of-life entries under clickhouse/, monetdb/, vertica/, infinidb/, …) and a Status note at the top of the README so the entry isn't taken for runnable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slow row-oriented and OLTP systems (mysql, mariadb, mongodb, cratedb,
sqlite, turso, postgresql{,-indexed,-orioledb}, mysql-myisam,
timescaledb-no-columnstore) keep hitting the 20000 s cloud-init wall —
their load alone runs 1-3 hours and BENCH_RESTARTABLE=yes adds a few
seconds × 43 queries × 3 tries on top. The latest runs all show
"Total time: 20018-20021" with the run cut off mid-load or partway
through the query set.
Lift to 36000 s (10 h). Still capped enough that a runaway run
shuts down rather than burning EC2 forever.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`lib/download-hits-tsv` decompresses hits.tsv.gz to hits.tsv before returning, so by the time load runs there's no .gz left. The leftover `pigz -fkd hits.tsv.gz` bombed with "skipping: hits.tsv.gz does not exist" and aborted before INSERT — explains why the most recent run got past my docker-compose-image fix only to die at this step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After bumping druid to 37.0.0 the install runs verify-java which errors with "Druid requires Java 17 or 21. Your current version is: 11.0.30." Switch the apt package from openjdk-11-jdk to openjdk-17-jdk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier --ignore-installed only covered the venv's `pip install --upgrade setuptools wheel` step, but the same Ubuntu 24.04 packaging RECORD-less issue trips at the *system* `python3.11 -m pip` setup that runs before the venv is even created — `get-pip.py` pulls in pip + setuptools + wheel, wheel demands packaging>=24.0, and pip then refuses to uninstall the apt-shipped 24.0. Pass the same flag to get-pip.py invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tikv_store_status` shows only TiKV stores (the type column has no 'tiflash' entry), so the previous wait loop never broke out and we declared start done while tiflash hadn't actually registered yet. `information_schema.cluster_info` lists every PD-known component (pd / tidb / tikv / tiflash) and is the canonical "is X up" view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
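A sketch of the readiness poll this implies (the client invocation, port, and credentials are assumptions based on a default `tiup playground`):

```bash
# Wait until PD knows about at least one tiflash instance.
until mysql -h 127.0.0.1 -P 4000 -u root -N -e \
        "SELECT COUNT(*) FROM information_schema.cluster_info WHERE type = 'tiflash'" \
      | grep -qv '^0$'; do
    sleep 1
done
```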
pinot's quickstart starts a controller, broker, server and Zookeeper inside one JVM and hadn't bound :9000 within the lib's 300 s default. Bump to 900 s. sirius's server.py initializes CUDA / cuDF on first hit; 900 s wasn't enough either, bump to 1800 s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 0.44 datafusion-vortex was pinned to vortex 0.34.0, whose .gitmodules references two now-gone private spiraldb forks (spiraldb/duckdb, spiraldb/duckdb-rs). git submodule update fails on both with "could not read Username for 'https://github.com'" and the build aborts. From 0.41.0 onward upstream replaced spiraldb/duckdb with duckdb/duckdb, and 0.42.0+ ship without a .gitmodules at all. Bump to 0.44.0 (matches the partitioned variant). Once the submodules are reachable, both variants then fail in the same place — `vortex-duckdb`'s build.rs runs bindgen, which needs libclang plus the clang freestanding headers (stdbool.h etc.); without libclang-dev the build fails with `'stdbool.h' file not found`. Add clang + libclang-dev to apt installs in both directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…runs

`~/.questdb/db/hits*` silently did nothing on QuestDB v9.x layouts that don't suffix the table directory, leaving the run logging `du: cannot access ...` followed by "Data size: 0". The downstream parser then rejected the row (data_size = 0 fails its >=5 GB filter) and the website never picked the run up — even though the queries themselves ran fine. Just measure the whole `~/.questdb/db` tree (it only contains the bench's `hits` table). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bench_run_query was reading "fractional seconds" via plain `tail -n1` of the query script's stderr. Per-system query scripts are documented to put the timing on the last stderr line, but pyspark (spark, spark-auron) and several JVM-based ones print SparkContext shutdown logs *after* their measurement, so tail picked up "Stopping SparkContext" or similar and the timing parsed to "null" silently — every query of that run came back as [null,null,null] and sink.results' parser rejected the whole run. Filter for the last numeric line instead. Same contract for systems whose stderr is already clean; resilient when it isn't. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
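One way to express "filter for the last numeric line" (the regex and variable name are illustrative, not the exact driver code):

```bash
# Keep only the last stderr line that is purely a number; ignore trailing shutdown logs.
runtime=$(grep -E '^[0-9]+(\.[0-9]+)?$' "$query_stderr" | tail -n1)
```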
The pre-refactor heavysql incantation didn't pass a database name — omnisql defaults to `omnisci`. Some 5.10.2 builds treat the trailing positional as a script-file path rather than a db, which made check fail silently. Drop it everywhere (check / load / query) so every omnisql invocation matches the original behavior. The first run of the docker rewrite also blew through the 600 s check window while the omnisci server was still binding Thrift; lift BENCH_CHECK_TIMEOUT to 900 s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The SelectDB brand has been retired: the hosted SaaS at selectdb.cloud returns "404 Route Not Found" for every path, and the company that maintained the SelectDB-branded Apache Doris distribution was renamed to VeloDB. The directory was already repointed to download.velodb.io last week; this commit catches the rest up.
- selectdb/ -> velodb/ (git mv).
- template.json: "system" SelectDB -> VeloDB.
- README rewritten to explain the rename and the dead SaaS.
- All existing result JSONs updated: "system" -> VeloDB, "historical" added to tags, comment stamped (these were collected under the old brand on older binaries; new submissions will reflect the current VeloDB-distributed Apache Doris build).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ByteHouse's international cloud at bytehouse.cloud is no longer reachable from outside the China region (the SaaS still operates within China via Volcengine). Every existing result in this directory came from the international cloud and is now re-tagged with "historical" + a stamped comment. The README gets a Status note that distinguishes self-managed / Volcengine submissions, which should not be marked historical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These results stay valid under the new brand — the engine is the same Apache Doris distribution, only the brand changed. Strip the historical tag and the auto-stamped comment from all 11 result JSONs and reword the README's History section to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 36000s (10h) cap on ./benchmark.sh was hard-coded in cloud-init.sh. Lift it behind BENCHMARK_TIMEOUT, defaulting to 36000s so existing runs are unchanged, and forward the var from run-benchmark.sh on the same path as YT_PROXY/YT_TOKEN/CHYT_ALIAS — operators can now bump (or shrink) the cap without editing the script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch from runtime override (BENCHMARK_TIMEOUT in the cloud-init env) to render-time substitution: cloud-init.sh.in now has `timeout @timeout@` and run-benchmark.sh substitutes it from $timeout (default 36000), matching how @System@, @repo@, @Branch@, and @runtime_env@ already work. End state for operators is the same — `timeout=NNN run-benchmark.sh ...` overrides the cap — but the rendered script reads naturally (`timeout 36000 ./benchmark.sh`) instead of dragging an env var through the cloud-init scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
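A plausible render step for this (the substitution mechanism is an assumption; only the @timeout@ placeholder and the 36000 default come from the commit):

```bash
# Render cloud-init.sh from its template, filling the timeout placeholder.
timeout="${timeout:-36000}"
sed "s|@timeout@|${timeout}|g" cloud-init.sh.in > cloud-init.sh
```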
Remove the YT_PROXY / YT_TOKEN / CHYT_ALIAS forwarding loop in
run-benchmark.sh and the @runtime_env@ injection block in cloud-init.sh.in
that fed it. The chyt/ entry and hardware/benchmark-{chyt,yql}.sh remain
intact for anyone running them locally with the env vars set; they're
just no longer auto-forwarded by the cloud-init render path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Split each local system's benchmark.sh into 7 single-purpose scripts (install, start, check, stop, load, query, data-size) with a stable contract, driven by a new shared lib/benchmark-common.sh.

Why
Previously, every system's benchmark.sh bundled installation, server lifecycle, dataset download, data loading, and query dispatch into one script — and run.sh hard-coded the per-query orchestration. There was no programmatic per-query entry point, and run.sh ran all 3 tries inside a single CLI invocation, so OS-cache warmth from try 1 leaked into tries 2/3.

The new per-system interface
- install: environment prep + system install (idempotent)
- start: start the daemon (idempotent; empty for stateless tools)
- check: trivial query (e.g. SELECT 1). Exit 0 iff responsive.
- stop: stop the daemon (idempotent)
- load: runs create.sql + loads the data, deletes the source files, sync.
- query: SQL on stdin; result on stdout; runtime in fractional seconds (e.g. 0.123) on the last line of stderr; non-zero exit on error.
- data-size: prints the data footprint in bytes (one integer to stdout)

Each system's benchmark.sh becomes a 4-line shim that sets a couple of env vars and exec's the shared driver.

The shared driver runs install → start+check → download → load (timed) → for each query: flush caches; if BENCH_RESTARTABLE=yes, stop+start; run query 3× → data-size → stop. The output log shape (Load time:, [t1,t2,t3] per query, Data size:) is identical to the old benchmark.sh, so cloud-init.sh.in's POST to play.clickhouse.com keeps working unchanged.

BENCH_RESTARTABLE=no for embedded CLIs (duckdb, sqlite, datafusion, …) and dataframe wrappers — restarting a single CLI/Python process between queries would dominate query time. For these, OS caches are still flushed between queries.

Scope
Refactored (88 systems):
Not refactored (intentionally out of scope):
Validated end-to-end on a 96-core / 185 GB ARM machine; failing queries are recorded as null (framework's error path works). All 88 refactored systems pass bash -n and have executable bits set on the 7 scripts + benchmark.sh.

Bug fixes surfaced during validation
- lib/benchmark-common.sh: data-size now runs before stop (clickhouse and pandas need the server up to report size).
- clickhouse/start: idempotent (was erroring when already running).
- duckdb/load, sqlite/load: rm -f hits.db / mydb for idempotent reruns.
- postgresql/load: -v ON_ERROR_STOP=1 so COPY data errors actually fail the script instead of silently rolling back.
- BENCH_DOWNLOAD_SCRIPT may now be empty for systems that read directly from S3 datalakes / remote services (clickhouse-datalake*, duckdb-datalake*, chyt, …).

Flagged for follow-up review
- duckdb-memory — :memory: semantics force a per-query reload; will inflate timings vs. the original single-process flow.
- cloudberry, greenplum — multi-phase install (reboot between phases); the shim only runs phase 1.
- sirius — GPU-dependent; long-lived duckdb CLI subprocess proxy; review the stdin/sentinel protocol.
- paradedb*, pg_ducklake, pg_mooncake — Docker container created in install, then docker cp in load (small divergence from the original docker run -v ... due to the lifecycle order: start runs before download).

Test plan
bash -n on all 88 systems' scripts.

🤖 Generated with Claude Code