statistics: simplify FMSketch by removing swiss map and pool by mjonss · Pull Request #66590 · pingcap/tidb

mjonss · 2026-02-28T03:23:48Z

Go 1.24+ uses swiss tables natively, so the dolthub/swiss dependency
is no longer needed, and is in fact deprecated. Replace it with native map[uint64]struct{} and
use standard library functions (maps.Clone, maps.DeleteFunc). Remove
the sync.Pool for FMSketch as its benefit is marginal — nil the
references instead to allow prompt GC.

What problem does this PR solve?

Issue Number: close #66589

Problem Summary:

FMSketch uses github.com/dolthub/swiss for its internal hashset. Since Go 1.24, the standard runtime map uses swiss tables natively, making the third-party dependency redundant. The sync.Pool for FMSketch adds complexity across many call sites for marginal benefit.

What changed and how does it work?

Replaced *swiss.Map[uint64, bool] with native map[uint64]struct{} (zero-sized values save 1 byte per entry)
Removed dolthub/swiss dependency from go.mod
Removed fmSketchPool (sync.Pool), reset(), and DestroyAndPutToPool() from FMSketch
Nil FMSketch references at former DestroyAndPutToPool call sites to allow prompt GC
Used maps.Clone for Copy() and maps.DeleteFunc for mask-based pruning
Updated MemoryUsage() to reflect 8 bytes per entry (was 9 with bool)
Pre-allocate map to maxSize capacity in NewFMSketch to avoid rehashing during merge

Benchmark results

Standalone benchmark simulating the production ANALYZE flow (region→partition merge, save/load round-trip, partition→global merge) with 8000 partitions × 50 regions on Go 1.25.6 darwin/arm64. Median of 5 iterations:

	Phase 1 (merge)	Phase 2 (serde)	Phase 3 (global)	Total
This PR	7.3s	2.5s	0.49s	10.3s
Master	16.6s	2.5s	0.48s	19.5s

47% faster overall. The swiss.Map.Clear() called by DestroyAndPutToPool() on every sketch release costs far more than the GC it avoids. GC pauses are negligible (29 runs, 1.4ms total across 7.3s of work).

Full benchmark report and raw data in data/fmsketch-bench-results/. Standalone benchmark tool in cmd/fmsketch-bench/.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

Chores
- Removed two indirect external dependencies to reduce the dependency footprint.
- Deleted a deprecated public resource-release method.
Refactor
- Replaced pooled lifecycle management with native in-memory structures and simplified cleanup to rely on garbage collection; NDV and merge behaviors preserved.
Tests
- Updated statistics tests to reflect the new internal representations and size checks.

Go 1.24+ uses swiss tables natively, so the dolthub/swiss dependency is no longer needed. Replace it with native map[uint64]struct{} and use standard library functions (maps.Clone, maps.DeleteFunc). Remove the sync.Pool for FMSketch as its benefit is marginal — nil the references instead to allow prompt GC. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pantheon-ai · 2026-02-28T03:23:53Z

Review Complete

Findings: 2 issues
Posted: 1
Duplicates/Skipped: 1

tiprow · 2026-02-28T03:24:06Z

Hi @mjonss. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-02-28T03:25:15Z

📝 Walkthrough

Walkthrough

Replaced dolthub/swiss/maphash usage with Go native maps for FMSketch, removed the FMSketch sync.Pool lifecycle, and changed code to nil FMSketch references (letting GC reclaim) instead of returning them to a pool; corresponding dependency and Bazel entries were removed.

Changes

Cohort / File(s)	Summary
Dependency removal `go.mod`, `DEPS.bzl`, `pkg/statistics/BUILD.bazel`	Removed indirect/go_repository entries for `github.com/dolthub/swiss` and `github.com/dolthub/maphash`; dropped `@com_github_dolthub_swiss` from Bazel deps.
FMSketch core `pkg/statistics/fmsketch.go`, `pkg/statistics/fmsketch_test.go`	Converted `FMSketch.hashset` from `*swiss.Map` to native `map[uint64]struct{}`; removed pool usage and lifecycle methods; updated constructors, copy/merge/NDV/proto/memory calculations and tests to use native map APIs and `maps.Clone`.
Analyze & sampling `pkg/statistics/analyze.go`, `pkg/statistics/row_sampler.go`	Replaced per-element `DestroyAndPutToPool` calls with slice niling (e.g., `a.Fms = nil`, `s.FMSketches = nil`).
Global stats handling `pkg/statistics/handle/globalstats/global_stats.go`, `pkg/statistics/handle/globalstats/global_stats_async.go`	After partition merges and in async workers, FMSketch references are set to `nil` instead of being returned to pool; merge/NDV logic remains.
Storage JSON export `pkg/statistics/handle/storage/json.go`	When building JSON table protos, column FMSketch fields are nulled instead of returned to pool; histogram destruction unchanged.
API removal `pkg/statistics/table.go`	Removed public method `Table.ReleaseAndPutToPool()` and its explicit FMSketch release logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

contribution, ok-to-test

Suggested reviewers

0xPoe
yudongusa

Poem

🐇 I swapped the Swiss for native art,
Maps keep each hashed little part,
Pools tucked in a quiet sill,
I nil the trails and let GC fill,
Hopping light with code made smart.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and specifically describes the main change: removing the swiss map dependency and the sync.Pool from FMSketch.
Linked Issues check	✅ Passed	The PR addresses all coding requirements from issue `#66589`: replaced swiss.Map with native map[uint64]struct{}, removed fmSketchPool and related methods, used maps.Clone and maps.DeleteFunc, updated MemoryUsage, and added unit tests.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the objectives: removing swiss map dependency, replacing with native map, removing sync.Pool, updating call sites to nil FMSketch references, and maintaining functional parity via standard library helpers.
Description check	✅ Passed	The PR description is comprehensive and follows the template structure with all required sections: problem statement with issue number, detailed explanation of changes, benchmark results, checklist with unit tests marked, side effects declaration, and release notes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

pkg/statistics/fmsketch.go (1)
232-238: Consider native map memory overhead in MemoryUsage.

The current calculation 8 * len(s.hashset) accounts only for the key storage. Go's native map has additional overhead (bucket headers, overflow pointers, etc.) that varies by map size. For large maps, actual memory usage could be 1.5-2x the calculated value.

This may be acceptable if the estimation is intended to be conservative, but consider documenting the approximation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/statistics/fmsketch.go` around lines 232 - 238, The MemoryUsage
implementation for FMSketch underestimates map overhead by counting only key
sizes; update the calculation in MemoryUsage (and its comment) to either apply a
conservative map-overhead factor (e.g., multiply 8*len(s.hashset) by ~1.5–2.0)
or switch to a more accurate estimation using runtime/unsafe inspection, and
explicitly document that this is an approximation; refer to the FMSketch type,
its MemoryUsage method, and fields mask, maxSize and s.hashset when making the
change so the comment and returned value both reflect the adjusted estimate or
the rationale for keeping an approximation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/statistics/fmsketch.go`:
- Around line 232-238: The MemoryUsage implementation for FMSketch
underestimates map overhead by counting only key sizes; update the calculation
in MemoryUsage (and its comment) to either apply a conservative map-overhead
factor (e.g., multiply 8*len(s.hashset) by ~1.5–2.0) or switch to a more
accurate estimation using runtime/unsafe inspection, and explicitly document
that this is an approximation; refer to the FMSketch type, its MemoryUsage
method, and fields mask, maxSize and s.hashset when making the change so the
comment and returned value both reflect the adjusted estimate or the rationale
for keeping an approximation.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b104014 and b5ffc4f.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (9)

go.mod
pkg/statistics/analyze.go
pkg/statistics/fmsketch.go
pkg/statistics/fmsketch_test.go
pkg/statistics/handle/globalstats/global_stats.go
pkg/statistics/handle/globalstats/global_stats_async.go
pkg/statistics/handle/storage/json.go
pkg/statistics/row_sampler.go
pkg/statistics/table.go

💤 Files with no reviewable changes (2)

go.mod
pkg/statistics/table.go

Copilot

Pull request overview

Removes the archived dolthub/swiss dependency from FMSketch and simplifies lifecycle management by switching to native Go maps and dropping the sync.Pool-based reuse pattern.

Changes:

Replaced *swiss.Map[uint64,bool] with map[uint64]struct{} and updated related operations (Clone, pruning deletes, NDV calc).
Removed FMSketch pooling/reset APIs and updated call sites to nil references instead of returning sketches to a pool.
Dropped dolthub/swiss (and its indirect dependency) from go.mod/go.sum and updated unit tests accordingly.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
pkg/statistics/fmsketch.go	Switches hashset implementation to native map and removes pool-based lifecycle methods.
pkg/statistics/fmsketch_test.go	Updates tests to reflect native map operations (`len`, range).
pkg/statistics/table.go	Removes per-column/index sketch “destroy to pool” calls during table release.
pkg/statistics/row_sampler.go	Replaces sketch pool-destruction with nil-ing the sketch slice reference.
pkg/statistics/analyze.go	Replaces per-result sketch destruction with nil-ing `Fms`.
pkg/statistics/handle/storage/json.go	Nils column `FMSketch` after JSON dump instead of returning it to a pool.
pkg/statistics/handle/globalstats/global_stats.go	Nils merged partition FMSketch references after computing global NDV.
pkg/statistics/handle/globalstats/global_stats_async.go	Nils merged async FMSketch references after computing global NDV.
go.mod	Removes `github.com/dolthub/swiss` and indirect `github.com/dolthub/maphash`.
go.sum	Removes checksums for the removed dolthub dependencies.

pkg/statistics/handle/globalstats/global_stats_async.go

pkg/statistics/fmsketch.go

pkg/statistics/handle/globalstats/global_stats.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

pkg/statistics/fmsketch.go (1)
238-244: Consider noting that this is a lower-bound estimate.

The calculation accounts for key storage (8 bytes per uint64) but not Go map's structural overhead (header, buckets, load factor slack). This results in an underestimate of actual memory.

For estimation purposes this is likely acceptable, but if precise memory tracking becomes important, the calculation could include a rough overhead factor (e.g., ~1.5x to 2x the key-only size for typical map sizes).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/statistics/fmsketch.go` around lines 238 - 244, The MemoryUsage method on
FMSketch currently computes only the raw bytes for mask/maxSize and the uint64
keys in s.hashset, which underestimates real Go map memory; update the
implementation of FMSketch.MemoryUsage (and its comment) to either document that
the returned value is a lower-bound estimate or include an approximate map
overhead multiplier (e.g., 1.5–2x) when computing sum from len(s.hashset), and
mention the omission of Go map headers/buckets in the comment so callers know
this is not the full heap usage.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/statistics/fmsketch.go`:
- Around line 238-244: The MemoryUsage method on FMSketch currently computes
only the raw bytes for mask/maxSize and the uint64 keys in s.hashset, which
underestimates real Go map memory; update the implementation of
FMSketch.MemoryUsage (and its comment) to either document that the returned
value is a lower-bound estimate or include an approximate map overhead
multiplier (e.g., 1.5–2x) when computing sum from len(s.hashset), and mention
the omission of Go map headers/buckets in the comment so callers know this is
not the full heap usage.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb8b13 and 04d970b.

📒 Files selected for processing (3)

DEPS.bzl
pkg/statistics/BUILD.bazel
pkg/statistics/fmsketch.go

💤 Files with no reviewable changes (2)

pkg/statistics/BUILD.bazel
DEPS.bzl

hawkingrei · 2026-02-28T04:10:37Z

@mjonss Please fix the linter.

codecov · 2026-02-28T04:18:15Z

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.2516%. Comparing base (b104014) to head (c44d306).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #66590        +/-   ##
================================================
+ Coverage   77.6886%   78.2516%   +0.5630%     
================================================
  Files          2007       1939        -68     
  Lines        549532     539835      -9697     
================================================
- Hits         426924     422430      -4494     
+ Misses       120947     116958      -3989     
+ Partials       1661        447      -1214

Flag	Coverage Δ
integration	`44.0848% <69.6969%> (-4.0981%)`	⬇️
unit	`76.6852% <93.9393%> (+0.3438%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.7767% <ø> (-12.1017%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

hawkingrei · 2026-02-28T04:21:12Z

I suggest attaching benchmark data for scenarios such as partitioned tables and concurrent ANALYZE to prove that removing the pool won't cause performance issues.

pkg/statistics/BUILD.bazel

coderabbitai

🧹 Nitpick comments (1)

pkg/statistics/fmsketch.go (1)
238-244: Memory calculation is an approximation; consider documenting this.

The comment states each map entry consumes 8 bytes, but Go maps have additional overhead (bucket structure, tophash bytes, load factor). The actual memory per entry is typically higher (~15-25 bytes depending on load factor).

If this is intentionally a rough estimate for memory tracking, consider updating the comment to clarify it's an approximation, e.g., "approximately 8 bytes per key (excluding map overhead)".
📝 Suggested comment clarification
 // MemoryUsage returns the total memory usage of a FMSketch.
 func (s *FMSketch) MemoryUsage() (sum int64) {
 	// As for the variables mask(uint64) and maxSize(int) each will consume 8 bytes. This is the origin of the constant 16.
-	// And for the variables hashset(map[uint64]struct{}), each element in map will consume 8 bytes(uint64 key).
+	// And for the variables hashset(map[uint64]struct{}), we estimate 8 bytes per entry (key size only, excluding Go map overhead).
 	sum = int64(16 + 8*len(s.hashset))
 	return
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/statistics/fmsketch.go` around lines 238 - 244, Update the MemoryUsage
comment for method FMSketch.MemoryUsage to state that the returned value is an
approximation (e.g., "approximate: counts 8 bytes per uint64 key and 16 bytes
for mask/maxSize, excluding Go map overhead such as bucket/tophash structures"),
and mention that actual per-entry memory is typically higher depending on map
load factor; keep the calculation code (sum = int64(16 + 8*len(s.hashset)))
unchanged but clarify in the comment that the 8 bytes per entry excludes map
overhead and is a rough estimate.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/statistics/fmsketch.go`:
- Around line 238-244: Update the MemoryUsage comment for method
FMSketch.MemoryUsage to state that the returned value is an approximation (e.g.,
"approximate: counts 8 bytes per uint64 key and 16 bytes for mask/maxSize,
excluding Go map overhead such as bucket/tophash structures"), and mention that
actual per-entry memory is typically higher depending on map load factor; keep
the calculation code (sum = int64(16 + 8*len(s.hashset))) unchanged but clarify
in the comment that the 8 bytes per entry excludes map overhead and is a rough
estimate.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04d970b and 739429c.

📒 Files selected for processing (1)

pkg/statistics/fmsketch.go

mjonss · 2026-02-28T12:54:11Z

/retest

tiprow · 2026-02-28T12:54:36Z

@mjonss: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mjonss · 2026-02-28T13:07:30Z

I suggest attaching benchmark data for scenarios such as partitioned tables and concurrent ANALYZE to prove that removing the pool won't cause performance issues.

@hawkingrei Found this during my testing:

mjonss · 2026-02-28T13:13:02Z

A flame graph showing the bad performance:

And the history + comment from Claude Code:
The FMSketch sync.Pool was added in PR #47070 (Sep 19, 2023) using Go's builtin map[uint64]bool, where maps.Clear() is O(1) — the runtime just marks buckets as empty. The very next day, PR #47055 (Sep 20, 2023) by the same author switched the hashset to swiss.Map and replaced maps.Clear() with swiss.Map.Clear(), which iterates and zeros every bucket — O(capacity). With many partitions this made the pool actively harmful: CPU profiling of ANALYZE on a table with 8000 partitions showed swiss.Map.Clear() consuming 88% of all CPU time. The fix is to remove the pool entirely — the 24-byte struct isn't worth pooling when the map must be recreated on each reuse anyway.

0xPoe

Approve in advance! Nice work. I love to see it.

I have a few minor points and questions.

Could you please include the performance comparison results in the PR description? For example, compare the memory and CPU usage to before this change.

/hold

pkg/statistics/analyze.go

pkg/statistics/handle/storage/json.go

pkg/statistics/handle/globalstats/global_stats_async.go

pkg/statistics/handle/globalstats/global_stats.go

pkg/statistics/fmsketch.go

pkg/statistics/row_sampler.go

ti-chi-bot · 2026-02-28T21:47:00Z

[LGTM Timeline notifier]

Timeline:

2026-02-28 13:56:36.042809186 +0000 UTC m=+20840.620888420: ☑️ agreed by hawkingrei.
2026-02-28 21:46:59.228260627 +0000 UTC m=+49063.806339821: ☑️ agreed by 0xPoe.

- Clarify MemoryUsage comment (CodeRabbit) - Add "Release for GC." comments at nil-assignment sites (0xPoe) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/statistics/fmsketch.go (1)
238-243: Memory estimation is an approximation.

The 8 bytes per entry estimate covers only the key size. Go maps have additional overhead (buckets, overflow pointers, etc.) that scales with capacity, not just element count. This is acknowledged in the comment with "excluding Go map overhead." For a more accurate estimate, consider using a multiplier (e.g., ~2x) or documenting that this is a lower bound.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/statistics/fmsketch.go` around lines 238 - 243, The MemoryUsage()
estimate currently counts 8 bytes per map entry (key only) and notes it excludes
Go map overhead; update the implementation and/or comment in
FMSketch.MemoryUsage to reflect a more realistic estimate by either applying a
multiplier (e.g., 2x) to account for map bucket/overflow overhead or explicitly
state it is a lower bound; reference the variables mask, maxSize and hashset and
ensure the comment and returned calculation consistently indicate whether the
value is an approximate lower bound or a multiplied approximation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/statistics/handle/storage/json.go`:
- Around line 125-126: Inside the ForEachColumnImmutable callback you're
mutating the passed column by setting col.FMSketch = nil which violates the
"Don't change the content when calling this function" contract; remove that
assignment from the callback and instead either rely on GC or collect the column
identifiers/indices (or pointers) into a local slice and after
ForEachColumnImmutable returns iterate that slice to call cleanup/clear on
FMSketch (or set to nil) outside the iteration; keep the existing
hist.DestroyAndPutToPool() call in the callback but move any modification of
col.FMSketch to the post-iteration cleanup step.

---

Nitpick comments:
In `@pkg/statistics/fmsketch.go`:
- Around line 238-243: The MemoryUsage() estimate currently counts 8 bytes per
map entry (key only) and notes it excludes Go map overhead; update the
implementation and/or comment in FMSketch.MemoryUsage to reflect a more
realistic estimate by either applying a multiplier (e.g., 2x) to account for map
bucket/overflow overhead or explicitly state it is a lower bound; reference the
variables mask, maxSize and hashset and ensure the comment and returned
calculation consistently indicate whether the value is an approximate lower
bound or a multiplied approximation.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 739429c and ef71132.

📒 Files selected for processing (6)

pkg/statistics/analyze.go
pkg/statistics/fmsketch.go
pkg/statistics/handle/globalstats/global_stats.go
pkg/statistics/handle/globalstats/global_stats_async.go
pkg/statistics/handle/storage/json.go
pkg/statistics/row_sampler.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/statistics/row_sampler.go

pkg/statistics/handle/storage/json.go

Add a standalone benchmark tool that measures FMSketch allocation, merge, serialization, and GC behavior across three phases matching the production data flow (region->partition, save/load, partition->global). Unlike testing.B this gives full control over GC timing and heap measurement. Build-tag gated destroy functions allow A/B comparison between master (pool+Clear) and PR (nil) on the same benchmark code. Includes benchmark results for 6 configurations (master/stash/PR x destroy/no-destroy) showing the PR is fastest overall at ~12s vs 14-20s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Re-ran PR benchmark after changing NewFMSketch to pre-allocate map to maxSize capacity. Phase 1 improved from 9.0s to 7.3s (-19%) by avoiding incremental map rehashing during merge. Total improved from 12.0s to 10.3s. Updated report with new numbers and consistent table ordering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add package-level testing.B benchmarks for FMSketch merge, encode/decode, and the old destroy path, plus a benchmark report summarizing results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eport" This reverts commit 4e8e934. This was just to share some previous results and benchmarks, not relevant!

…ocation" This reverts commit 4a249c6. Removing benchmark data and report, since only for reference, not for PR

This reverts commit 3729b07. Benchmarks for PR, not to be merged

mjonss · 2026-03-02T22:41:33Z

Performance report / Benchmark of wall clock and allocations

After playing around a lot and trying to collect profiles etc. during ANALYZE TABLE or having micro benchmarks tests, without resulting in any useful data for this PR, I created a simulation program to simulate the use of FMSketch during the three phases of ANALYZE TABLE for partitioned tables:

Region -> Partition FMSketch merge (same as for Region -> table for nonpartitioned) result 2X+ faster with the draw back of more allocation (GC pressure, which is accounted in the timing!)
Save/Load round-trips of per partition FMSketches ~ similar timing, more allocation (accounted for in timing)
Partition FMSketches -> Global Stats (full table) slightly worse, but only 5% of total time

So in total: The old implementation is 89% slower than this PR!

For repeatability, check the reverted commits in the commit history of this PR.

0xPoe · 2026-03-02T22:53:46Z

Performance report / Benchmark of wall clock and allocations

After playing around a lot and trying to collect profiles etc. during ANALYZE TABLE or having micro benchmarks tests, without resulting in any useful data for this PR, I created a simulation program to simulate the use of FMSketch during the three phases of ANALYZE TABLE for partitioned tables:

Region -> Partition FMSketch merge (same as for Region -> table for nonpartitioned) result 2X+ faster with the draw back of more allocation (GC pressure, which is accounted in the timing!)

Save/Load round-trips of per partition FMSketches ~ similar timing, more allocation (accounted for in timing)

Partition FMSketches -> Global Stats (full table) slightly worse, but only 5% of total time

So in total: The old implementation is 89% slower than this PR!

For repeatability, check the reverted commits in the commit history of this PR.

Great work！ 👍

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/statistics/fmsketch.go`:
- Around line 197-200: The FMSketch is being constructed without initializing
its maxSize, leaving it zero and breaking threshold-based pruning; in the
FMSketchFromProto construction set the FMSketch.maxSize from the proto (use
protoSketch.MaxSize, falling back to len(protoSketch.Hashset) if MaxSize isn't
available) so that the invariant is local to FMSketch and pruning/merge logic
(methods on FMSketch) behave correctly.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef71132 and c44d306.

📒 Files selected for processing (1)

pkg/statistics/fmsketch.go

pkg/statistics/fmsketch.go

ti-chi-bot · 2026-03-02T23:51:32Z

@mjonss: You cannot manually add or delete the reviewing state labels, only I and the tursted members have permission to do so.

Details

In response to adding label named approved.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

bb7133

LGTM

ti-chi-bot · 2026-03-03T01:35:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xPoe, bb7133, hawkingrei, terry1purcell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [0xPoe,bb7133,hawkingrei,terry1purcell]
~~pkg/statistics/OWNERS~~ [0xPoe,terry1purcell]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

terry1purcell · 2026-03-03T03:49:21Z

/retest-required

tiprow · 2026-03-03T03:49:45Z

@terry1purcell: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. labels Feb 28, 2026

ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 28, 2026

ti-chi-bot bot added component/statistics sig/planner SIG: Planner labels Feb 28, 2026

ti-chi-bot bot removed the do-not-merge/needs-tests-checked label Feb 28, 2026

mjonss requested a review from Copilot February 28, 2026 03:27

Copilot started reviewing on behalf of mjonss February 28, 2026 03:27 View session

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

Copilot AI reviewed Feb 28, 2026

View reviewed changes

pkg/statistics/handle/globalstats/global_stats_async.go Show resolved Hide resolved

pkg/statistics/fmsketch.go Outdated Show resolved Hide resolved

pkg/statistics/handle/globalstats/global_stats.go Show resolved Hide resolved

mjonss and others added 2 commits February 28, 2026 04:37

statistics: remove dead ReleaseAndPutToPool method

9fb8b13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply suggestion from @Copilot

31bafce

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mjonss requested review from 0xPoe and hawkingrei February 28, 2026 03:44

statistics: update Bazel metadata after removing dolthub/swiss

04d970b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

pantheon-ai bot reviewed Feb 28, 2026

View reviewed changes

pkg/statistics/BUILD.bazel Show resolved Hide resolved

formatting

739429c

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

hawkingrei approved these changes Feb 28, 2026

View reviewed changes

ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 28, 2026

0xPoe approved these changes Feb 28, 2026

View reviewed changes

ti-chi-bot bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 28, 2026

statistics: address review feedback for FMSketch simplification

ef71132

- Clarify MemoryUsage comment (CodeRabbit) - Add "Release for GC." comments at nil-assignment sites (0xPoe) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot reviewed Mar 1, 2026

View reviewed changes

pkg/statistics/handle/storage/json.go Show resolved Hide resolved

mjonss and others added 7 commits March 2, 2026 21:50

Removed the FMSketch capacity logic, just use what is given

14c9c62

statistics: add FMSketch testing.B benchmarks and benchmark report

4e8e934

Add package-level testing.B benchmarks for FMSketch merge, encode/decode, and the old destroy path, plus a benchmark report summarizing results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert "statistics: add FMSketch testing.B benchmarks and benchmark r…

f2e7303

…eport" This reverts commit 4e8e934. This was just to share some previous results and benchmarks, not relevant!

Revert "statistics: update benchmark results with maxSize map pre-all…

7205afb

…ocation" This reverts commit 4a249c6. Removing benchmark data and report, since only for reference, not for PR

Revert "statistics: add standalone FMSketch GC/memory/CPU benchmark"

c44d306

This reverts commit 3729b07. Benchmarks for PR, not to be merged

mjonss removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 2, 2026

coderabbitai bot reviewed Mar 2, 2026

View reviewed changes

pkg/statistics/fmsketch.go Show resolved Hide resolved

mjonss added the approved label Mar 2, 2026

ti-chi-bot bot removed the approved label Mar 2, 2026

terry1purcell approved these changes Mar 3, 2026

View reviewed changes

bb7133 approved these changes Mar 3, 2026

View reviewed changes

ti-chi-bot bot added the approved label Mar 3, 2026

ti-chi-bot bot merged commit 21b5bb6 into pingcap:master Mar 3, 2026
30 checks passed

Conversation

mjonss commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Benchmark results

Check List

Release note

Summary by CodeRabbit

Uh oh!

pantheon-ai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tiprow bot commented Feb 28, 2026

Uh oh!

coderabbitai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

hawkingrei commented Feb 28, 2026

Uh oh!

codecov bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hawkingrei commented Feb 28, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

mjonss commented Feb 28, 2026

Uh oh!

tiprow bot commented Feb 28, 2026

Uh oh!

mjonss commented Feb 28, 2026

Uh oh!

mjonss commented Feb 28, 2026

Uh oh!

0xPoe left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ti-chi-bot bot commented Feb 28, 2026

[LGTM Timeline notifier]

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mjonss commented Mar 2, 2026

Uh oh!

0xPoe commented Mar 2, 2026

Uh oh!

mjonss commented Feb 28, 2026 •

edited

Loading

pantheon-ai bot commented Feb 28, 2026 •

edited

Loading

coderabbitai bot commented Feb 28, 2026 •

edited

Loading

codecov bot commented Feb 28, 2026 •

edited

Loading

0xPoe left a comment •

edited

Loading