Refactor Statistics Cache System #23564

aptend · 2026-01-20T11:04:08Z

Refactor Statistics Cache System

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

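This PR refactors the statistics cache system to improve query optimization performance and accuracy. The changes introduce a two-tier caching architecture (a session cache and a global cache), sampled statistics for large tables, and a new table function: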

1. Session Stats Cache

Fast path: Returns cached stats within 3 seconds if valid (AccurateObjectNumber > 0)
Aggressive retry: Immediately recalculates invalid stats to ensure BVT tests get fresh data after table creation/insertion
Empty table handling: Returns nil for empty tables, allowing callers to use DefaultStats (Outcnt=1000)
Value-type cache: Uses map[uint64]StatsInfoWrapper to reduce small object allocations
BackExec reuse: Reuses a single BackExec within a session to eliminate the overhead of repeated initialization
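The fast path and the value-type map described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the fields of `StatsInfoWrapper` other than `AccurateObjectNumber`, the `SessionStatsCache` type, and the `Get`/`Put` methods are hypothetical, and "within 3 seconds" is read here as a limit on the age of the cached entry.

```go
package main

import (
	"fmt"
	"time"
)

// StatsInfoWrapper is a hypothetical value-type cache entry. Only
// AccurateObjectNumber is named in the PR description; UpdatedAt is an
// assumption made for this sketch.
type StatsInfoWrapper struct {
	AccurateObjectNumber int64
	UpdatedAt            time.Time
}

// freshWindow mirrors the 3-second fast-path window from the description.
const freshWindow = 3 * time.Second

// SessionStatsCache (hypothetical) stores wrappers by value, keyed by table ID,
// so each entry lives inside the map instead of in its own small heap object.
type SessionStatsCache struct {
	entries map[uint64]StatsInfoWrapper
}

func NewSessionStatsCache() *SessionStatsCache {
	return &SessionStatsCache{entries: make(map[uint64]StatsInfoWrapper)}
}

// Put stores or overwrites the entry for a table.
func (c *SessionStatsCache) Put(tableID uint64, w StatsInfoWrapper) {
	c.entries[tableID] = w
}

// Get reports a hit only when the entry exists, is valid
// (AccurateObjectNumber > 0), and is no older than freshWindow.
// On a miss the caller is expected to recalculate the stats.
func (c *SessionStatsCache) Get(tableID uint64, now time.Time) (StatsInfoWrapper, bool) {
	w, ok := c.entries[tableID]
	if !ok || w.AccurateObjectNumber <= 0 {
		return StatsInfoWrapper{}, false
	}
	if now.Sub(w.UpdatedAt) > freshWindow {
		return StatsInfoWrapper{}, false
	}
	return w, true
}

func main() {
	cache := NewSessionStatsCache()
	now := time.Now()
	cache.Put(42, StatsInfoWrapper{AccurateObjectNumber: 8, UpdatedAt: now})

	_, hit := cache.Get(42, now.Add(time.Second))
	fmt.Println("fresh entry hit:", hit)

	_, hit = cache.Get(42, now.Add(5*time.Second))
	fmt.Println("stale entry hit:", hit)
}
```

Passing `now` as a parameter keeps the freshness check deterministic, which makes the fast path easy to exercise in tests.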

2. Global Stats Cache

Event-driven updates: Asynchronously updated by Logtail events
Two-level filtering:
- Queue condition: keyExists + event type + large table throttling
- Execution condition: inProgress + MinUpdateInterval (15s)
Large table throttling:
- Small tables (< 500 objects): Update on any change
- Large tables (≥ 500 objects): Update when the change rate is ≥ 5% or after a 30-minute timeout
Disk-only stats: Only counts persisted objects, ignoring in-memory dirty blocks
Concurrent traversal: Uses concurrentExecutor for parallel object metadata loading
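The two-level filtering and the large-table throttling can be sketched as two small predicates. This is only an illustration under stated assumptions: `tableState`, `shouldQueue`, and `shouldExecute` are hypothetical names, the change rate is taken to be changed objects divided by the object count at the last update, and the keyExists and event-type checks from the queue condition are left out.

```go
package main

import (
	"fmt"
	"time"
)

// Thresholds taken from the PR description.
const (
	largeTableObjects   = 500
	changeRateThreshold = 0.05
	forceUpdateTimeout  = 30 * time.Minute
	minUpdateInterval   = 15 * time.Second
)

// tableState is a hypothetical snapshot of what the updater knows about a table.
type tableState struct {
	objectCount     int           // objects counted at the last stats update
	changedObjects  int           // objects added or removed since then
	sinceLastUpdate time.Duration // time elapsed since the last stats update
	inProgress      bool          // an update for this table is already running
}

// shouldQueue models only the throttling part of the queue condition.
func shouldQueue(s tableState) bool {
	if s.objectCount < largeTableObjects {
		// Small tables: any change triggers an update.
		return s.changedObjects > 0
	}
	// Large tables: wait for a 5% change rate or the 30-minute timeout.
	rate := float64(s.changedObjects) / float64(s.objectCount)
	return rate >= changeRateThreshold || s.sinceLastUpdate >= forceUpdateTimeout
}

// shouldExecute models the execution condition: no update in flight and
// at least MinUpdateInterval since the previous one.
func shouldExecute(s tableState) bool {
	return !s.inProgress && s.sinceLastUpdate >= minUpdateInterval
}

func main() {
	small := tableState{objectCount: 20, changedObjects: 1, sinceLastUpdate: time.Minute}
	large := tableState{objectCount: 1000, changedObjects: 10, sinceLastUpdate: time.Minute}
	fmt.Println("small table queued:", shouldQueue(small))
	fmt.Println("large table queued:", shouldQueue(large))
	fmt.Println("small table executes:", shouldExecute(small))
}
```

Keeping the two checks separate reflects the description: the queue condition decides whether an event is worth recording, and the execution condition decides whether work may start now.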

3. Sampling Statistics for Large Tables

Two-phase approach:
- Phase 1: Exact table-level stats (row count, block count) from ObjectStats (no IO)
- Phase 2: Sampled column-level stats (ZoneMap, NDV, NullCnt) from ObjectMeta (with IO)
Sampling strategy:
- ≤100 objects: Full scan
- >100 objects: targetCount = clamp(max(sqrt(N), 0.02·N), 100, 2000)
UUID-based sampling: Leverages UUIDv7's random bytes for zero-overhead deterministic sampling
Row-based scaling: Scales column stats by the ratio of total rows to sampled rows, reducing the impact of object size variance
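To make the sampling formula and the row-based scaling concrete, here is a small sketch. The function names `sampleTargetCount` and `scaleByRows` are hypothetical, rounding sqrt(N) and 0.02·N down to integers is an assumption (the description does not say how fractions are handled), and the scaling is shown for an additive statistic such as NullCnt.

```go
package main

import (
	"fmt"
	"math"
)

// Limits taken from the PR description.
const (
	fullScanLimit = 100  // at or below this many objects, scan everything
	minSample     = 100  // lower clamp bound
	maxSample     = 2000 // upper clamp bound
)

// sampleTargetCount returns how many objects to sample for column-level stats:
// n itself for n <= 100, otherwise clamp(max(sqrt(n), 0.02*n), 100, 2000).
// Integer arithmetic (floor) is used here as an assumption.
func sampleTargetCount(n int) int {
	if n <= fullScanLimit {
		return n
	}
	target := int(math.Sqrt(float64(n)))
	if twoPercent := n / 50; twoPercent > target {
		target = twoPercent
	}
	if target < minSample {
		target = minSample
	}
	if target > maxSample {
		target = maxSample
	}
	return target
}

// scaleByRows extrapolates a value summed over the sampled objects to the whole
// table using the row count ratio instead of the object count ratio, so a few
// unusually large or small objects in the sample distort the estimate less.
func scaleByRows(sampledValue float64, sampledRows, totalRows int64) float64 {
	if sampledRows <= 0 {
		return 0
	}
	return sampledValue * float64(totalRows) / float64(sampledRows)
}

func main() {
	for _, n := range []int{50, 101, 10000, 1000000} {
		fmt.Printf("objects=%d sample=%d\n", n, sampleTargetCount(n))
	}
	// 30 NULLs seen in 1,000 sampled rows, extrapolated to 10,000 total rows.
	fmt.Println("estimated NullCnt:", scaleByRows(30, 1000, 10000))
}
```

With these bounds the 2% term only dominates between 5,000 and 100,000 objects; below that the lower clamp of 100 applies, and above it the upper clamp of 2000 caps the IO.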

4. New Table Function: `table_stats()`

Provides SQL interface to query table statistics
Returns: table_name, row_count, block_count, object_count, stats_json
Supports optional refresh mode: 'auto' (default) or 'full' (force refresh)

Key Benefits:

Reduced S3 IO: Sampling reduces IO for large tables while maintaining accuracy
Better cache hit rate: Session cache provides fast path for repeated queries
Scalable updates: Large table throttling prevents excessive update frequency
Observability: New table_stats() function for debugging and monitoring

XuPeng-SH

Please check which Markdown documents were accidentally added

aptend requested review from XuPeng-SH, aunjgr, fengttt, gouhongshen, heni02 and ouyuanning as code owners January 20, 2026 11:04

aptend temporarily deployed to ci January 20, 2026 11:04 — with GitHub Actions Inactive

aptend had a problem deploying to ci January 20, 2026 11:04 — with GitHub Actions Failure

matrix-meow added the size/XXL label (denotes a PR that changes 2000+ lines) Jan 20, 2026

mergify bot added the kind/bug (something isn't working), kind/enhancement, and kind/refactor (code refactor) labels Jan 20, 2026

aptend force-pushed the sess-stats branch from ebbe722 to 25f6f60 January 21, 2026 03:20

aptend temporarily deployed to ci January 21, 2026 03:21 — with GitHub Actions Inactive

aptend had a problem deploying to ci January 21, 2026 03:21 — with GitHub Actions Failure

aptend had a problem deploying to ci January 23, 2026 03:56 — with GitHub Actions Failure

aptend temporarily deployed to ci January 23, 2026 03:57 — with GitHub Actions Inactive

aptend had a problem deploying to ci January 23, 2026 03:57 — with GitHub Actions Failure

aptend force-pushed the sess-stats branch from 1b732c6 to 5a343ae January 23, 2026 10:49

aptend temporarily deployed to ci January 23, 2026 10:49 — with GitHub Actions Inactive

aptend had a problem deploying to ci January 23, 2026 10:49 — with GitHub Actions Error

refactor stats

43976de

aptend force-pushed the sess-stats branch from 5a343ae to 43976de January 23, 2026 10:50

aptend temporarily deployed to ci January 23, 2026 10:51 — with GitHub Actions Inactive

aptend had a problem deploying to ci January 23, 2026 10:51 — with GitHub Actions Failure

XuPeng-SH requested changes Jan 25, 2026

4 participants
