[feat](condition cache) Support condition cache for external table #60897
jacktengg wants to merge 13 commits into apache:master
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

run buildall

TPC-H: Total hot run time: 28620 ms
TPC-DS: Total hot run time: 184197 ms

run p0

run external

run buildall

TPC-H: Total hot run time: 28745 ms
TPC-DS: Total hot run time: 184276 ms
force-pushed from 6781f14 to 15fff1b
run buildall

TPC-H: Total hot run time: 29158 ms
TPC-DS: Total hot run time: 183955 ms

BE UT Coverage Report: Increment line coverage; Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage; Increment coverage report
force-pushed from da407e2 to a1f386e
run buildall

1 similar comment

run buildall

TPC-H: Total hot run time: 28659 ms
TPC-DS: Total hot run time: 184191 ms

run buildall

TPC-H: Total hot run time: 28635 ms
TPC-DS: Total hot run time: 183659 ms

BE UT Coverage Report: Increment line coverage; Increment coverage report
force-pushed from c33c87b to ee1fcce
run buildall

/review

force-pushed from ee1fcce to 14748e2

run buildall
Code Review: [feat](condition cache) Support condition cache for external table
Summary
This PR adds a condition cache for external tables (Hive Parquet/ORC). The cache stores per-granule (2048-row chunks) filter results keyed by (file_path, modification_time, file_size, condition_digest). On cache HIT, readers skip granules with no surviving rows, avoiding I/O and predicate evaluation. On MISS, readers mark which granules have surviving rows during predicate evaluation for future lookups.
Overall this is a well-structured feature with good test coverage. A few issues found below.
Critical Checkpoint Conclusions
Goal & correctness: The code accomplishes the stated goal. Regression tests cover both ORC and Parquet paths. Unit tests cover ranges_exception, get_row_index_by_pos, and filter_ranges_by_cache. Cache is correctly skipped for LIMIT-truncated scans and tables with delete operations (Iceberg/transactional Hive).
Concurrency: The ConditionCacheContext is created per-file and used by a single scanner thread sequentially, so no concurrency issues with the std::vector<bool> filter_result. The global ConditionCacheManager uses LRUCachePolicy which has internal locking.
Lifecycle: Cache entries are inserted only on full file consumption (_finalize_reader_condition_cache), preventing partial/incorrect cache entries. The ConditionCacheContext is owned by FileScanner and passed by raw pointer to readers, which is fine since the scanner outlives its readers.
Parallel code paths: Both ORC and Parquet readers are updated. Both lazy-read and non-lazy-read paths handle HIT and MISS. The Iceberg and transactional Hive table format readers correctly delegate cache context methods and report has_delete_operations() to disable caching.
Configuration: Uses existing enable_condition_cache and condition_cache_capacity session variables. No new config items added.
Observability: Profile counters added for ConditionCacheHit, ConditionCacheFilteredRows, and ConditionCacheMissedRows.
Compatibility: The condition_cache_digest is passed via TQueryOptions from FE. BE checks __isset.condition_cache_digest before use, so rolling upgrades where FE is newer than BE will gracefully skip caching.
Test coverage: Regression tests cover basic HIT/MISS scenarios for both formats. Unit tests cover the new RowRanges methods and cache filtering logic. Consider adding negative tests for edge cases (empty files, single-row files, files smaller than one granule).
Performance: On HIT, the Parquet non-lazy path pre-filters _read_ranges to skip I/O entirely for false granules. The ORC path seeks past false granules. The lazy-read HIT check in Parquet is dead code (see comment below) but harmless.
Key encoding: ExternalCacheKey::encode() concatenates variable-length path with fixed 24-byte binary suffix without a length separator. In practice, file paths won't contain bytes that could be confused with the binary suffix, but a length prefix or separator would be more robust.
Issues Found
See inline comments below.
return -1;
}

uint64_t get_digest(uint64_t seed) const {
[Bug risk] DCHECK(false) vanishes in RELEASE builds, causing silent return of -1 for out-of-bounds positions. Per AGENTS.md coding standards, invariant violations should crash rather than silently continue.
If pos >= _count is truly an invariant that should never be violated, this should use DORIS_CHECK(false) (or equivalently, DORIS_CHECK(pos < _count) at the function entry) to catch violations in all build types. A silent return of -1 in production could cause incorrect granule marking and data correctness issues.
// Suggested: replace DCHECK with DORIS_CHECK
DORIS_CHECK(false) << "pos " << pos << " is out of bounds for RowRanges with count " << _count;

std::shared_ptr<ConditionCacheContext> _condition_cache_ctx;
std::unique_ptr<ORCFilterImpl> _orc_filter;
orc::RowReaderOptions _row_reader_options;
[Style] Initializing uint64_t to -1 gives UINT64_MAX (18446744073709551615). While it's overwritten before first use, this is a code smell and may trigger -Wconversion warnings in files that enable compile_check_begin.h.
Consider using 0 as initial value, or a named sentinel constant if a specific "uninitialized" value is intended.
auto& cache = *_condition_cache_ctx->filter_result;
int64_t first_rg_pos = _read_ranges.get_row_index_by_pos(batch_base_row);
int64_t last_rg_pos =
        _read_ranges.get_row_index_by_pos(batch_base_row + pre_read_rows - 1);
[Dead code] This lazy-read HIT check (if (_condition_cache_ctx && _condition_cache_ctx->is_hit)) will never evaluate to true for Parquet.
In _filter_read_ranges_by_condition_cache() (called during row group init), _condition_cache_ctx is set to nullptr after filtering _read_ranges on HIT. So by the time execution reaches this lazy-read path, _condition_cache_ctx is already null.
The actual HIT optimization for Parquet is entirely handled by pre-filtering _read_ranges in the non-lazy path, which is correct and effective. This block (lines 616-641 approximately) is unreachable dead code.
Consider either:
- Removing this dead code block, or
- If there is an intent to do per-batch cache checking in the lazy path (e.g., for future use), don't null out _condition_cache_ctx in _filter_read_ranges_by_condition_cache() and document the design.
RETURN_IF_ERROR(_read_column_data(block, _lazy_read_ctx.predicate_columns.first, batch_size,
                                  &pre_read_rows, &pre_eof, filter_map));
if (pre_read_rows == 0) {
    DCHECK_EQ(pre_eof, true);
[Performance observation] In this lazy-read HIT path, predicate columns are already read from storage (_read_column_data above) before the cache check occurs. This means the cache HIT in the lazy-read path only saves the cost of:
- Predicate evaluation
- Lazy (non-predicate) column reads
But the I/O cost for predicate columns is already spent. This contrasts with the non-lazy path which pre-filters _read_ranges to avoid reading any columns for false granules.
This is noted as a design observation. Since this code is actually dead (see other comment about _condition_cache_ctx being null at this point), it's not a practical issue currently. But if this code path is ever activated, consider restructuring to check the cache before reading predicate columns.
run buildall

TPC-H: Total hot run time: 28825 ms
TPC-DS: Total hot run time: 183460 ms

run beut

BE UT Coverage Report: Increment line coverage; Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage; Increment coverage report

run buildall

TPC-H: Total hot run time: 27646 ms
TPC-DS: Total hot run time: 153330 ms

BE UT Coverage Report: Increment line coverage; Increment coverage report

BE Regression && UT Coverage Report: Increment line coverage; Increment coverage report
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)