[Refactor](Multi Catalog) External meta cache framework with engine adapters #60937
suxiaogang223 wants to merge 38 commits into apache:master
TPC-H: Total hot run time: 28796 ms
Code Review Summary — Unified External Metadata Cache Framework
Overall this is a well-structured refactoring that introduces a clean 3-level cache model (Engine → Catalog → Entry) with SPI-based plugin support. The migration of Iceberg/Paimon/Hudi/MaxCompute/Doris to engine-level adapters while keeping Hive on a legacy path is a sound incremental rollout strategy.
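To make the Engine → Catalog → Entry layering concrete, here is a minimal sketch using only `java.util.concurrent`. All class and method names in it (`ThreeLevelCacheSketch`, `EngineCache`, `CatalogGroup`, `Entry`) are hypothetical stand-ins, not the PR's actual classes, and the real framework uses Caffeine caches rather than plain maps.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a 3-level cache layout; names are illustrative only.
final class ThreeLevelCacheSketch {

    // Level 3: one named entry that loads a value on miss and keeps it.
    static final class Entry<K, V> {
        private final Map<K, V> values = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        Entry(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            return values.computeIfAbsent(key, loader);
        }

        void invalidateAll() {
            values.clear();
        }

        int size() {
            return values.size();
        }
    }

    // Level 2: all entries that belong to one catalog, keyed by entry name.
    static final class CatalogGroup {
        final Map<String, Entry<String, String>> entries = new ConcurrentHashMap<>();
    }

    // Level 1: one engine holding a group per catalog id.
    static final class EngineCache {
        final Map<Long, CatalogGroup> catalogs = new ConcurrentHashMap<>();

        CatalogGroup prepareCatalog(long catalogId) {
            return catalogs.computeIfAbsent(catalogId, id -> new CatalogGroup());
        }

        void removeCatalog(long catalogId) {
            catalogs.remove(catalogId);
        }
    }

    private final Map<String, EngineCache> engines = new ConcurrentHashMap<>();

    // Engine names are normalized to lowercase before lookup.
    EngineCache engine(String name) {
        return engines.computeIfAbsent(name.toLowerCase(Locale.ROOT), n -> new EngineCache());
    }
}
```

In this sketch, `engine("Iceberg")` and `engine("iceberg")` resolve to the same instance, and dropping a catalog discards every entry registered under it in one step.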
Critical Checkpoints
Goal & Correctness: The goal is to unify scattered per-engine cache managers into a single framework with consistent lifecycle, invalidation, and configuration. The framework structure accomplishes this, but there is a type-safety bug in the generic getSchemaCacheValue path (see inline comment) that would cause IllegalArgumentException at runtime for any engine registering a SchemaCacheKey subclass.
Concurrency: The framework relies on ConcurrentHashMap for engineCaches and CatalogEntryGroup, and Caffeine caches are inherently thread-safe. CatalogEntryGroup extends ConcurrentHashMap which is fine. The routeCatalogEngines broadcast pattern is safe because safeInvalidate catches IllegalStateException for uninitialized catalogs. No new lock hierarchy concerns introduced.
Lifecycle Management: CatalogMgr.removeCatalog() now calls catalog.onClose() before removing from maps and calls removeCatalog() on cache mgr before removing from idToCatalog — this is a correctness improvement ensuring cache cleanup while catalog is still accessible.
Configuration: CacheSpec handles enable/ttl/capacity with compatibility key mapping. Dynamic changes are supported through notifyPropertiesUpdated() which does removeCatalog + prepareCatalog.
Parallel Code Paths: The DefaultExternalMetaCache serves as fallback for engines without dedicated entries. Each ExternalTable subclass routes via getMetaCacheEngine(). The base ExternalTable.getMetaCacheEngine() throws by default — table types that forget to override will fail at runtime.
Test Coverage: Good unit test coverage for CacheSpec, MetaCacheEntry, SPI loading, Iceberg, and Paimon caches. Missing: No unit tests for HudiExternalMetaCache (the most complex engine cache with partition values, fs_view, and meta_client entries), MaxComputeExternalMetaCache, or DorisExternalMetaCache.
Performance: routeCatalogEngines broadcasts prepareCatalog to ALL 7 engines for every catalog, creating ~19 empty Caffeine cache instances per catalog regardless of catalog type. Memory impact is modest (~tens of KB per catalog) but scales linearly. Consider filtering by relevant engine or documenting this as intentional.
Observability: No new metrics or logging for cache hit/miss rates at the engine level, though Caffeine's built-in stats are exposed via MetaCacheEntry.getStats().
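The type-safety concern raised under Goal & Correctness can be illustrated with plain `java.lang.Class` calls. The sketch below is not the PR's `ensureTypeCompatible`; the key classes and method names are hypothetical, and whether `isAssignableFrom` is the right fix for the real code is an assumption.

```java
// Hypothetical types for illustration; only the java.lang.Class calls are real API.
final class TypeCheckSketch {

    static class SchemaKey {
    }

    // Stands in for an engine-specific subclass of the generic key type.
    static class EngineSchemaKey extends SchemaKey {
    }

    // Strict check: rejects any subclass of the declared key type.
    static void ensureExact(Class<?> declared, Object key) {
        if (!declared.equals(key.getClass())) {
            throw new IllegalArgumentException(
                    "key type mismatch: " + key.getClass().getName());
        }
    }

    // Relaxed check: accepts the declared type and any subclass of it.
    static void ensureAssignable(Class<?> declared, Object key) {
        if (!declared.isAssignableFrom(key.getClass())) {
            throw new IllegalArgumentException(
                    "key type mismatch: " + key.getClass().getName());
        }
    }
}
```

With these definitions, `ensureExact(SchemaKey.class, new EngineSchemaKey())` throws `IllegalArgumentException`, while `ensureAssignable` accepts the same arguments.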
Issues Found
- [Bug] ensureTypeCompatible uses strict Class.equals(), which breaks when engine-specific SchemaCacheKey subclasses are used through the generic path (see inline comment on ExternalMetaCacheMgr.java:406).
- [Design] routeCatalogEngines broadcasts to all engines unconditionally: wasteful but functionally safe (see inline comment on line 343).
- [Risk] ExternalTable.getMetaCacheEngine() throws by default: any new table type that forgets to override it will fail at runtime with no compile-time safety net (see inline comment on ExternalTable.java:219).
- [Test Gap] No unit tests for HudiExternalMetaCache despite it being the most complex engine cache implementation.
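The [Risk] item contrasts a throwing default with a compile-time guarantee. A minimal sketch of both shapes follows; the class names are hypothetical, and the exception type is an assumption, since the review only says the base method throws.

```java
// Hypothetical sketch; class names and the exception type are assumptions.
final class EngineRoutingSketch {

    // Variant A: throwing default. A forgotten override only surfaces at runtime.
    static class BaseTableA {
        String getMetaCacheEngine() {
            throw new UnsupportedOperationException(
                    "engine not declared for " + getClass().getSimpleName());
        }
    }

    // Compiles fine even though it never declares its engine.
    static class ForgetfulTable extends BaseTableA {
    }

    // Variant B: abstract method. A subclass without an override does not compile.
    abstract static class BaseTableB {
        abstract String getMetaCacheEngine();
    }

    static class IcebergLikeTable extends BaseTableB {
        @Override
        String getMetaCacheEngine() {
            return "iceberg";
        }
    }
}
```

Variant B trades flexibility for safety: every concrete table type must name its engine, which may or may not suit table types that intentionally have no cache engine.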
TPC-H: Total hot run time: 27864 ms
TPC-DS: Total hot run time: 154005 ms
TPC-H: Total hot run time: 27849 ms
TPC-DS: Total hot run time: 153786 ms
TPC-H: Total hot run time: 27550 ms
TPC-DS: Total hot run time: 152802 ms
TPC-H: Total hot run time: 27606 ms
TPC-DS: Total hot run time: 152510 ms
What problem does this PR solve?
Part of #60686
Summary
Introduce a unified external metadata cache framework and migrate Iceberg / Paimon / Hudi / MaxCompute / Doris to engine-level cache adapters while keeping Common/Hive on the legacy path for incremental rollout.
New Meta Cache Framework
The new framework standardizes external metadata caching into a 3-level model:
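- Engine: ExternalMetaCacheMgr routes by engine (hive/iceberg/paimon/hudi/maxcompute/doris).
- Catalog: each engine keeps one CatalogEntryGroup per catalog.
- Entry: each entry is declared by a MetaCacheEntryDef (name, key type, value type, loader, default spec).
Structure (simplified): Engine → Catalog → Entry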
Core components:
- ExternalMetaCache: unified engine-level contract (initCatalog, scoped invalidation, stats).
- AbstractExternalMetaCache: shared implementation for entry registration, per-catalog entry group creation, type-safe entry lookup, lifecycle management.
- MetaCacheEntryDef: immutable declaration of an entry.
- MetaCacheEntry: generic cache runtime (load on miss, invalidate by key/predicate/all, per-entry stats).
- CacheSpec: unified cache policy (enable, ttl-second, capacity) and compatibility key mapping.
- CatalogEntryGroup: container for all entries in one catalog.
Initialization and lifecycle:
- In ExternalCatalog.makeSureInitialized(), ExternalMetaCacheMgr.prepareCatalog(...) eagerly initializes engine entries for that catalog.
- Stats use an engine.entry -> metric map, so each entry can be observed independently.
Configuration model:
- Keys follow the pattern meta.cache.<engine>.<entry>.enable|ttl-second|capacity.
- Defaults come from each MetaCacheEntryDef's defaultCacheSpec.
- CacheSpec.applyCompatibilityMap(...) supports smooth migration from legacy keys.
ExternalMetaCacheMgr
engineCaches Organization
- engineCaches is a concurrent map: Map<String, ExternalMetaCache>.
- initEngineCaches() pre-registers built-in engines.
- engine(engineName) normalizes to lowercase and uses computeIfAbsent(...).
- routeEngine(engine, action): engine == null
Migration Status (Engine View)
Migrated to engine-level cache adapters:
- iceberg (table, view, manifest)
- paimon (table)
- hudi (partition, fs_view, meta_client)
- maxcompute (metadata)
- doris (backends)
Kept on the legacy path:
- common
- hive
Key Changes
- New framework classes in datasource.metacache: ExternalMetaCache, AbstractExternalMetaCache, MetaCacheEntryDef, MetaCacheEntry, CacheSpec, CatalogEntryGroup.
- ExternalMetaCacheMgr updated to route cache lifecycle by engine (prepareCatalog, invalidateCatalog/db/table/partitions, stats).
- Catalog cache preparation hooked into ExternalCatalog.makeSureInitialized().
- IcebergExternalMetaCache entries (table, view, manifest), with call sites moved to the engine cache path.
- PaimonExternalMetaCache (table), with related call sites routed via the engine cache.
- HudiExternalMetaCache entries (partition, fs_view, meta_client), with scan/utils routed through the new path.
- MaxComputeExternalMetaCache added; MaxComputeMetadataCacheMgr removed.
- DorisExternalMetaCache added; DorisExternalMetaCacheMgr removed.
- Compatibility key mapping in CacheSpec for gradual key migration.
- ENGINE_COMMON and ENGINE_HIVE kept on LegacyExternalMetaCache to preserve existing behavior.
- Tests: IcebergExternalMetaCacheTest, PaimonExternalMetaCacheTest, MetaCacheDeadlockTest.
Compatibility / Behavior
- meta.cache.<engine>.<entry>.enable
- meta.cache.<engine>.<entry>.ttl-second
- meta.cache.<engine>.<entry>.capacity
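As a rough illustration of how the `meta.cache.<engine>.<entry>.*` keys could be resolved against per-entry defaults and a legacy-key mapping, here is a minimal sketch. It is not the PR's `CacheSpec`: the class `CacheSpecSketch`, its methods, the legacy key name used in the usage note, and the rule that a new-style key wins over its legacy alias are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cache policy resolver; not the PR's CacheSpec.
final class CacheSpecSketch {
    final boolean enable;
    final long ttlSecond;
    final long capacity;

    CacheSpecSketch(boolean enable, long ttlSecond, long capacity) {
        this.enable = enable;
        this.ttlSecond = ttlSecond;
        this.capacity = capacity;
    }

    // Copies legacy values onto their new-style keys unless the new key is already set
    // (assumption: an explicitly configured new-style key takes precedence).
    static Map<String, String> applyCompatibility(Map<String, String> props,
                                                  Map<String, String> legacyToNew) {
        Map<String, String> out = new HashMap<>(props);
        for (Map.Entry<String, String> e : legacyToNew.entrySet()) {
            String legacyValue = props.get(e.getKey());
            if (legacyValue != null) {
                out.putIfAbsent(e.getValue(), legacyValue);
            }
        }
        return out;
    }

    // Reads enable / ttl-second / capacity for one engine entry, falling back to defaults.
    static CacheSpecSketch resolve(Map<String, String> props, String engine, String entry,
                                   CacheSpecSketch defaults) {
        String prefix = "meta.cache." + engine + "." + entry + ".";
        boolean enable = Boolean.parseBoolean(
                props.getOrDefault(prefix + "enable", String.valueOf(defaults.enable)));
        long ttl = Long.parseLong(
                props.getOrDefault(prefix + "ttl-second", String.valueOf(defaults.ttlSecond)));
        long capacity = Long.parseLong(
                props.getOrDefault(prefix + "capacity", String.valueOf(defaults.capacity)));
        return new CacheSpecSketch(enable, ttl, capacity);
    }
}
```

For example, mapping a made-up legacy key `legacy.table.ttl` to `meta.cache.iceberg.table.ttl-second` lets an old catalog property keep working while unset fields such as `enable` fall back to the entry's defaults.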