-
Notifications
You must be signed in to change notification settings - Fork 277
feat: Add V2 scan support for native_datafusion #3276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR adds support for native_iceberg_compat scan implementation with V2 data sources (BatchScanExec with ParquetScan). ## Changes ### New Files - `CometNativeParquetPartitionReaderFactory`: A V2 partition reader factory that uses NativeBatchReader (DataFusion-based Parquet reader) - `CometNativeParquetScan`: A V2 scan trait that creates the new reader factory ### Behavior - V2 scans with `auto` or `native_iceberg_compat`: Use CometNativeParquetScan (DataFusion-based reader) - V2 scans with `native_comet`: Use existing CometParquetScan (legacy JNI-based BatchReader) - V2 scans with `native_datafusion`: Fall back to Spark (not yet supported for V2) This is consistent with the goal of deprecating the legacy BatchReader code - V2 scans can now use the DataFusion-based reader when using the recommended scan implementations. Co-Authored-By: Claude Opus 4.5 <[email protected]>
…scan impls The test was failing when the default scan impl was set to native_datafusion because V2 scans fall back to Spark in that case. This fix ensures the test explicitly sets SCAN_AUTO to test the V2 scan replacement behavior. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Previously, V2 Parquet scans with native_datafusion would fall back to Spark with a message saying it wasn't yet supported. This adds proper V2 scan support using the same DataFusion-based reader (CometNativeParquetScan) that native_iceberg_compat uses. The new nativeDataFusionV2Scan method: - Requires COMET_EXEC_ENABLED=true (same as other DataFusion-based V2 scans) - Validates schema, pushed aggregates, filesystem support, and encryption - Uses CometNativeParquetScan wrapped in CometBatchScanExec Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Update test to expect CometBatchScan instead of fallback to Spark - Add ParquetReadV2NativeDataFusionSuite that runs all ParquetReadSuite tests (278+ tests) with V2 scans and native_datafusion implementation Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add ParquetReadV2NativeIcebergCompatSuite that runs all ParquetReadSuite tests (278+ tests) with V2 scans and explicit native_iceberg_compat setting. Note: ParquetReadV2Suite already provides implicit coverage via SCAN_AUTO, but this suite explicitly sets SCAN_NATIVE_ICEBERG_COMPAT for clarity and consistency with ParquetReadV2NativeDataFusionSuite. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3276 +/- ##
============================================
+ Coverage 56.12% 60.30% +4.18%
- Complexity 976 1497 +521
============================================
Files 119 177 +58
Lines 11743 16275 +4532
Branches 2251 2689 +438
============================================
+ Hits 6591 9815 +3224
- Misses 4012 5088 +1076
- Partials 1140 1372 +232 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'm confused by all the references to |
This PR buids on #3272, so we need to merge that one first
Summary
native_datafusionscan implementationnative_datafusionwould fall back to SparkCometNativeParquetScan(DataFusion-based reader), same asnative_iceberg_compatTest plan
native_datafusionwhenCOMET_EXEC_ENABLED=trueCOMET_EXEC_ENABLED=falseCometScanRuleSuitetests pass🤖 Generated with Claude Code