Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Jan 26, 2026

This PR buids on #3272, so we need to merge that one first

Summary

  • Adds V2 Parquet scan support for native_datafusion scan implementation
  • Previously, V2 scans with native_datafusion would fall back to Spark
  • Now uses CometNativeParquetScan (DataFusion-based reader), same as native_iceberg_compat

Test plan

  • Updated existing test to verify V2 scans work with native_datafusion when COMET_EXEC_ENABLED=true
  • Added new test to verify fallback to Spark when COMET_EXEC_ENABLED=false
  • All CometScanRuleSuite tests pass

🤖 Generated with Claude Code

andygrove and others added 5 commits January 25, 2026 09:38
This PR adds support for native_iceberg_compat scan implementation with V2 data sources (BatchScanExec with ParquetScan).

## Changes

### New Files
- `CometNativeParquetPartitionReaderFactory`: A V2 partition reader factory that uses NativeBatchReader (DataFusion-based Parquet reader)
- `CometNativeParquetScan`: A V2 scan trait that creates the new reader factory

### Behavior
- V2 scans with `auto` or `native_iceberg_compat`: Use CometNativeParquetScan (DataFusion-based reader)
- V2 scans with `native_comet`: Use existing CometParquetScan (legacy JNI-based BatchReader)
- V2 scans with `native_datafusion`: Fall back to Spark (not yet supported for V2)

This is consistent with the goal of deprecating the legacy BatchReader code - V2 scans can now use the DataFusion-based reader when using the recommended scan implementations.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…scan impls

The test was failing when the default scan impl was set to native_datafusion
because V2 scans fall back to Spark in that case. This fix ensures the test
explicitly sets SCAN_AUTO to test the V2 scan replacement behavior.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Previously, V2 Parquet scans with native_datafusion would fall back to
Spark with a message saying it wasn't yet supported. This adds proper
V2 scan support using the same DataFusion-based reader (CometNativeParquetScan)
that native_iceberg_compat uses.

The new nativeDataFusionV2Scan method:
- Requires COMET_EXEC_ENABLED=true (same as other DataFusion-based V2 scans)
- Validates schema, pushed aggregates, filesystem support, and encryption
- Uses CometNativeParquetScan wrapped in CometBatchScanExec

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Update test to expect CometBatchScan instead of fallback to Spark
- Add ParquetReadV2NativeDataFusionSuite that runs all ParquetReadSuite
  tests (278+ tests) with V2 scans and native_datafusion implementation

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add ParquetReadV2NativeIcebergCompatSuite that runs all ParquetReadSuite
tests (278+ tests) with V2 scans and explicit native_iceberg_compat setting.

Note: ParquetReadV2Suite already provides implicit coverage via SCAN_AUTO,
but this suite explicitly sets SCAN_NATIVE_ICEBERG_COMPAT for clarity and
consistency with ParquetReadV2NativeDataFusionSuite.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 74.14634% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.30%. Comparing base (f09f8af) to head (93ce2bb).
⚠️ Report is 884 commits behind head on main.

Files with missing lines Patch % Lines
...n/scala/org/apache/comet/rules/CometScanRule.scala 57.77% 17 Missing and 21 partials ⚠️
...uet/CometNativeParquetPartitionReaderFactory.scala 87.34% 6 Missing and 4 partials ⚠️
.../apache/comet/parquet/CometNativeParquetScan.scala 86.11% 1 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3276      +/-   ##
============================================
+ Coverage     56.12%   60.30%   +4.18%     
- Complexity      976     1497     +521     
============================================
  Files           119      177      +58     
  Lines         11743    16275    +4532     
  Branches       2251     2689     +438     
============================================
+ Hits           6591     9815    +3224     
- Misses         4012     5088    +1076     
- Partials       1140     1372     +232     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mbutrovich
Copy link
Contributor

mbutrovich commented Jan 26, 2026

I'm confused by all the references to native_iceberg_compat. In theory when I started prototyping this recently, this should borrow more from CometIcebergNativeScan since that's a V2 scan that then delegates to a fully native operator.

@andygrove andygrove closed this Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants