Conversation

@HangyuanLiu HangyuanLiu commented Feb 12, 2026

Why I'm doing:

  • add_files populated Metrics.lowerBounds/upperBounds directly from Parquet getMinBytes/getMaxBytes.
  • Those bytes use Parquet's physical encodings, which are not guaranteed to match Iceberg's logical single-value encodings.
  • This could produce wrong file bounds and over-prune data files (for example, TPC-DS q21 on item.i_current_price DECIMAL(7,2)).

What I'm doing:

  • Refactor extractParquetMetrics to aggregate row-group min/max statistics as typed logical values
    (see the sketch after this list).
  • Read stats via genericGetMin/genericGetMax, convert them by Iceberg field type (tryConvertStatValue),
    compare them with Comparators.forType, and serialize the bounds with Conversions.toByteBuffer.
  • Apply the same logical path to all supported primitive types, not only DECIMAL.
  • Keep missing-stats handling by dropping unreliable bounds/null-count entries.
  • Add helper methods tryGetComparator and tryConvertStatValue, and extend decimal conversion to accept ByteBuffer input.
  • Add unit tests in AddFilesProcedureTest for the decimal/stat value conversion paths.

Fixes #issue
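
A minimal sketch of the per-column aggregation, using hypothetical names (ColumnBoundsSketch, aggregateBounds, and a stubbed tryConvertStatValue); only the Iceberg/Parquet calls (Comparators.forType, Conversions.toByteBuffer, Statistics.genericGetMin/genericGetMax) reflect the real APIs this change relies on:

```java
// Sketch only, assuming hypothetical class/method names; the real change lives in AddFilesProcedure.
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.List;

import org.apache.iceberg.types.Comparators;
import org.apache.iceberg.types.Conversions;
import org.apache.iceberg.types.Type;
import org.apache.parquet.column.statistics.Statistics;

class ColumnBoundsSketch {
    /** Returns {lowerBound, upperBound} serialized with Iceberg's single-value encoding,
     *  or null when any row group has missing or unconvertible stats. */
    static ByteBuffer[] aggregateBounds(Type.PrimitiveType icebergType, List<Statistics<?>> rowGroupStats) {
        Comparator<Object> cmp = Comparators.forType(icebergType);
        Object lower = null;
        Object upper = null;
        for (Statistics<?> stats : rowGroupStats) {
            if (stats == null || !stats.hasNonNullValue()) {
                return null; // missing stats: drop this column's bounds entirely
            }
            // Convert the Parquet physical value (e.g. a Binary-encoded DECIMAL(7,2))
            // into the Iceberg logical value before comparing or serializing.
            Object min = tryConvertStatValue(icebergType, stats.genericGetMin());
            Object max = tryConvertStatValue(icebergType, stats.genericGetMax());
            if (min == null || max == null) {
                return null; // unreliable stats: treat bounds as absent
            }
            lower = (lower == null || cmp.compare(min, lower) < 0) ? min : lower;
            upper = (upper == null || cmp.compare(max, upper) > 0) ? max : upper;
        }
        if (lower == null || upper == null) {
            return null;
        }
        return new ByteBuffer[] {
                Conversions.toByteBuffer(icebergType, lower),
                Conversions.toByteBuffer(icebergType, upper)
        };
    }

    // Stub standing in for the PR's helper: converts a Parquet stat value to the
    // Iceberg logical representation for the given type, or returns null if it cannot.
    static Object tryConvertStatValue(Type.PrimitiveType icebergType, Object statValue) {
        return statValue;
    }
}
```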

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This PR needs auto-generated documentation
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels for the target branches to which this PR will be auto-backported
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@HangyuanLiu HangyuanLiu requested a review from a team as a code owner February 12, 2026 01:45
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Youngwb previously approved these changes Feb 12, 2026
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e11e91e538


Comment on lines +595 to +599:

    if (statValue instanceof Binary binary) {
        return UUIDUtil.convert(binary.getBytes());
    }
    if (statValue instanceof byte[] bytes) {
        return UUIDUtil.convert(bytes);


P1: Catch invalid UUID stats before converting bounds

UUIDUtil.convert(...) throws when the stats payload is not exactly 16 bytes, but this branch does not handle that exception; in extractParquetMetrics that propagates and fails the entire ADD FILES operation instead of treating the column bounds as missing. This is reachable when importing parquet files whose UUID column stats are malformed or schema-mismatched (for example legacy BINARY data mapped to UUID), so the conversion should be guarded and return null to let missingStats cleanup drop only unreliable metrics.
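
A guard along those lines might look like the sketch below (hypothetical helper name tryConvertUuidStat; not the PR's actual code), returning null so the caller records the bounds as missing:

```java
import java.util.UUID;

import org.apache.iceberg.util.UUIDUtil;
import org.apache.parquet.io.api.Binary;

final class UuidStatSketch {
    // Returns null instead of throwing when the stats payload is not a valid 16-byte UUID,
    // so the caller can drop only this column's bounds.
    static UUID tryConvertUuidStat(Object statValue) {
        try {
            if (statValue instanceof Binary binary) {
                return UUIDUtil.convert(binary.getBytes());
            }
            if (statValue instanceof byte[] bytes) {
                return UUIDUtil.convert(bytes);
            }
        } catch (RuntimeException e) {
            // Malformed or schema-mismatched payload (e.g. legacy BINARY data mapped to UUID).
        }
        return null;
    }
}
```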


@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 119 / 119 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java | 119 | 119 | 100.00% | [] |

@github-actions

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

