Skip to content

[refine](column) remove IDataType get_default#62582

Merged
yiguolei merged 3 commits intoapache:masterfrom
Mryange:remove-type-get_default
May 7, 2026
Merged

[refine](column) remove IDataType get_default#62582
yiguolei merged 3 commits intoapache:masterfrom
Mryange:remove-type-get_default

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented Apr 17, 2026

What problem does this PR solve?

Problem Summary:

IDataType::get_default() duplicated the default-value behavior already owned by IColumn::insert_default(), but the two paths did not always produce the same result. This made default materialization inconsistent for complex and nested types, and forced several code paths to round-trip through Field even when the column layer already had the correct defaulting semantics.

This change removes the redundant type-level API and makes default generation consistently follow the column implementation.

Changes

  • Remove IDataType::get_default() from the BE data type interface and delete all corresponding overrides.
  • Rewrite create_column_const_with_default_value() to create a column and materialize its default value through insert_default().
  • Replace storage-side callers that used type->get_default() with direct column-level insert_default() calls.
  • Update unit tests to stop asserting the removed Field-based API and to validate defaults through column behavior instead.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 17, 2026

/review

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 17, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found after reviewing the full BE diff and the affected call paths.

Critical checkpoint conclusions:

  • Goal and correctness: The patch removes the redundant IDataType::get_default() API and routes default materialization through column insert_default() semantics. The updated create_column_const_with_default_value() and the storage-side replacements are consistent with that goal.
  • Scope and focus: The change set stays tightly focused on default-materialization code paths and the related unit tests.
  • Concurrency: Not applicable here; the modified code paths do not introduce new shared-state or lock interactions.
  • Lifecycle / static init: No special lifecycle or static-initialization concerns were introduced.
  • Config / compatibility: No config, protocol, serialization-format, or rolling-upgrade compatibility changes are involved.
  • Parallel code paths: The storage callers that previously round-tripped through Field now use column-level default insertion consistently with the central helper.
  • Conditional logic: No suspicious new conditional branches were added.
  • Testing: Unit tests were updated for the removed API and existing complex-type coverage still exercises create_column_const_with_default_value() for array/map/struct paths. I did not run tests in this review environment.
  • Observability / transactions / data writes: No new observability requirements. The storage-side replacements remain in branches whose comments and surrounding logic show the inserted values are either true defaults or intentionally unread filler values.
  • Performance: This is a small improvement by removing unnecessary Field materialization and reusing the column layer's native default path.

Residual risk: I reviewed the code and surrounding call chains, but I did not execute the BE unit suite locally in this runner.

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (6/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (27508/37292)
Line Coverage 57.48% (297924/518277)
Region Coverage 54.67% (247908/453432)
Branch Coverage 56.28% (107319/190673)

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 20, 2026

run buildall

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 20, 2026

run nonConcurrent

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (6/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.78% (27590/37393)
Line Coverage 57.52% (298369/518715)
Region Coverage 54.73% (248383/453803)
Branch Coverage 56.36% (107553/190817)

@zclllyybb
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  1. be/src/storage/tablet/base_tablet.cpp and be/src/storage/partial_update_info.cpp
    The new insert_default() calls can change committed row contents when partial update rebuilds a row whose previous version is delete-marked, or when flexible partial update does insert-after-delete. For non-null complex columns this flips the previous synthetic defaults to [], {}, and SQL NULL.
  2. be/src/core/data_type/data_type.cpp
    create_column_const_with_default_value() is a shared helper, so this is not limited to storage-only dead payloads. The updated regression output already shows json null -> NULL, [null] -> [], and {null:null} -> {}.

Critical Checkpoints

  • Goal: remove IDataType::get_default() without changing behavior. Conclusion: not met for Array/Map/JSONB defaults.
  • Test proof: no targeted test covers the resurrected-row paths above; the modified regression output instead demonstrates the semantic change.
  • Focus/minimality: the API removal is small, but changing the shared helper broadens the behavior change to parallel call paths.
  • Concurrency: no new concurrency issue found.
  • Lifecycle/static init: no special lifecycle issue found.
  • Config: no new config.
  • Compatibility: runtime behavior changes for existing partial-update flows.
  • Parallel paths: fixed partial update, flexible partial update, and shared helper callers are all affected.
  • Special conditions: the arbitrary values assumption is not valid on the resurrected-row paths.
  • Test result changes: the new .out is not behavior-neutral.
  • Observability: no additional observability needed.
  • Transaction/persistence/data correctness: affected, because the changed defaults can be written into committed rows.
  • FE-BE pass-through: no issue found.
  • Performance: neutral.
  • Other: none beyond the issues above.

Tests run: none locally.

ColumnPtr IDataType::create_column_const_with_default_value(size_t size) const {
return create_column_const(size, get_default());
auto column = create_column();
column->insert_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper is no longer equivalent to the removed get_default() for complex types. For Array, Map, and JSONB, insert_default() yields [], {}, and SQL NULL instead of the previous synthetic defaults ([null], {null:null}, JSON null), and the regression diff in test_partial_update_complex_type.out already shows that behavior change. Because this helper is shared by many non-storage call sites, the PR is not behavior-preserving anymore.

->insert_default();
} else {
mutable_column->insert(rs_column.get_vec_type()->get_default());
mutable_column->insert_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is not just filling dead payload. When a fixed partial update hits a previously delete-marked row, use_default becomes true and the new row is rebuilt from these values. Replacing get_default() with insert_default() therefore changes committed results for non-null complex columns from the previous synthetic defaults to [], {}, and SQL NULL.

new_col->insert_from(cur_col, block_pos);
} else {
new_col->insert(tablet_column.get_vec_type()->get_default());
new_col->insert_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue exists in flexible partial update. _probe_key_for_mow() sends insert-after-delete rows down the use_default path without calling handle_new_key(), so these values become the visible contents of the re-inserted row. With this change, skipped non-null complex columns now persist [], {}, and SQL NULL instead of the previous synthetic defaults.

@Mryange Mryange dismissed github-actions[bot]’s stale review April 21, 2026 02:02

回归里面就一个case有问题。
我们也没有一个文档来描述默认值导致是什么。

@Mryange Mryange force-pushed the remove-type-get_default branch from dda1bc2 to 34d8715 Compare May 6, 2026 08:21
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one additional issue beyond the existing inline threads. The newly added default-materialization unit test has a failing VARIANT expectation: after ColumnVariant::insert_default(), ColumnVariant::get() returns an empty TYPE_VARIANT field, not TYPE_NULL. This means the new test coverage does not currently pass as written.

I also reviewed the existing inline review context and did not duplicate the already-known blockers around the complex-type behavior changes in IDataType::create_column_const_with_default_value, fixed partial update, and flexible partial update. Those existing threads remain important because the PR is labeled as behavior-preserving/refactor-only but changes committed values and shared default-materialization semantics for JSONB/ARRAY/MAP paths.

Critical checkpoint conclusions: goal is to remove IDataType::get_default() and use column-level defaults, but the current code is not fully proven by passing tests and existing review threads show it changes behavior; scope is mostly focused, but storage call-site replacements are not behavior-preserving; no new concurrency or lifecycle hazards were found in this review; no new configs, persistence formats, or FE/BE protocol fields were added; parallel code paths in fixed/flexible partial update were touched, but existing threads identify semantic regressions there; test coverage was added, but the new unit test has a failing expectation and the regression output change documents a behavior change rather than a no-op refactor; no additional observability need was identified; no extra user focus points were provided, so there were no focus-specific findings beyond the full PR review.

Comment thread be/test/core/data_type/data_type_insert_default_test.cpp Outdated
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review completed. I did not add duplicate inline comments because the existing review threads already cover the blocking issues: the shared default helper is no longer behavior-preserving for complex types, the fixed and flexible partial-update paths can persist different values for insert-after-delete rows, and the new Variant default unit test still expects the wrong materialized field type. These should be resolved before merge.

Critical checkpoint conclusions:

  • Goal/test proof: The PR aims to remove IDataType::get_default() and make defaults column-driven, but the existing regression diff and already-open comments show this is a behavior change rather than a pure refactor. The new BE unit test is useful in scope but currently has an already-raised Variant expectation problem.
  • Scope/focus: The mechanical removal is small, but changing common default materialization affects shared helper users and storage write paths, so the effective behavior scope is broader than the PR description.
  • Concurrency/lifecycle/config/compatibility: I found no new concurrency, lifecycle, configuration, or wire/storage-format compatibility issue beyond the behavioral default-materialization concerns already raised.
  • Parallel paths: Both fixed and flexible partial-update paths were touched; existing threads already cover the distinct storage paths that can expose the changed defaults.
  • Tests/results: Test coverage was added, and regression output was updated, but the updated regression result itself demonstrates a user-visible result change that the PR marks as no behavior change. No additional user-provided review focus was present.
  • Observability/performance/transactions: No additional observability, performance, or transaction/persistence issue was found in this review pass.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review conclusion: request changes.

I did not add new inline comments because the blocking issues I verified are already covered by existing inline review threads and should not be duplicated. The unresolved blockers are: create_column_const_with_default_value() now materializes column defaults instead of the removed type-level defaults, which changes Array/Map/JSONB default values in shared non-storage call sites; fixed partial update and flexible partial update also persist the new empty/NULL complex defaults for insert-after-delete rows instead of the previous synthetic defaults; and the new Variant default unit-test expectation was already noted as inconsistent with ColumnVariant::insert_default()/ColumnVariant::get(). The current PR also reports BE UT (macOS) failed.

Critical checkpoint conclusions:

  • Goal and proof: the PR aims to remove IDataType::get_default() and use column default materialization consistently, but the implementation is not behavior-preserving while the PR states no behavior change, and test/check status does not prove correctness.
  • Scope/minimality: the API removal is broad but mechanically focused; however, replacing shared helper/storage semantics without compatibility or behavior acknowledgement is not safe.
  • Concurrency/lifecycle/config: no new concurrency, lifecycle, or configuration concerns were found in the changed code.
  • Compatibility/data correctness: there is user-visible/default-materialization behavior change for complex types and MoW partial-update reinsert paths; this remains the main blocker.
  • Parallel paths: fixed partial update, flexible partial update, and shared const-default helper paths were reviewed; the same semantic change exists across them and has already been commented.
  • Testing: a new BE unit test was added and a regression .out changed, but the PR checklist says no tests are needed, the current checks show BE UT failure, and the regression diff itself demonstrates changed behavior.
  • Observability/performance/memory: no additional observability, performance, or memory-tracking issues were identified for this refactor.

User focus points: no additional user-provided review focus was supplied.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

Automated review conclusion: request changes.

I did not add new inline comments because the blocking issues I verified are already covered by existing inline review threads and should not be duplicated. The unresolved blockers are: create_column_const_with_default_value() now materializes column defaults instead of the removed type-level defaults, which changes Array/Map/JSONB default values in shared non-storage call sites; fixed partial update and flexible partial update also persist the new empty/NULL complex defaults for insert-after-delete rows instead of the previous synthetic defaults; and the new Variant default unit-test expectation was already noted as inconsistent with ColumnVariant::insert_default()/ColumnVariant::get(). The current PR also reports BE UT (macOS) failed.

Critical checkpoint conclusions:

  • Goal and proof: the PR aims to remove IDataType::get_default() and use column default materialization consistently, but the implementation is not behavior-preserving while the PR states no behavior change, and test/check status does not prove correctness.
  • Scope/minimality: the API removal is broad but mechanically focused; however, replacing shared helper/storage semantics without compatibility or behavior acknowledgement is not safe.
  • Concurrency/lifecycle/config: no new concurrency, lifecycle, or configuration concerns were found in the changed code.
  • Compatibility/data correctness: there is user-visible/default-materialization behavior change for complex types and MoW partial-update reinsert paths; this remains the main blocker.
  • Parallel paths: fixed partial update, flexible partial update, and shared const-default helper paths were reviewed; the same semantic change exists across them and has already been commented.
  • Testing: a new BE unit test was added and a regression .out changed, but the PR checklist says no tests are needed, the current checks show BE UT failure, and the regression diff itself demonstrates changed behavior.
  • Observability/performance/memory: no additional observability, performance, or memory-tracking issues were identified for this refactor.

User focus points: no additional user-provided review focus was supplied.

create_column_const_with_default_value 没有问题。
Array/Map/JSONB 的默认值变更是合理的。

请重新review。

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR still is not behavior-preserving. Existing review threads already cover the shared helper behavior change, fixed partial update, the column-store flexible partial-update path, and the Variant unit-test expectation; I did not duplicate those. I found the same committed-value regression in two additional, distinct partial-update paths.

Critical checkpoint conclusions:

  • Goal/test proof: the stated goal is removing IDataType::get_default() without behavior change, but Array/Map/JSONB/Variant defaults differ between get_default() and insert_default(); the regression output is updated to the changed behavior instead of proving compatibility, and an existing thread notes the new Variant unit test expectation is wrong.
  • Scope/focus: the API removal is focused, but replacing storage write-path default materialization is not a safe mechanical refactor.
  • Concurrency/lifecycle/config: no new concurrency, lifecycle, or config risks found in the reviewed scope.
  • Compatibility/data correctness: the changed default materialization reaches MoW partial-update write paths and can change committed row contents for insert-after-delete rows.
  • Parallel paths: additional row-store and flexible publish paths need the same treatment as the already-commented fixed and column-store paths.
  • Tests: coverage currently documents changed results rather than preserving the previous committed behavior; the existing inline thread also identifies a failing Variant expectation.
  • Observability/performance/protocol: no additional issue found.

User focus: no additional user-provided focus points were supplied.

Comment thread be/src/storage/partial_update_info.cpp
Comment thread be/src/storage/tablet/base_tablet.cpp
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: request changes.

I did not add new inline comments because the blocking issues I found are already covered by existing review threads and should not be duplicated. The unresolved blockers are the insert_default() behavior changes in the fixed/flexible partial-update paths for non-null complex columns, the shared create_column_const_with_default_value() behavior change for complex types, and the current DataTypeVariant default unit-test expectation mismatch.

Critical checkpoint conclusions:

  • Goal and tests: The stated goal is to remove IDataType::get_default() and centralize defaults through column insert_default(), but the current code does not preserve behavior in storage partial-update paths already identified in existing threads. Tests were added, but the Variant case is already known to expect the wrong materialization.
  • Scope/focus: The authoritative PR diff is focused on BE data type defaults and related storage/test updates. The behavior-changing storage call sites are not a safe mechanical replacement.
  • Concurrency/lifecycle: No additional concurrency, lock-order, or lifecycle issue was found in the authoritative PR diff.
  • Configuration/compatibility: No new configuration item or serialized format change was found. The API removal is internal C++ code.
  • Parallel paths: The same default-materialization issue exists across fixed partial update, flexible column-store, flexible row-store, and publish-time flexible paths; all are already covered by existing threads.
  • Tests: The regression output explicitly records changed committed values for complex partial-update rows; that supports the existing behavioral-regression comments rather than proving compatibility.
  • Observability/performance: No additional observability or performance issue found.
  • Data correctness/persistence: The identified partial-update behavior changes can affect committed row contents for MoW unique-key tables; no separate EditLog/persistence issue found.

User focus: no additional user-provided review focus was specified.

@Mryange Mryange dismissed github-actions[bot]’s stale review May 6, 2026 11:08

不要纠结行为变更

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

run buildall

1 similar comment
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 6, 2026

run buildall

@yiguolei yiguolei merged commit 4955aaf into apache:master May 7, 2026
33 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants