[refine](column) remove IDataType get_default#62582
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
No blocking issues found after reviewing the full BE diff and the affected call paths.
Critical checkpoint conclusions:
- Goal and correctness: The patch removes the redundant
IDataType::get_default()API and routes default materialization through columninsert_default()semantics. The updatedcreate_column_const_with_default_value()and the storage-side replacements are consistent with that goal. - Scope and focus: The change set stays tightly focused on default-materialization code paths and the related unit tests.
- Concurrency: Not applicable here; the modified code paths do not introduce new shared-state or lock interactions.
- Lifecycle / static init: No special lifecycle or static-initialization concerns were introduced.
- Config / compatibility: No config, protocol, serialization-format, or rolling-upgrade compatibility changes are involved.
- Parallel code paths: The storage callers that previously round-tripped through
Fieldnow use column-level default insertion consistently with the central helper. - Conditional logic: No suspicious new conditional branches were added.
- Testing: Unit tests were updated for the removed API and existing complex-type coverage still exercises
create_column_const_with_default_value()for array/map/struct paths. I did not run tests in this review environment. - Observability / transactions / data writes: No new observability requirements. The storage-side replacements remain in branches whose comments and surrounding logic show the inserted values are either true defaults or intentionally unread filler values.
- Performance: This is a small improvement by removing unnecessary
Fieldmaterialization and reusing the column layer's native default path.
Residual risk: I reviewed the code and surrounding call chains, but I did not execute the BE unit suite locally in this runner.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
|
run nonConcurrent |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Findings
be/src/storage/tablet/base_tablet.cppandbe/src/storage/partial_update_info.cpp
The newinsert_default()calls can change committed row contents when partial update rebuilds a row whose previous version is delete-marked, or when flexible partial update does insert-after-delete. For non-null complex columns this flips the previous synthetic defaults to[],{}, and SQLNULL.be/src/core/data_type/data_type.cpp
create_column_const_with_default_value()is a shared helper, so this is not limited to storage-only dead payloads. The updated regression output already showsjson null -> NULL,[null] -> [], and{null:null} -> {}.
Critical Checkpoints
- Goal: remove
IDataType::get_default()without changing behavior. Conclusion: not met for Array/Map/JSONB defaults. - Test proof: no targeted test covers the resurrected-row paths above; the modified regression output instead demonstrates the semantic change.
- Focus/minimality: the API removal is small, but changing the shared helper broadens the behavior change to parallel call paths.
- Concurrency: no new concurrency issue found.
- Lifecycle/static init: no special lifecycle issue found.
- Config: no new config.
- Compatibility: runtime behavior changes for existing partial-update flows.
- Parallel paths: fixed partial update, flexible partial update, and shared helper callers are all affected.
- Special conditions: the
arbitrary valuesassumption is not valid on the resurrected-row paths. - Test result changes: the new
.outis not behavior-neutral. - Observability: no additional observability needed.
- Transaction/persistence/data correctness: affected, because the changed defaults can be written into committed rows.
- FE-BE pass-through: no issue found.
- Performance: neutral.
- Other: none beyond the issues above.
Tests run: none locally.
| ColumnPtr IDataType::create_column_const_with_default_value(size_t size) const { | ||
| return create_column_const(size, get_default()); | ||
| auto column = create_column(); | ||
| column->insert_default(); |
There was a problem hiding this comment.
This helper is no longer equivalent to the removed get_default() for complex types. For Array, Map, and JSONB, insert_default() yields [], {}, and SQL NULL instead of the previous synthetic defaults ([null], {null:null}, JSON null), and the regression diff in test_partial_update_complex_type.out already shows that behavior change. Because this helper is shared by many non-storage call sites, the PR is not behavior-preserving anymore.
| ->insert_default(); | ||
| } else { | ||
| mutable_column->insert(rs_column.get_vec_type()->get_default()); | ||
| mutable_column->insert_default(); |
There was a problem hiding this comment.
This branch is not just filling dead payload. When a fixed partial update hits a previously delete-marked row, use_default becomes true and the new row is rebuilt from these values. Replacing get_default() with insert_default() therefore changes committed results for non-null complex columns from the previous synthetic defaults to [], {}, and SQL NULL.
| new_col->insert_from(cur_col, block_pos); | ||
| } else { | ||
| new_col->insert(tablet_column.get_vec_type()->get_default()); | ||
| new_col->insert_default(); |
There was a problem hiding this comment.
The same issue exists in flexible partial update. _probe_key_for_mow() sends insert-after-delete rows down the use_default path without calling handle_new_key(), so these values become the visible contents of the re-inserted row. With this change, skipped non-null complex columns now persist [], {}, and SQL NULL instead of the previous synthetic defaults.
回归里面就一个case有问题。
我们也没有一个文档来描述默认值导致是什么。
dda1bc2 to
34d8715
Compare
|
/review |
There was a problem hiding this comment.
I found one additional issue beyond the existing inline threads. The newly added default-materialization unit test has a failing VARIANT expectation: after ColumnVariant::insert_default(), ColumnVariant::get() returns an empty TYPE_VARIANT field, not TYPE_NULL. This means the new test coverage does not currently pass as written.
I also reviewed the existing inline review context and did not duplicate the already-known blockers around the complex-type behavior changes in IDataType::create_column_const_with_default_value, fixed partial update, and flexible partial update. Those existing threads remain important because the PR is labeled as behavior-preserving/refactor-only but changes committed values and shared default-materialization semantics for JSONB/ARRAY/MAP paths.
Critical checkpoint conclusions: goal is to remove IDataType::get_default() and use column-level defaults, but the current code is not fully proven by passing tests and existing review threads show it changes behavior; scope is mostly focused, but storage call-site replacements are not behavior-preserving; no new concurrency or lifecycle hazards were found in this review; no new configs, persistence formats, or FE/BE protocol fields were added; parallel code paths in fixed/flexible partial update were touched, but existing threads identify semantic regressions there; test coverage was added, but the new unit test has a failing expectation and the regression output change documents a behavior change rather than a no-op refactor; no additional observability need was identified; no extra user focus points were provided, so there were no focus-specific findings beyond the full PR review.
|
/review |
There was a problem hiding this comment.
Automated review completed. I did not add duplicate inline comments because the existing review threads already cover the blocking issues: the shared default helper is no longer behavior-preserving for complex types, the fixed and flexible partial-update paths can persist different values for insert-after-delete rows, and the new Variant default unit test still expects the wrong materialized field type. These should be resolved before merge.
Critical checkpoint conclusions:
- Goal/test proof: The PR aims to remove IDataType::get_default() and make defaults column-driven, but the existing regression diff and already-open comments show this is a behavior change rather than a pure refactor. The new BE unit test is useful in scope but currently has an already-raised Variant expectation problem.
- Scope/focus: The mechanical removal is small, but changing common default materialization affects shared helper users and storage write paths, so the effective behavior scope is broader than the PR description.
- Concurrency/lifecycle/config/compatibility: I found no new concurrency, lifecycle, configuration, or wire/storage-format compatibility issue beyond the behavioral default-materialization concerns already raised.
- Parallel paths: Both fixed and flexible partial-update paths were touched; existing threads already cover the distinct storage paths that can expose the changed defaults.
- Tests/results: Test coverage was added, and regression output was updated, but the updated regression result itself demonstrates a user-visible result change that the PR marks as no behavior change. No additional user-provided review focus was present.
- Observability/performance/transactions: No additional observability, performance, or transaction/persistence issue was found in this review pass.
|
/review |
There was a problem hiding this comment.
Automated review conclusion: request changes.
I did not add new inline comments because the blocking issues I verified are already covered by existing inline review threads and should not be duplicated. The unresolved blockers are: create_column_const_with_default_value() now materializes column defaults instead of the removed type-level defaults, which changes Array/Map/JSONB default values in shared non-storage call sites; fixed partial update and flexible partial update also persist the new empty/NULL complex defaults for insert-after-delete rows instead of the previous synthetic defaults; and the new Variant default unit-test expectation was already noted as inconsistent with ColumnVariant::insert_default()/ColumnVariant::get(). The current PR also reports BE UT (macOS) failed.
Critical checkpoint conclusions:
- Goal and proof: the PR aims to remove
IDataType::get_default()and use column default materialization consistently, but the implementation is not behavior-preserving while the PR states no behavior change, and test/check status does not prove correctness. - Scope/minimality: the API removal is broad but mechanically focused; however, replacing shared helper/storage semantics without compatibility or behavior acknowledgement is not safe.
- Concurrency/lifecycle/config: no new concurrency, lifecycle, or configuration concerns were found in the changed code.
- Compatibility/data correctness: there is user-visible/default-materialization behavior change for complex types and MoW partial-update reinsert paths; this remains the main blocker.
- Parallel paths: fixed partial update, flexible partial update, and shared const-default helper paths were reviewed; the same semantic change exists across them and has already been commented.
- Testing: a new BE unit test was added and a regression
.outchanged, but the PR checklist says no tests are needed, the current checks show BE UT failure, and the regression diff itself demonstrates changed behavior. - Observability/performance/memory: no additional observability, performance, or memory-tracking issues were identified for this refactor.
User focus points: no additional user-provided review focus was supplied.
create_column_const_with_default_value 没有问题。 请重新review。 |
|
/review |
There was a problem hiding this comment.
The PR still is not behavior-preserving. Existing review threads already cover the shared helper behavior change, fixed partial update, the column-store flexible partial-update path, and the Variant unit-test expectation; I did not duplicate those. I found the same committed-value regression in two additional, distinct partial-update paths.
Critical checkpoint conclusions:
- Goal/test proof: the stated goal is removing IDataType::get_default() without behavior change, but Array/Map/JSONB/Variant defaults differ between get_default() and insert_default(); the regression output is updated to the changed behavior instead of proving compatibility, and an existing thread notes the new Variant unit test expectation is wrong.
- Scope/focus: the API removal is focused, but replacing storage write-path default materialization is not a safe mechanical refactor.
- Concurrency/lifecycle/config: no new concurrency, lifecycle, or config risks found in the reviewed scope.
- Compatibility/data correctness: the changed default materialization reaches MoW partial-update write paths and can change committed row contents for insert-after-delete rows.
- Parallel paths: additional row-store and flexible publish paths need the same treatment as the already-commented fixed and column-store paths.
- Tests: coverage currently documents changed results rather than preserving the previous committed behavior; the existing inline thread also identifies a failing Variant expectation.
- Observability/performance/protocol: no additional issue found.
User focus: no additional user-provided focus points were supplied.
|
/review |
There was a problem hiding this comment.
Review result: request changes.
I did not add new inline comments because the blocking issues I found are already covered by existing review threads and should not be duplicated. The unresolved blockers are the insert_default() behavior changes in the fixed/flexible partial-update paths for non-null complex columns, the shared create_column_const_with_default_value() behavior change for complex types, and the current DataTypeVariant default unit-test expectation mismatch.
Critical checkpoint conclusions:
- Goal and tests: The stated goal is to remove
IDataType::get_default()and centralize defaults through columninsert_default(), but the current code does not preserve behavior in storage partial-update paths already identified in existing threads. Tests were added, but the Variant case is already known to expect the wrong materialization. - Scope/focus: The authoritative PR diff is focused on BE data type defaults and related storage/test updates. The behavior-changing storage call sites are not a safe mechanical replacement.
- Concurrency/lifecycle: No additional concurrency, lock-order, or lifecycle issue was found in the authoritative PR diff.
- Configuration/compatibility: No new configuration item or serialized format change was found. The API removal is internal C++ code.
- Parallel paths: The same default-materialization issue exists across fixed partial update, flexible column-store, flexible row-store, and publish-time flexible paths; all are already covered by existing threads.
- Tests: The regression output explicitly records changed committed values for complex partial-update rows; that supports the existing behavioral-regression comments rather than proving compatibility.
- Observability/performance: No additional observability or performance issue found.
- Data correctness/persistence: The identified partial-update behavior changes can affect committed row contents for MoW unique-key tables; no separate EditLog/persistence issue found.
User focus: no additional user-provided review focus was specified.
|
run buildall |
1 similar comment
|
run buildall |
What problem does this PR solve?
Problem Summary:
IDataType::get_default()duplicated the default-value behavior already owned byIColumn::insert_default(), but the two paths did not always produce the same result. This made default materialization inconsistent for complex and nested types, and forced several code paths to round-trip throughFieldeven when the column layer already had the correct defaulting semantics.This change removes the redundant type-level API and makes default generation consistently follow the column implementation.
Changes
IDataType::get_default()from the BE data type interface and delete all corresponding overrides.create_column_const_with_default_value()to create a column and materialize its default value throughinsert_default().type->get_default()with direct column-levelinsert_default()calls.Field-based API and to validate defaults through column behavior instead.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)