Skip to content

[Cpp API Compatibility] Fix MaybeResetHolder#78826

Open
youge325 wants to merge 4 commits intoPaddlePaddle:developfrom
youge325:cResetHolder
Open

[Cpp API Compatibility] Fix MaybeResetHolder#78826
youge325 wants to merge 4 commits intoPaddlePaddle:developfrom
youge325:cResetHolder

Conversation

@youge325
Copy link
Copy Markdown
Contributor

@youge325 youge325 commented Apr 28, 2026

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

本次修复了外部扩展场景下 TensorBase::storage() 未能同步 compat storage holder 的问题。

问题原因

paddle.enable_compat() 编译外部扩展时,phi::DenseTensor::ResetHolder() 不可见,原来的 MaybeResetHolder() 会退化到空操作,导致 DenseTensor 的 holder 仍然是原始 Allocation,没有被替换成 StorageHolderView。因此后续 resize_() 扩容虽然更新了 compat StorageImpl,但 tensor 再次读取 storage().nbytes() 时仍可能从旧 holder 得到旧容量。

修复内容

  • 保留 ResetHolder() 可见时的直接调用路径
  • 为 custom-kernel/外部扩展场景补充 fallback:
    • 通过公开 Holder() 返回的 live holder 引用替换为 compat holder。
    • 复刻 ResetHolder() 的关键语义:空 tensor 重置 offset;连续 tensor 校验 holder 容量。
  • 未修改 phi/core,改动限定在 compat 层。

验证

  • ninja -C build test/cpp/c10_storage_test test/cpp/ATen_resize_test 通过
  • ctest -R 'c10_storage_test|ATen_resize_test' --output-on-failure 通过
  • -DPADDLE_WITH_CUSTOM_KERNEL 语法验证通过
  • 外部扩展复现通过:resize_({4})storage().nbytes()8 更新为 16

上面的说明是 AI 生成的,大概意思是,paddle/phi/api/include/compat/ATen/core/TensorBase.h:416-422 中定义的MaybeResetHolder 模板实例化的过程失败了,因为在启用 PADDLE_WITH_CUSTOM_KERNEL 的情况下,paddle/phi/core/dense_tensor.h 不会 #include "paddle/phi/core/dense_tensor.inl" (详见 paddle/phi/core/dense_tensor.h:342-344 ),导致在 paddle/phi/core/dense_tensor.inl 中声明的 phi::DenseTensor::ResetHolder() 没有符号导出,调用 ResetHolder 函数的 MaybeResetHolder 函数自然就实例化失败了,现在就直接显式实例化一份 MaybeResetHolder,在里面将 ResetHolder 函数展开

这个问题比较隐蔽,在 C++ 模板编程的 SFINAE(Substitution Failure Is Not An Error)机制中,编译不会报错,导致 resize_ 函数调用了一个空的 MaybeResetHolder ,现已修复

是否引起精度变化

Copilot AI review requested due to automatic review settings April 28, 2026 10:52
@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented Apr 28, 2026

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot Bot added the contributor External developers label Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an external-extension compatibility issue where TensorBase::storage() could observe stale capacity because the compat StorageHolderView was not synced back into phi::DenseTensor when ResetHolder() is not available (e.g., custom-kernel / external builds).

Changes:

  • Replaces the MaybeResetHolder(...) no-op fallback with a phi::DenseTensor*-specific implementation that updates the live holder via Holder().
  • Mirrors key ResetHolder() semantics in the fallback: empty-tensor offset reset and contiguous-tensor capacity validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread paddle/phi/api/include/compat/ATen/core/TensorBase.h Outdated
Comment on lines +432 to +434
auto meta = dense->meta();
meta.offset = 0;
dense->set_meta(meta);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the numel() == 0 branch, updating offset via dense->set_meta(meta) can throw when meta.dims.size() == -1 and meta.strides.size() == -1, because set_meta will call DenseTensorMeta::calc_strides(meta_.dims) and calc_strides does not handle dims.size() == -1. ResetHolder() doesn’t have this failure mode. Consider resetting the offset without going through set_meta (e.g., mutate meta().offset via a guarded const_cast, or ensure strides/dims are set to a supported rank before calling set_meta).

Suggested change
auto meta = dense->meta();
meta.offset = 0;
dense->set_meta(meta);
auto& meta =
const_cast<phi::DenseTensorMeta&>(dense->meta());
meta.offset = 0;

Copilot uses AI. Check for mistakes.
Comment on lines +440 to +441
TORCH_CHECK(holder != nullptr,
"The size of Holder is not enough to store the Tensor.");
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null-holder check uses the same message as the size check ("The size of Holder is not enough..."). If holder is actually null, this error is misleading. Suggest using a distinct message for the null case (e.g., "Holder must not be null") and keep the size-related message for the required_bytes <= holder->size() check.

Suggested change
TORCH_CHECK(holder != nullptr,
"The size of Holder is not enough to store the Tensor.");
TORCH_CHECK(holder != nullptr, "Holder must not be null");

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants