cudev: fix CUDA texture pitch alignment for createContinuous GpuMat #4068
Shubh3155 wants to merge 1 commit into opencv:4.x
Pull request overview
This PR fixes a CUDA texture creation failure when using cv::cuda::createContinuous() with texture-backed operations. The issue occurs because createContinuous() can produce a GpuMat with a pitch that is not aligned to cudaDeviceProp::texturePitchAlignment, causing texture creation to fail.
Changes:
- Extended the fallback logic in texture creation to check pitch alignment in addition to single row/column cases
- When the pitch is misaligned, data is copied into an aligned pitched buffer allocated via cudaMallocPitch
- Reformatted multi-line function calls for better readability
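The copy step in the second bullet can be modeled on the host. The PR text names only cudaMallocPitch for the allocation and does not show the copy call, so this sketch assumes a cudaMemcpy2D-style row-by-row copy. `copy2D` is a hypothetical name, and ordinary host memory stands in for device memory so the logic can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-side model of a 2D pitched copy (the role a cudaMemcpy2D-style call
// would play on the device): copy `height` rows of `widthBytes` bytes from a
// source with row stride `srcPitch` into a destination with row stride
// `dstPitch`. Bytes past `widthBytes` in each destination row are untouched.
void copy2D(unsigned char* dst, std::size_t dstPitch,
            const unsigned char* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height)
{
    // Each row's payload must fit inside both strides.
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, widthBytes);
}
```

Only the first `widthBytes` of each destination row receive data; the remaining `dstPitch - widthBytes` bytes are padding outside the texture's width.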
    cudaDeviceProp prop;
    CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&prop, currentDevice));
Performance consideration: cudaGetDevice and cudaGetDeviceProperties are called every time this function is invoked. For frequently created textures, this could add overhead. Consider caching the device properties or checking if there's a pattern in the codebase for reusing device properties. However, this is a minor concern since texture creation is typically not done in a tight loop, and the fix is necessary for correctness.
Suggested change:
    - cudaDeviceProp prop;
    - CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&prop, currentDevice));
    + // Cache device properties per device to avoid repeated queries.
    + static int cachedDevice = -1;
    + static cudaDeviceProp cachedProp;
    + static bool propInitialized = false;
    + if (!propInitialized || currentDevice != cachedDevice)
    + {
    +     CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&cachedProp, currentDevice));
    +     cachedDevice = currentDevice;
    +     propInitialized = true;
    + }
    + const cudaDeviceProp& prop = cachedProp;
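The cache suggested above keeps one device's properties in unsynchronized `static` variables, so two host threads creating textures at the same time would race on them, and alternating between devices re-queries on every switch. A minimal sketch of a thread-safe, per-device cache, assuming only `texturePitchAlignment` is needed: `PitchAlignmentCache` and the injected query function are hypothetical, and in real code the query would wrap `cudaGetDeviceProperties` (or `cudaDeviceGetAttribute` with `cudaDevAttrTexturePitchAlignment`, which reads a single attribute).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-in for reading texturePitchAlignment for a device id.
using AlignmentQuery = std::size_t (*)(int device);

// Memoizes one value per device id. The mutex makes concurrent get() calls
// from several host threads safe, and the map keeps every device's entry,
// so switching devices does not evict the previous one.
class PitchAlignmentCache
{
public:
    explicit PitchAlignmentCache(AlignmentQuery query) : query_(query)
    {
        assert(query_ != nullptr);
    }

    std::size_t get(int device)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(device);
        if (it == cache_.end())  // first request for this device: query once
            it = cache_.emplace(device, query_(device)).first;
        return it->second;
    }

private:
    AlignmentQuery query_;
    std::mutex mutex_;
    std::map<int, std::size_t> cache_;
};
```

Holding the lock across the query keeps the sketch simple; since each device is queried at most once, the contention cost is limited to the first call per device.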
@@ -176,17 +176,28 @@ namespace cv { namespace cudev {
    texRes.res.pitch2D.height = rows;
    texRes.res.pitch2D.width = cols;
    // temporary fix for single row/columns until TexturePtr is reworked
The comment "temporary fix for single row/columns until TexturePtr is reworked" is now outdated. The fix now also handles pitch alignment issues, not just single row/column cases. Consider updating the comment to reflect the expanded scope, such as: "temporary fix for single row/columns and misaligned pitch until TexturePtr is reworked".
Suggested change:
    - // temporary fix for single row/columns until TexturePtr is reworked
    + // temporary fix for single row/columns and misaligned pitch until TexturePtr is reworked
    }
    else {
        texRes.res.pitch2D.devPtr = data;
        texRes.res.pitch2D.pitchInBytes = step;
    }
Inconsistent indentation: the closing brace and else statement use incorrect indentation. They should align with the opening brace on line 187, using 12 spaces instead of 11.
Suggested change:
    - }
    - else {
    -     texRes.res.pitch2D.devPtr = data;
    -     texRes.res.pitch2D.pitchInBytes = step;
    - }
    + }
    + else {
    +     texRes.res.pitch2D.devPtr = data;
    +     texRes.res.pitch2D.pitchInBytes = step;
    + }
Summary
This PR fixes a CUDA texture creation failure when using cv::cuda::createContinuous(). createContinuous() may produce a GpuMat with a pitch that is not aligned to cudaDeviceProp::texturePitchAlignment. When such a matrix is used in CUDA texture-backed operations (e.g. cv::cuda::resize), createTextureObject() fails with an "invalid argument" error.

The fix extends the existing fallback logic to also handle misaligned pitch values by copying the data into an aligned pitched buffer when required.
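The fallback condition described above can be written as a small predicate. This is a sketch of the decision only, not the PR's code: `needsPitchedCopy` is a hypothetical name, and `texturePitchAlignment` would be read from `cudaDeviceProp` at run time.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the matrix should be copied into a freshly allocated,
// aligned pitched buffer before binding it as a Pitch2D texture:
// either it is a single row/column (the pre-existing fallback), or its
// row step is not a multiple of the device's texturePitchAlignment.
bool needsPitchedCopy(int rows, int cols, std::size_t stepBytes,
                      std::size_t texturePitchAlignment)
{
    assert(texturePitchAlignment != 0 && "alignment must be non-zero");
    const bool degenerate = (rows == 1 || cols == 1);
    const bool misaligned = (stepBytes % texturePitchAlignment) != 0;
    return degenerate || misaligned;
}
```

Before this PR only the `degenerate` half of the condition existed, so a multi-row matrix with a misaligned step went straight to texture creation.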
Root Cause
createContinuous() allocates memory via cudaMalloc and reshapes it, which can result in an unaligned step. CUDA Pitch2D textures require the pitch to be aligned to texturePitchAlignment, but the previous logic only handled the single-row and single-column cases.
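To make the root cause concrete: a continuous matrix packs its rows back to back, so its step is exactly `cols * elemSize()` bytes with no row padding. The helpers below (hypothetical names) compute that step and the smallest multiple of the alignment that could hold one row. The example values assume a `texturePitchAlignment` of 32 bytes; the real value must be read from the device, and cudaMallocPitch chooses its own pitch, which may be larger than this minimum.

```cpp
#include <cassert>
#include <cstddef>

// Step (bytes per row) of a continuous matrix: rows are packed back to back,
// so there is no padding at the end of each row.
std::size_t continuousStep(int cols, std::size_t elemSize)
{
    return static_cast<std::size_t>(cols) * elemSize;
}

// Smallest multiple of `alignment` that is >= `bytes`.
std::size_t roundUpToAlignment(std::size_t bytes, std::size_t alignment)
{
    assert(alignment != 0);
    return (bytes + alignment - 1) / alignment * alignment;
}
```

For example, a continuous CV_8UC3 matrix with 33 columns has a 99-byte step, which is not a multiple of 32 and would need a 128-byte pitch, whereas a CV_8UC1 matrix with 640 columns has a 640-byte step that is already aligned.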
Changes
… GpuMat instances

Testing
… cv::cuda::resize … GpuMat allocations