
Replace std::clamp with device-compatible clamp_scalar in clamp backend #98

Open
rahmans1 wants to merge 1 commit into acts-project:main from BNLNPPS:fix/clamp-device-code

Conversation


@rahmans1 rahmans1 commented Apr 9, 2026

Replace std::clamp with device-compatible clamp_scalar in clamp backend

Problem

std::clamp is a constexpr __host__-only function. Calling it from a COVFIE_HOST_DEVICE (__host__ __device__) context causes nvcc to emit warning #20013-D and produces silently incorrect results in device code (all outputs become zero) unless --expt-relaxed-constexpr is passed.

Minimal Demo

Verified on:
GPU: NVIDIA RTX 1000 Ada Generation
Driver: 581.42
CUDA: 13.0

// Minimal demo: why std::clamp fails in covfie device code,
// and why a ternary-based replacement works.
//
// Works (ternary):
//   nvcc -std=c++17 -o clamp_demo clamp_demo.cu && ./clamp_demo
//
// Reproduce the original covfie error (COVFIE_HOST_DEVICE + std::clamp):
//   nvcc -std=c++17 -DUSE_COVFIE_ORIGINAL -o clamp_demo clamp_demo.cu

#include <algorithm>
#include <cstdio>

#ifdef USE_COVFIE_ORIGINAL
// This mirrors what covfie's adjust() did originally:
// COVFIE_HOST_DEVICE (__host__ __device__) annotated, but calls std::clamp
// which is a __host__-only constexpr function.
// Without --expt-relaxed-constexpr, nvcc emits warning #20013-D for the
// std::clamp call, and the resulting device code silently returns zeros.
template<typename S>
__host__ __device__ S clamp_fn(S v, S lo, S hi)
{
    return std::clamp(v, lo, hi);
}
#else
// The fix: ternary uses only native GPU compare/select, no stdlib call.
// Compiles and runs correctly with no special nvcc flags.
template<typename S>
__host__ __device__ S clamp_fn(S v, S lo, S hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
#endif

__global__ void clamp_kernel(float* out)
{
    out[0] = clamp_fn(1.5f,  0.0f, 1.0f);  // above max → 1.0
    out[1] = clamp_fn(-0.5f, 0.0f, 1.0f);  // below min → 0.0
    out[2] = clamp_fn(0.5f,  0.0f, 1.0f);  // in range  → 0.5
}

int main()
{
    float* d;
    cudaMalloc(&d, 3 * sizeof(float));

    clamp_kernel<<<1, 1>>>(d);
    cudaDeviceSynchronize();

    float h[3];
    cudaMemcpy(h, d, 3 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("clamp(1.5, 0, 1)  = %.1f  (expected 1.0)\n", h[0]);
    printf("clamp(-0.5, 0, 1) = %.1f  (expected 0.0)\n", h[1]);
    printf("clamp(0.5, 0, 1)  = %.1f  (expected 0.5)\n", h[2]);

    printf("\nHost check (via std::clamp):\n");
    printf("clamp(1.5, 0, 1)  = %.1f\n", std::clamp(1.5f,  0.0f, 1.0f));
    printf("clamp(-0.5, 0, 1) = %.1f\n", std::clamp(-0.5f, 0.0f, 1.0f));
    printf("clamp(0.5, 0, 1)  = %.1f\n", std::clamp(0.5f,  0.0f, 1.0f));

    return 0;
}
nvcc -std=c++17 -DUSE_COVFIE_ORIGINAL -o clamp_demo clamp_demo.cu
clamp_demo.cu(22): warning #20013-D: calling a constexpr __host__ function("clamp") from a __host__ __device__ function("clamp_fn") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
      return std::clamp(v, lo, hi);
             ^
          detected during instantiation of "S clamp_fn(S, S, S) [with S=float]" at line 36

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

clamp_demo.cu(22): warning #20013-D: calling a constexpr __host__ function("const T1 &  ::std::clamp<float> (const T1 &, const T1 &, const T1 &)") from a __host__ __device__ function("clamp_fn<float> ") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

$ ./clamp_demo
clamp(1.5, 0, 1)  = 0.0  (expected 1.0)
clamp(-0.5, 0, 1) = 0.0  (expected 0.0)
clamp(0.5, 0, 1)  = 0.0  (expected 0.5)

Host check (via std::clamp):
clamp(1.5, 0, 1)  = 1.0
clamp(-0.5, 0, 1) = 0.0
clamp(0.5, 0, 1)  = 0.5

Fix

Introduce a COVFIE_HOST_DEVICE static helper clamp_scalar() using the ternary pattern inside non_owning_data_t::adjust(), and update the pack-expansion call accordingly. Remove the now-unused #include <algorithm>.

nvcc -std=c++17 -o clamp_demo clamp_demo.cu && ./clamp_demo
clamp(1.5, 0, 1)  = 1.0  (expected 1.0)
clamp(-0.5, 0, 1) = 0.0  (expected 0.0)
clamp(0.5, 0, 1)  = 0.5  (expected 0.5)

Host check (via std::clamp):
clamp(1.5, 0, 1)  = 1.0
clamp(-0.5, 0, 1) = 0.0
clamp(0.5, 0, 1)  = 0.5

…backend

std::clamp is a constexpr __host__-only function. Calling it from a
COVFIE_HOST_DEVICE (__host__ __device__) context causes nvcc to emit
warning #20013-D and produces silently incorrect results in device code
(all outputs become zero) unless --expt-relaxed-constexpr is passed.

Verified on:
  GPU:    NVIDIA RTX 1000 Ada Generation
  Driver: 581.42
  CUDA:   13.0

Broken (mirrors original adjust() behaviour):

  template<typename S>
  __host__ __device__ S clamp_fn(S v, S lo, S hi)
  {
      return std::clamp(v, lo, hi);  // warning #20013-D; wrong at runtime
  }

  clamp(1.5, 0, 1)  = 0.0  (expected 1.0)  <- wrong
  clamp(-0.5, 0, 1) = 0.0  (expected 0.0)
  clamp(0.5, 0, 1)  = 0.0  (expected 0.5)  <- wrong

Fixed (ternary lowers to native GPU compare/select, no stdlib call):

  template<typename S>
  __host__ __device__ S clamp_fn(S v, S lo, S hi)
  {
      return v < lo ? lo : (v > hi ? hi : v);
  }

  clamp(1.5, 0, 1)  = 1.0  (expected 1.0)  ✓
  clamp(-0.5, 0, 1) = 0.0  (expected 0.0)  ✓
  clamp(0.5, 0, 1)  = 0.5  (expected 0.5)  ✓

Introduce a COVFIE_HOST_DEVICE static helper clamp_scalar() using the
ternary pattern inside non_owning_data_t::adjust(), and update the
pack-expansion call accordingly. Remove the now-unused #include <algorithm>.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rahmans1 commented Apr 9, 2026

@stephenswat Would you mind giving this a review?
