Comprehensive fix for Windows MSVC build errors (C2872 std, LNK2001) and thread-safety by munder-sa · Pull Request #355 · thu-ml/SageAttention

munder-sa · 2026-03-20T02:56:26Z

This PR provides a comprehensive and robust fix for compiling SageAttention and SageAttention3 on Windows with MSVC, addressing critical issues not fully resolved in previous attempts (such as PR #352).

Key Fixes:

1. Reliable fix for the small macro conflict (bool char error)
When <windows.h> is included, directly or transitively, it can pull in the Windows SDK header rpcndr.h, which contains #define small char. Any identifier spelled exactly small, such as a bool small declaration in PyTorch's c10/cuda/CUDACachingAllocator.h, then expands to bool char and fails to compile. Passing -Usmall in nvcc_flags is not enough: command-line -U options take effect before any source is read, so the header simply defines the macro again afterwards.
Fix: Added an explicit #undef small immediately after the <windows.h> includes in api.cu and fp4_quantization_4d.cu.
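The ordering argument can be illustrated with a toy model of object-like macro handling. preprocess is a hypothetical helper, not how nvcc or cl.exe actually work; it models only two rules: command-line -U options are applied before any source line is read, and identifiers are replaced as whole tokens.

```python
import re

# Identifiers are matched as whole tokens, as the C preprocessor does.
_IDENT = re.compile(r"[A-Za-z_]\w*")


def preprocess(lines, cmdline_undefs=()):
    """Toy model of object-like macros (hypothetical helper, not a real preprocessor)."""
    macros = {}
    # Models -Usmall on the command line: applied before any source line is read.
    for name in cmdline_undefs:
        macros.pop(name, None)
    out = []
    for line in lines:
        parts = line.split()
        if parts[:1] == ["#define"] and len(parts) >= 2:
            macros[parts[1]] = " ".join(parts[2:])
        elif parts[:1] == ["#undef"] and len(parts) == 2:
            macros.pop(parts[1], None)
        else:
            # Replace each identifier that names a currently defined macro.
            out.append(_IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return out


header = ["#define small char"]  # what rpcndr.h contributes
decl = ["bool small;"]           # an identifier spelled exactly `small`

print(preprocess(header + decl, cmdline_undefs=["small"]))  # ['bool char;']
print(preprocess(header + ["#undef small"] + decl))         # ['bool small;']
```

The first call shows why -Usmall alone fails: the header's #define runs after it. The second mirrors the PR's fix, an #undef placed after the include.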

2. Fixed TORCH_EXTENSION_NAME overwrite bug during concurrent builds (LNK2001 error)
In setup.py and sageattention3_blackwell/setup.py, the same CXX_FLAGS and NVCC_FLAGS list objects were previously passed to every CUDAExtension definition. torch.utils.cpp_extension modifies extra_compile_args['cxx'] in place, appending -DTORCH_EXTENSION_NAME={name} for each extension. Because the lists were shared, every append was visible to all extensions, so during parallel compilation (e.g., pip install -e .) an extension could be compiled with another extension's name (e.g., _qattn_sm80 receiving _fused). Its module init function was then generated under the wrong name, and linking failed with an unresolved external symbol (LNK2001: PyInit__qattn_sm80).
Fix: Each CUDAExtension now receives its own shallow copy of the flag lists (e.g., CXX_FLAGS[:]), so defines appended for one extension can no longer leak into another.
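The aliasing problem can be reproduced without torch. FakeExtension and add_name_define are hypothetical stand-ins: the latter only models the in-place append of -DTORCH_EXTENSION_NAME described above, not the actual code in torch.utils.cpp_extension.

```python
from dataclasses import dataclass


@dataclass
class FakeExtension:
    """Hypothetical stand-in for CUDAExtension: just a name and its compile args."""
    name: str
    extra_compile_args: dict


def add_name_define(ext):
    # Models the in-place append of -DTORCH_EXTENSION_NAME={name}.
    ext.extra_compile_args["cxx"].append(f"-DTORCH_EXTENSION_NAME={ext.name}")


names = ["_qattn_sm80", "_fused"]

# Before: every extension's dict points at the same CXX_FLAGS list object.
CXX_FLAGS = ["-O3"]
shared = [FakeExtension(n, {"cxx": CXX_FLAGS}) for n in names]
for ext in shared:
    add_name_define(ext)
print(shared[0].extra_compile_args["cxx"])
# ['-O3', '-DTORCH_EXTENSION_NAME=_qattn_sm80', '-DTORCH_EXTENSION_NAME=_fused']

# After: each extension gets its own shallow copy (CXX_FLAGS[:]), as in this PR.
CXX_FLAGS = ["-O3"]
copied = [FakeExtension(n, {"cxx": CXX_FLAGS[:]}) for n in names]
for ext in copied:
    add_name_define(ext)
print(copied[0].extra_compile_args["cxx"])
# ['-O3', '-DTORCH_EXTENSION_NAME=_qattn_sm80']
```

With the shared list, _qattn_sm80's arguments end with _fused's define; compilers typically keep the last definition of a macro given twice, which matches the name mix-up described above. A shallow copy is sufficient because the flags themselves are immutable strings; only the list object needs to be distinct.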

3. Reliably resolved error C2872: 'std': ambiguous symbol
MSVC reports 'std' as an ambiguous symbol when compiling code that includes compiled_autograd.h. This is worked around by appending -DUSE_CUDA, /Zc:preprocessor, and /DCCCL_IGNORE_MSVC_TRADITIONAL_PREPROCESSOR_WARNING whenever sys.platform == "win32". The flags are added unconditionally on Windows rather than only when DISTUTILS_USE_SDK==1, so the build behaves the same across standard Windows terminal environments.
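A minimal sketch of the platform check in setup.py, assuming base lists named CXX_FLAGS and NVCC_FLAGS. The helper windows_extra_flags is hypothetical, and how the three flags are split between the cxx and nvcc lists (including forwarding /Zc:preprocessor to the host compiler with nvcc's -Xcompiler option) is an assumption rather than the PR's exact code.

```python
import sys


def windows_extra_flags(platform=sys.platform):
    """Return (extra_cxx, extra_nvcc) to append on the given platform.

    Hypothetical helper: the flag names come from this PR, but the split
    between the cxx and nvcc lists is an assumption.
    """
    if platform != "win32":
        return [], []
    # Keyed on the platform alone, not on DISTUTILS_USE_SDK.
    extra_cxx = [
        "-DUSE_CUDA",
        "/Zc:preprocessor",  # MSVC's standards-conforming preprocessor mode
        "/DCCCL_IGNORE_MSVC_TRADITIONAL_PREPROCESSOR_WARNING",
    ]
    extra_nvcc = [
        "-DUSE_CUDA",
        "-DCCCL_IGNORE_MSVC_TRADITIONAL_PREPROCESSOR_WARNING",
        # cl.exe-specific switches must be forwarded to the host compiler explicitly.
        "-Xcompiler", "/Zc:preprocessor",
    ]
    return extra_cxx, extra_nvcc


CXX_FLAGS = ["-O3"]   # hypothetical base flags
NVCC_FLAGS = ["-O3"]
extra_cxx, extra_nvcc = windows_extra_flags()
CXX_FLAGS += extra_cxx
NVCC_FLAGS += extra_nvcc
```

Taking the platform as a parameter keeps the check testable from any OS; on non-Windows hosts the base lists are left unchanged.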

These changes were tested locally: all .whl binaries build successfully on Windows 11 with Python 3.12, PyTorch 2.10.0, and CUDA 13.1.

munder-sa added 3 commits March 20, 2026 11:33

- f97babd: Fix Windows MSVC build and concurrency issues
- 9195f3b: Fix Windows rpcndr.h 'small' macro conflict in sageattn3
- 3ecb520: Update README with Windows Build Fixes info and Wheel requirements

munder-sa wants to merge 3 commits into thu-ml:main from munder-sa:main
