feat: Make Sibernetic build work for ARM Mac by porting GPU kernels to Metal #222
weng271190436 wants to merge 4 commits into openworm:ow-0.9.9
CPP_DEPS = $(OBJECTS:.o=.d)

# Change these to set the different python directories
The old Python 2 paths don't exist on a modern Mac.
echo "Error: no Python interpreter found (expected python3 or python in PATH)."; \
exit 1; \
fi; \
echo "Creating virtual environment at $(VENV_DIR)"; \
Create a venv under projectroot/venv if it is missing, to keep the Python environment separate from the host.
PYTHONHEADERDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/
PYTHONLIBDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
PYTHONFRAMEWORKDIR = /Library/Frameworks/
PYTHON_CFLAGS := $(shell if [ -n "$(PYTHON)" ]; then "$(PYTHON)" -c "import sysconfig; print('-I'+sysconfig.get_paths()['include'])"; fi)
Set these variables from the Python interpreter actually in use rather than hardcoding 2.7.
{
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
  _mm_mfence();
_mm_mfence is only available on Intel machines
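A minimal sketch of how such a guard can stay portable. The diff only shows the x86 branch, so the fallback below (a sequentially consistent C++11 fence) is an assumption, and `fullMemoryFence`, `usesIntelFence`, and `OW_HAS_MM_MFENCE` are hypothetical names that do not come from the PR.

```cpp
#include <atomic>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
#include <emmintrin.h>  // declares _mm_mfence (SSE2)
#define OW_HAS_MM_MFENCE 1
#else
#define OW_HAS_MM_MFENCE 0
#endif

// Hypothetical helper: issues a full memory fence on any architecture.
inline void fullMemoryFence() {
#if OW_HAS_MM_MFENCE
  _mm_mfence();  // x86 / x86-64: hardware full fence
#else
  // Assumption: on ARM and other targets a sequentially consistent
  // C++11 fence is an acceptable substitute for _mm_mfence.
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical helper: reports which branch was compiled in.
inline bool usesIntelFence() { return OW_HAS_MM_MFENCE == 1; }
```

Callers would invoke `fullMemoryFence()` wherever the original code called `_mm_mfence()` directly, so the architecture check lives in one place.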
HOST_PYTHON := $(shell command -v python3 2>/dev/null || command -v python 2>/dev/null)
PYTHON ?= $(if $(wildcard $(VENV_PYTHON)),$(VENV_PYTHON),$(HOST_PYTHON))

CXX ?= $(shell command -v g++ 2>/dev/null || command -v clang++ 2>/dev/null)
If g++ is not available, fall back to clang++.
#include <iostream>
#include <stdexcept>
#include <iomanip>
#include <sys/time.h>
gettimeofday comes from this header
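For readers unfamiliar with the call, here is a small sketch of timing with `gettimeofday` on a POSIX system. The helper names `now` and `elapsedSeconds` are hypothetical and are not taken from the Sibernetic sources.

```cpp
#include <sys/time.h>  // gettimeofday, struct timeval (POSIX)

// Hypothetical helper: samples the current wall-clock time.
inline timeval now() {
  timeval tv{};
  gettimeofday(&tv, nullptr);  // second argument (timezone) is obsolete
  return tv;
}

// Hypothetical helper: seconds elapsed between two samples.
// tv_usec may be smaller in `end` than in `start`; the signed
// subtraction handles that borrow correctly.
inline double elapsedSeconds(const timeval &start, const timeval &end) {
  return static_cast<double>(end.tv_sec - start.tv_sec) +
         static_cast<double>(end.tv_usec - start.tv_usec) * 1.0e-6;
}
```

A typical use would be `timeval t0 = now();` before a simulation step and `elapsedSeconds(t0, now())` after it.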
@echo 'Invoking: clang C++ Compiler'
#### use this to compile against homebrew installed python
#### change version number as necessary (2.7.n)
g++ -std=c++14 -O3 -Wall -c -I$(PYTHONHEADERDIR) -I$(INCDIR) -framework OpenCL -fmessage-length=0 -MMD -MP -MF"$(@:%.o=%.d)" -MT"$(@:%.o=%.d)" -o "$@" "$<"
Removing -framework OpenCL from the compile step addresses this warning:
clang++: warning: -framework OpenCL: 'linker' input unused [-Wunused-command-line-argument]
@pgleeson can you review this please?

Thanks @weng271190436. We had previously thought it wouldn't be possible to run on ARM Mac due to lack of support for OpenCL, and that a switch to Metal would be required. So you have it building and running on your Mac? Could you add a step in here https://github.com/openworm/sibernetic/blob/ow-0.9.9/.github/workflows/ci-build.yml for testing on macos-latest?

Thanks @pgleeson. Added a macos-latest step in ci-build.yml.

Thanks @weng271190436, however the macos test is failing here with:
848f9e7 to 7f5c74b
c9d2b00 to 12f6f6c
@@ -0,0 +1,107 @@
#pragma once
This is a header similar to owOpenCLSolver.h
// OpenCL solver class
class owOpenCLSolver {
class owOpenCLSolver : public owSolver {
Impose a common structure on both owOpenCLSolver and owMetalSolver via owSolver, so that owPhysicsFluidSimulator.cpp can use the OpenCL implementation on Linux and the Metal implementation on Mac.
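The shape of that abstraction can be sketched as follows. The class names match the PR, but the members shown (`backendName`, `step`) and the `makeSolver` factory are hypothetical placeholders; the real owSolver interface is not visible in this excerpt.

```cpp
#include <memory>
#include <string>

// Trimmed-down stand-in for the abstract base. The real owSolver
// declares the simulation's kernel-level methods (not shown here).
class owSolver {
public:
  virtual ~owSolver() = default;
  virtual std::string backendName() const = 0;  // hypothetical
  virtual void step() = 0;                      // hypothetical placeholder
};

// Stub bodies only: no OpenCL or Metal calls are made in this sketch.
class owOpenCLSolver : public owSolver {
public:
  std::string backendName() const override { return "OpenCL"; }
  void step() override {}
};

class owMetalSolver : public owSolver {
public:
  std::string backendName() const override { return "Metal"; }
  void step() override {}
};

// Hypothetical factory: the simulator only ever sees owSolver, and the
// platform check is confined to this one function.
inline std::unique_ptr<owSolver> makeSolver() {
#if defined(__APPLE__)
  return std::make_unique<owMetalSolver>();
#else
  return std::make_unique<owOpenCLSolver>();
#endif
}
```

With this layout, code such as owPhysicsFluidSimulator.cpp would hold a `std::unique_ptr<owSolver>` and never name a concrete backend.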
12f6f6c to 8234fdb
}
} // namespace

MetalBackend::MetalBackend(const char *libraryPath) {
MetalBackend handles common logic needed for every Metal kernel invocation.
With MetalBackend, each kernel invocation becomes a backend_->dispatch(kernelName, ...) call in owMetalSolver.cpp
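The dispatch-by-name pattern described above can be illustrated with a mock that uses no Metal API at all. `MockBackend` and `registerKernel` are hypothetical; in the real MetalBackend the shared work is presumably Metal-specific (pipeline lookup, encoder setup, command submission), which this excerpt does not show.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Mock stand-in for MetalBackend: kernels are plain C++ callables that
// receive a thread index, and dispatch runs them sequentially on the CPU.
class MockBackend {
public:
  using Kernel = std::function<void(std::size_t threadId)>;

  void registerKernel(const std::string &name, Kernel kernel) {
    kernels_[name] = std::move(kernel);
  }

  // All per-invocation boilerplate lives here once: look the kernel up
  // by name, run it over the grid, and record the dispatch.
  void dispatch(const std::string &name, std::size_t threadCount) {
    auto it = kernels_.find(name);
    if (it == kernels_.end()) {
      throw std::runtime_error("unknown kernel: " + name);
    }
    for (std::size_t i = 0; i < threadCount; ++i) {
      it->second(i);
    }
    ++dispatchCount_;
  }

  std::size_t dispatchCount() const { return dispatchCount_; }

private:
  std::unordered_map<std::string, Kernel> kernels_;
  std::size_t dispatchCount_ = 0;
};
```

The benefit of this design is that a solver method shrinks to a single `dispatch` call, and error handling for a missing kernel is written once.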
@@ -0,0 +1,36 @@
#pragma once

// Kernel argument abstraction for the `clearBuffers` kernel.
Each of these **Kernel.h files captures Metal argument struct and kernel name constant for one GPU kernel. I plan to define OpenCL argument struct here and create GPU backend agnostic input struct here in subsequent PRs. By defining a library agnostic input struct, I can write kernel unit tests that work with any GPU backend so that I can verify that the Metal kernel and OpenCL kernel produce the same output given the same input. After that, I (or any contributor) can refactor the kernels for potentially better readability or better performance without needing to worry about breaking correctness.
I am already doing this in my fork https://github.com/weng271190436/sibernetic/blob/main/tests/clear_buffers/clear_buffers_test_common.h
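The parity-test idea can be sketched independently of any GPU library. Everything below is hypothetical (`KernelInput`, `KernelRunner`, `backendsAgree`) and is not taken from the fork's test code: each backend is modelled as a function from a backend-agnostic input to an output buffer, and the test checks that two backends agree within a tolerance.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical backend-agnostic input for one kernel.
struct KernelInput {
  std::vector<float> values;
};

// A backend is anything that maps the input to an output buffer, for
// example a wrapper around a Metal dispatch or an OpenCL enqueue.
using KernelRunner = std::function<std::vector<float>(const KernelInput &)>;

// Returns true when both backends produce element-wise equal output
// within `tolerance`. A NaN in either output counts as a mismatch.
inline bool backendsAgree(const KernelRunner &a, const KernelRunner &b,
                          const KernelInput &input, float tolerance) {
  const std::vector<float> outA = a(input);
  const std::vector<float> outB = b(input);
  if (outA.size() != outB.size()) {
    return false;
  }
  for (std::size_t i = 0; i < outA.size(); ++i) {
    if (!(std::fabs(outA[i] - outB[i]) <= tolerance)) {
      return false;
    }
  }
  return true;
}
```

Once such a check exists per kernel, a refactored kernel can be validated against the untouched one before it replaces it.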
// ============ Metal helpers ============

inline void bindBuffer(MTL::ComputeCommandEncoder *enc, MTL::Buffer *buf,
I place kernel arg binding helpers in this file.
@@ -0,0 +1,1479 @@
#include <metal_stdlib>
Rewrite sphFluid.cl in Metal. Each OpenCL kernel has a Metal equivalent. I do write the Metal kernels a little differently so that it is easier for me to understand. And hopefully it is easier to understand for other contributors.
For example, I try to name all variables intuitively and consistently. I make an effort to distinguish serialId (i.e. the original particle ID defined by the input) and sortedParticleId (i.e. the particle ID after sorted by cell ID). I also pull all physical constants to top of the file and give them meaningful names (e.g.
// Reduced viscosity coefficient for worm-fluid interface interactions.
// 10x lower than default to allow worm to slide through the medium.
constant float kViscosityCoeffWormFluid = 1.0e-5f;
)
I also added comments in many places where I did not find the code intuitive.
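To make the serialId / sortedParticleId distinction concrete, here is a CPU-side sketch of the two index spaces. The function names are hypothetical and the code is not taken from the Metal kernels; it only assumes, as the comment says, that particles are sorted by cell ID.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Given the cell ID of every particle indexed by serialId, returns a
// table where result[sortedParticleId] == serialId. A stable sort keeps
// particles within the same cell in their original relative order.
inline std::vector<std::size_t>
serialIdBySortedId(const std::vector<int> &cellIdBySerialId) {
  std::vector<std::size_t> order(cellIdBySerialId.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return cellIdBySerialId[a] < cellIdBySerialId[b];
                   });
  return order;
}

// Inverts the table: result[serialId] == sortedParticleId.
inline std::vector<std::size_t>
sortedIdBySerialId(const std::vector<std::size_t> &serialBySorted) {
  std::vector<std::size_t> inverse(serialBySorted.size());
  for (std::size_t sorted = 0; sorted < serialBySorted.size(); ++sorted) {
    inverse[serialBySorted[sorted]] = sorted;
  }
  return inverse;
}
```

Keeping both tables, with names that say which index space they live in, is what prevents the classic bug of indexing a sorted buffer with an unsorted ID.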
@@ -0,0 +1,551 @@
#include "owMetalSolver.h"
Most of the methods in owMetalSolver.cpp have an OpenCL equivalent.
@@ -0,0 +1,47 @@
//-------------------------------------------------------------------------------------------------------------------------------------------------------------
Everything under inc/Metal is a C++ header provided by Apple. In README.md I added instructions on how to download these from the Apple website so that people can upgrade them later when Apple releases a new version.
Hi @pgleeson, thank you for testing. I realized that OpenCL worked for me locally because my MacBook has a GPU, but the GitHub Actions VM doesn't. OpenCL on ARM Mac can find a GPU device but not a CPU device. As you pointed out, for ARM Mac CI to work, I have to port the GPU kernels to Metal. I have been doing that in my personal fork anyway, so I added those changes to this PR to make CI work. I tested in my personal fork that build-macos works: https://github.com/weng271190436/sibernetic/actions/runs/24949167863/job/73056091793 (let me know if you don't have permission to access my CI result). However, this makes the PR big, especially since I am pulling in the Metal cpp library from Apple. Let me know if you want to discuss how to break this Metal port into multiple smaller PRs to make review easier. But if you want to merge something that works end to end in one go, we can keep this PR.
Mirrors the build-macos job from PR #222 (Wei Weng's Metal port against ow-0.9.9), adapted for this branch:
- Drives the build through ./setup.sh instead of bare `make -f makefile.OSX`. Our binary embeds Python and imports torch/taichi/numpy at runtime, so the venv must exist before any backend=taichi* or backend=torch run. Also doubles as a meta-test that setup.sh works on a clean machine, which is exactly the silent-failure mode we hit when Homebrew rolled `python` forward to 3.14.
- Runs an explicit backend=taichi-cpu smoke test (deterministic, fast, proves the embedded interpreter can find taichi in the venv).
- Adds a continue-on-error backend=taichi (Metal) probe so we get evidence of whether Taichi-Metal works on macos-latest without blocking the gate while we find out.
- Adds workflow_dispatch: trigger to allow manual re-runs from the Actions UI while iterating.
Out of scope: Linux numpy include path bug (separate PR), the legacy Intel workflow's quarantine, and reference-log/parity tests on macOS (Phase 3 of the broader test-harness plan).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rk Python
CI surfaced this on macos-latest after the previous numpy fix unblocked compilation: linking died with `ld: framework 'Python' not found` even though the include path under Python.framework/Versions/3.13/include clearly resolved during the compile phase.
Root cause: Homebrew's python@3.13 ships the framework headers and the versioned dylib under Python.framework/Versions/3.13/, but deliberately omits the top-level Python.framework/Python symlink that `-framework Python` looks up at link time (so it doesn't conflict with other Python framework installs). PR #222's macOS job only worked because it didn't override PYTHONFRAMEWORKDIR, so it picked up the runner's pre-installed Python, which has the symlink. Pinning to Homebrew's Python (which we need for venv + ABI consistency with torch/taichi) puts us in the broken-symlink case.
- makefile.OSX: parameterize the Python linker flags as PYTHON_LIB_FLAGS with `?=` so it can be overridden from the env, defaulting to the existing `-framework Python` for callers who already had a working framework.
- setup.sh: derive PYTHON_LIB_FLAGS from `<BUILD_PY>-config --embed --ldflags` (e.g. `-L<libdir> -lpython3.13 -ldl -framework CoreFoundation`), which links against libpython3.13.dylib by name and doesn't depend on the framework symlink.
Verified locally: build links cleanly with the new flags and the binary runs `backend=taichi-cpu` on demo1 successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two empirical findings from today that update the open questions in the previous DEVELOPMENT_LOG entry:
1. dt audit on PR #222 native Metal: the lever is essentially closed with default PCISPH max_iteration=3. Sweep:
   2e-5 (baseline) ✅ 75 sec for 1-sec sim, retention 118%
   3e-5 (1.5×) ❌ hangs at step ~7 (PCISPH didn't converge)
   4e-5 (2×) ❌ hangs
   5e-5 (2.5×) ❌ hangs
   To open dt up, the realistic options are bumping PCISPH iterations (cheapest, code change), switching to a different pressure solver like IISPH/DFSPH/WCSPH (substantial), or testing a smaller scenario. The <3-min target on a 5-sec sim is NOT reachable via dt alone given current solver settings.
2. Taichi-CUDA pancake check at 1-sec sim: the bug is algorithmic, not Apple-Silicon-specific. On L4 CUDA, Taichi's cube didn't move at all (mean_y unchanged from initial 44.42, vs OpenCL/PR222 both reaching ~10 in same sim time). Same root-cause "forces too weak" as the Apple Silicon Taichi-Metal pancake, just at a different magnitude. Fixing taichi_solver.py is a single change that benefits both Metal and CUDA simultaneously; the README's documented 3-step coordinate-scale fix is the starting hypothesis.
Both findings update the "Concrete next steps" checklist; dt audit and Taichi-CUDA verification are now done. Native CUDA backend and the cross-backend cube-stability regression remain open.
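The claim that the 3-minute target is out of reach via dt alone can be checked with rough arithmetic from the baseline row of the sweep. This is only a sketch: it assumes wall-clock cost per step does not depend on dt and that run time scales linearly with simulated time, neither of which the log states.

```latex
\begin{align*}
N_{\text{steps}} &= \frac{1\,\text{s}}{2\times10^{-5}\,\text{s}} = 5\times10^{4},
&
t_{\text{step}} &\approx \frac{75\,\text{s}}{5\times10^{4}} = 1.5\,\text{ms}, \\
T_{5\,\text{s sim}} &\approx 5 \times 75\,\text{s} = 375\,\text{s},
&
\Delta t_{\text{needed}} &> 2\times10^{-5}\,\text{s} \times \frac{375}{180}
  \approx 4.2\times10^{-5}\,\text{s}.
\end{align*}
```

Under those assumptions dt would have to roughly double, while the sweep already fails at 1.5× the baseline.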
Lays out the structure for a native CUDA backend that mirrors PR #222's native Metal port. The actual kernel ports (translating sphFluid.cl's 1515 lines to sphFluid.cu CUDA C++) are deferred: they're ~2 weeks of focused work and depend on PR #222's owSolver abstract base landing first to avoid a refactor.
What this commit provides:
  src/cuda/README.md
    Implementation plan (5 phases, ~2 weeks estimated, with file-by-file mapping to PR #222's Metal port for structural cribbing)
  src/cuda/sphFluid.cu
    Skeleton with __global__ kernel signatures (TODO bodies). One per OpenCL kernel in sphFluid.cl. Includes an explicit reminder that pcisphComputeElasticForces must keep elastic forces in WORLD coordinates (this is where the Taichi pancake bug originates; CUDA port should match OpenCL exactly).
  inc/owCudaSolver.h
    Public C++ interface mirroring owOpenCLSolver.h. Method signatures commented out so this header doesn't try to declare functions whose definitions don't exist yet.
What this commit does NOT do:
- Touch makefile, makefile.OSX: they don't reference these files, existing builds are unaffected.
- Add backend=cuda to owConfigProperty: would require a real implementation to dispatch to.
- Attempt actual CUDA kernel implementation: that's Phase 1 of the work plan in src/cuda/README.md.
Reasoning: this gives the next developer (or a future LLM session) a clear target architecture without committing to a half-baked port. When PR #222 lands, the CUDA work can proceed against the same owSolver abstract base + src/kernels/ descriptor pattern that PR #222 introduces, rather than being structurally divergent from the Metal half.
Updates the next-steps checklist to reflect this session's progress: both the cross-backend regression script and the CUDA backend scaffold have landed. CUDA kernel implementation remains a separate ~2-week task, gated on PR #222 merging so we build against its owSolver abstract base and src/kernels/ descriptor pattern rather than diverging.
makefile.OSX references an old Python version that is not available on modern Macs.
Port GPU kernels to Metal because GitHub CI runners don't have a GPU, so no OpenCL device can be found on ARM Mac.
#223 #226