
feat: Make Sibernetic build work for ARM Mac by porting GPU kernels to Metal #222

Open
weng271190436 wants to merge 4 commits into openworm:ow-0.9.9 from weng271190436:weiweng/modernize-makefile-osx

Conversation

@weng271190436

@weng271190436 weng271190436 commented Apr 11, 2026

makefile.OSX references an old Python version that is not available on modern Macs.

Port the GPU kernels to Metal: the GitHub CI runner has no GPU, so OpenCL cannot find a device on ARM Mac.

#223 #226

Comment thread makefile.OSX

CPP_DEPS = $(OBJECTS:.o=.d)

# Change these to set the different python directories
Author

The old Python 2 paths don't exist on a modern Mac.

Comment thread makefile.OSX
echo "Error: no Python interpreter found (expected python3 or python in PATH)."; \
exit 1; \
fi; \
echo "Creating virtual environment at $(VENV_DIR)"; \
Author

Create a venv under projectroot/venv if it is missing, to isolate the Python environment from the host.

Comment thread makefile.OSX
PYTHONHEADERDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/
PYTHONLIBDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
PYTHONFRAMEWORKDIR = /Library/Frameworks/
PYTHON_CFLAGS := $(shell if [ -n "$(PYTHON)" ]; then "$(PYTHON)" -c "import sysconfig; print('-I'+sysconfig.get_paths()['include'])"; fi)
Author

Set these variables from the Python interpreter actually in use rather than hardcoding 2.7.

Comment thread inc/OpenCL/cl.hpp Outdated
{
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
defined(_M_X64)
_mm_mfence();
Author

_mm_mfence is an SSE2 intrinsic, so it is only available on x86 (Intel/AMD) machines.
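
The guard above can be illustrated with a small portable wrapper. This is a minimal sketch, not the code in cl.hpp: the helper name `fullMemoryFence` is hypothetical, and using `std::atomic_thread_fence` as the non-x86 fallback is an assumption, since the non-x86 branch of the actual change is not shown in this thread.

```cpp
#include <atomic>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
#include <emmintrin.h>  // declares _mm_mfence (SSE2)
#endif

// Hypothetical helper, not taken from cl.hpp.
inline void fullMemoryFence() {
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
  // x86: use the SSE2 full memory fence, as the original code does.
  _mm_mfence();
#else
  // Assumed fallback for ARM and other targets: a sequentially
  // consistent fence from the C++ standard library.
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

Keeping the architecture test in one helper means call sites do not need their own `#if` blocks.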

Comment thread makefile.OSX
HOST_PYTHON := $(shell command -v python3 2>/dev/null || command -v python 2>/dev/null)
PYTHON ?= $(if $(wildcard $(VENV_PYTHON)),$(VENV_PYTHON),$(HOST_PYTHON))

CXX ?= $(shell command -v g++ 2>/dev/null || command -v clang++ 2>/dev/null)
Author

If g++ is not available, fall back to clang++.

#include <iostream>
#include <stdexcept>
#include <iomanip>
#include <sys/time.h>
Author

gettimeofday is declared in this header (<sys/time.h>).
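
For reference, a minimal sketch of why the include matters: `gettimeofday` and `struct timeval` are both declared in `<sys/time.h>`. The helper name `wallClockSeconds` is hypothetical and does not come from the Sibernetic sources.

```cpp
#include <sys/time.h>  // gettimeofday, struct timeval

// Hypothetical helper: wall-clock time in seconds since the Unix epoch.
inline double wallClockSeconds() {
  timeval tv{};
  gettimeofday(&tv, nullptr);  // second argument (timezone) is unused
  return static_cast<double>(tv.tv_sec) +
         static_cast<double>(tv.tv_usec) * 1e-6;
}
```

Taking the difference of two such readings around a simulation step gives an elapsed wall-clock time.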

@weng271190436 weng271190436 marked this pull request as ready for review April 12, 2026 02:00
Comment thread makefile.OSX
@echo 'Invoking: clang C++ Compiler'
#### use this to compile against homebrew installed python
#### change version number as necessary (2.7.n)
g++ -std=c++14 -O3 -Wall -c -I$(PYTHONHEADERDIR) -I$(INCDIR) -framework OpenCL -fmessage-length=0 -MMD -MP -MF"$(@:%.o=%.d)" -MT"$(@:%.o=%.d)" -o "$@" "$<"
Author

Removing -framework OpenCL from the compile step addresses this warning:

clang++: warning: -framework OpenCL: 'linker' input unused [-Wunused-command-line-argument]

@weng271190436
Author

@pgleeson can you review this please?

@pgleeson
Member

Thanks @weng271190436. We had previously thought it wouldn't be possible to run on ARM Mac due to lack of support for OpenCL, and that a switch to Metal would be required. So you have it building and running on your Mac?

Could you add a step in here https://github.com/openworm/sibernetic/blob/ow-0.9.9/.github/workflows/ci-build.yml for testing on macos-latest?

@weng271190436
Author

Thanks @pgleeson. I added a macos-latest step in ci-build.yml.

@pgleeson
Member

Thanks @weng271190436, however the macos test is failing here with:

Unfortunately OpenCL couldn't find device ALL
OpenCL try to init existing device 
Error in owOpenCLSolver.cpp!!
ERROR: Sibernetic can't find any OpenCL devices. Please check you're environment configuration.

@weng271190436 weng271190436 force-pushed the weiweng/modernize-makefile-osx branch 2 times, most recently from 848f9e7 to 7f5c74b Compare April 25, 2026 23:47
@weng271190436 weng271190436 force-pushed the weiweng/modernize-makefile-osx branch from c9d2b00 to 12f6f6c Compare April 26, 2026 00:57
Comment thread inc/owMetalSolver.h
@@ -0,0 +1,107 @@
#pragma once
Author

This is a header similar to owOpenCLSolver.h

Comment thread inc/owOpenCLSolver.h

// OpenCL solver class
-class owOpenCLSolver {
+class owOpenCLSolver : public owSolver {
Author

Impose a common structure on owOpenCLSolver and owMetalSolver through the owSolver base class, so that owPhysicsFluidSimulator.cpp can use the OpenCL implementation on Linux and the Metal implementation on Mac.
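
The arrangement described here can be sketched as an abstract base class with one implementation per platform. Everything below is illustrative: `Solver`, `OpenCLSolverSketch`, `MetalSolverSketch`, `makeSolver`, and the two methods are hypothetical stand-ins, and the real owSolver interface is not shown in this thread.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for owSolver: the interface the simulator codes against.
class Solver {
public:
  virtual ~Solver() = default;
  virtual std::string backendName() const = 0;
  virtual void step() = 0;  // advance the simulation by one step
};

// Hypothetical stand-in for owOpenCLSolver.
class OpenCLSolverSketch : public Solver {
public:
  std::string backendName() const override { return "opencl"; }
  void step() override { /* would enqueue OpenCL kernels here */ }
};

// Hypothetical stand-in for owMetalSolver.
class MetalSolverSketch : public Solver {
public:
  std::string backendName() const override { return "metal"; }
  void step() override { /* would encode Metal compute commands here */ }
};

// Assumed selection rule: Metal on Apple platforms, OpenCL elsewhere.
inline std::unique_ptr<Solver> makeSolver() {
#if defined(__APPLE__)
  return std::make_unique<MetalSolverSketch>();
#else
  return std::make_unique<OpenCLSolverSketch>();
#endif
}
```

With this shape, the caller holds a `std::unique_ptr<Solver>` and never names a concrete backend, which is what lets one simulator source file serve both platforms.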

@weng271190436 weng271190436 force-pushed the weiweng/modernize-makefile-osx branch from 12f6f6c to 8234fdb Compare April 26, 2026 05:27
}
} // namespace

MetalBackend::MetalBackend(const char *libraryPath) {
Author

MetalBackend handles the common logic needed for every Metal kernel invocation.

With MetalBackend, each kernel invocation becomes a backend_->dispatch(kernelName, ...) call in owMetalSolver.cpp.
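
The dispatch-by-name idea can be sketched without any Metal dependency. This is a toy model under stated assumptions: `BackendSketch`, `registerKernel`, and the `std::vector<float>` buffer are hypothetical, and the real MetalBackend presumably resolves names to GPU pipeline objects rather than host callables.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy backend: maps a kernel name to a host-side callable.
class BackendSketch {
public:
  using Kernel = std::function<void(std::vector<float> &)>;

  void registerKernel(const std::string &name, Kernel kernel) {
    kernels_[name] = std::move(kernel);
  }

  // Shared per-invocation logic lives here: look the kernel up by name,
  // fail loudly if it is missing, then run it on the buffer.
  void dispatch(const std::string &name, std::vector<float> &buffer) const {
    auto it = kernels_.find(name);
    if (it == kernels_.end()) {
      throw std::invalid_argument("unknown kernel: " + name);
    }
    it->second(buffer);
  }

private:
  std::unordered_map<std::string, Kernel> kernels_;
};
```

The benefit of the pattern is that lookup and error handling are written once, so each solver method shrinks to a single `dispatch` call.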

@@ -0,0 +1,36 @@
#pragma once

// Kernel argument abstraction for the `clearBuffers` kernel.
Author

Each of these **Kernel.h files captures the Metal argument struct and the kernel name constant for one GPU kernel. In subsequent PRs I plan to define the OpenCL argument struct here as well, along with a backend-agnostic input struct. With a library-agnostic input struct, I can write kernel unit tests that work with any GPU backend and verify that the Metal kernel and the OpenCL kernel produce the same output given the same input. After that, I (or any contributor) can refactor the kernels for readability or performance without worrying about breaking correctness.

I am already doing this in my fork https://github.com/weng271190436/sibernetic/blob/main/tests/clear_buffers/clear_buffers_test_common.h
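
A parity test of this kind could be sketched as follows. The names `FillInput`, `KernelRunner`, `outputsMatch`, and `backendsAgree` are hypothetical and are not taken from the fork; each backend is modelled as a plain function from a shared input struct to an output buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical backend-agnostic input for a simple fill-style kernel.
struct FillInput {
  std::size_t elementCount;
  float fillValue;
};

// A backend is modelled as a function from the shared input to its output.
using KernelRunner = std::vector<float> (*)(const FillInput &);

// Element-wise comparison with an absolute tolerance. The negated `<=`
// makes a NaN in either output count as a mismatch.
inline bool outputsMatch(const std::vector<float> &a,
                         const std::vector<float> &b, float tolerance) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!(std::fabs(a[i] - b[i]) <= tolerance)) return false;
  }
  return true;
}

// Run both backends on the same input and compare their outputs.
inline bool backendsAgree(KernelRunner first, KernelRunner second,
                          const FillInput &input, float tolerance) {
  return outputsMatch(first(input), second(input), tolerance);
}
```

A tolerance is used instead of exact equality because two GPU backends may legitimately differ in the last bits of floating-point results.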


// ============ Metal helpers ============

inline void bindBuffer(MTL::ComputeCommandEncoder *enc, MTL::Buffer *buf,
Author

I placed the kernel argument binding helpers in this file.

Comment thread src/metal/sphFluid.metal
@@ -0,0 +1,1479 @@
#include <metal_stdlib>
Author

This rewrites sphFluid.cl in Metal. Each OpenCL kernel has a Metal equivalent. I wrote the Metal kernels a little differently so that they are easier for me to understand, and hopefully easier for other contributors too.

For example, I try to name all variables intuitively and consistently. I make an effort to distinguish serialId (the original particle ID defined by the input) from sortedParticleId (the particle ID after sorting by cell ID). I also pull all physical constants to the top of the file and give them meaningful names, for example:

// Reduced viscosity coefficient for worm-fluid interface interactions.
// 10x lower than default to allow worm to slide through the medium.
constant float kViscosityCoeffWormFluid = 1.0e-5f;

I also added comments in the many places where I didn't find the code intuitive.
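
The serialId versus sortedParticleId distinction can be sketched on the host side. This is an illustration only: `ParticleIndexMaps` and `buildIndexMaps` are hypothetical names, and the use of a stable sort on the CPU is an assumption rather than a description of how the kernels sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical pair of lookup tables between the two ID spaces.
struct ParticleIndexMaps {
  std::vector<std::size_t> serialIdOfSorted;  // sortedParticleId -> serialId
  std::vector<std::size_t> sortedIdOfSerial;  // serialId -> sortedParticleId
};

// cellIdOfSerial[serialId] is the grid cell that particle falls in.
inline ParticleIndexMaps buildIndexMaps(const std::vector<int> &cellIdOfSerial) {
  const std::size_t n = cellIdOfSerial.size();
  ParticleIndexMaps maps;

  // Start from the identity order, then sort serial IDs by cell ID.
  maps.serialIdOfSorted.resize(n);
  std::iota(maps.serialIdOfSorted.begin(), maps.serialIdOfSorted.end(),
            std::size_t{0});
  std::stable_sort(maps.serialIdOfSorted.begin(), maps.serialIdOfSorted.end(),
                   [&](std::size_t a, std::size_t b) {
                     return cellIdOfSerial[a] < cellIdOfSerial[b];
                   });

  // Invert the permutation to go from serial ID back to sorted position.
  maps.sortedIdOfSerial.resize(n);
  for (std::size_t sorted = 0; sorted < n; ++sorted) {
    maps.sortedIdOfSerial[maps.serialIdOfSorted[sorted]] = sorted;
  }
  return maps;
}
```

Naming the two tables after the direction of the lookup makes it harder to index a sorted buffer with a serial ID by accident, which is the kind of confusion the consistent naming aims to prevent.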

Comment thread src/owMetalSolver.cpp
@@ -0,0 +1,551 @@
#include "owMetalSolver.h"
Author

Most of the methods in owMetalSolver.cpp have an OpenCL equivalent.

@@ -0,0 +1,47 @@
//-------------------------------------------------------------------------------------------------------------------------------------------------------------
Author

@weng271190436 weng271190436 Apr 26, 2026

Everything under inc/Metal is C++ headers provided by Apple. In README.md I added instructions on how to download them from Apple's website so that they can be upgraded later when Apple releases a new version.

@weng271190436 weng271190436 changed the title feat: Make Sibernetic build work for ARM Mac and update README feat: Make Sibernetic build work for ARM Mac by porting GPU kernels to Metal Apr 26, 2026
@weng271190436
Author

Hi @pgleeson, thank you for testing.

I realized that OpenCL worked for me locally because my MacBook has a GPU, whereas the GitHub Actions VM does not. OpenCL on ARM Mac can find a GPU device but not a CPU device.

As you pointed out, for ARM Mac CI to work I have to port the GPU kernels to Metal. I had been doing that in my personal fork anyway, so I added those changes to this PR to make CI work. I verified in my personal fork that build-macos passes: https://github.com/weng271190436/sibernetic/actions/runs/24949167863/job/73056091793 (let me know if you don't have permission to access my CI result).

However, this makes the PR big, especially since I am pulling in the Metal cpp library from Apple. Let me know if you want to discuss how to break this Metal port into multiple smaller PRs to make review easier. If you would rather merge something that works end to end in one go, we can keep this PR as is.

slarson added a commit that referenced this pull request May 2, 2026
Mirrors the build-macos job from PR #222 (Wei Weng's Metal port against
ow-0.9.9), adapted for this branch:

- Drives the build through ./setup.sh instead of bare `make -f
  makefile.OSX`. Our binary embeds Python and imports torch/taichi/numpy
  at runtime, so the venv must exist before any backend=taichi* or
  backend=torch run. Also doubles as a meta-test that setup.sh works on
  a clean machine — exactly the silent-failure mode we hit when Homebrew
  rolled `python` forward to 3.14.
- Runs an explicit backend=taichi-cpu smoke test (deterministic, fast,
  proves the embedded interpreter can find taichi in the venv).
- Adds a continue-on-error backend=taichi (Metal) probe so we get
  evidence of whether Taichi-Metal works on macos-latest without
  blocking the gate while we find out.
- Adds workflow_dispatch: trigger to allow manual re-runs from the
  Actions UI while iterating.

Out of scope: Linux numpy include path bug (separate PR), the legacy
Intel workflow's quarantine, and reference-log/parity tests on macOS
(Phase 3 of the broader test-harness plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
slarson added a commit that referenced this pull request May 2, 2026
…rk Python

CI surfaced this on macos-latest after the previous numpy fix unblocked
compilation: linking died with `ld: framework 'Python' not found` even
though the include path under Python.framework/Versions/3.13/include
clearly resolved during the compile phase.

Root cause: Homebrew's python@3.13 ships the framework headers and the
versioned dylib under Python.framework/Versions/3.13/, but deliberately
omits the top-level Python.framework/Python symlink that
`-framework Python` looks up at link time (so it doesn't conflict with
other Python framework installs). PR #222's macOS job only worked
because it didn't override PYTHONFRAMEWORKDIR — it picked up the
runner's pre-installed Python which has the symlink. Pinning to
Homebrew's Python (which we need for venv + ABI consistency with
torch/taichi) puts us in the broken-symlink case.

- makefile.OSX: parameterize the Python linker flags as PYTHON_LIB_FLAGS
  with `?=` so it can be overridden from the env, defaulting to the
  existing `-framework Python` for callers who already had a working
  framework.
- setup.sh: derive PYTHON_LIB_FLAGS from `<BUILD_PY>-config --embed
  --ldflags` (e.g. `-L<libdir> -lpython3.13 -ldl -framework
  CoreFoundation`), which links against libpython3.13.dylib by name and
  doesn't depend on the framework symlink.

Verified locally: build links cleanly with the new flags and the
binary runs `backend=taichi-cpu` on demo1 successfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
slarson added a commit that referenced this pull request May 3, 2026
Two empirical findings from today that update the open questions in
the previous DEVELOPMENT_LOG entry:

1. dt audit on PR #222 native Metal: the lever is essentially closed
   with default PCISPH max_iteration=3. Sweep:
     2e-5 (baseline)  ✅ 75 sec for 1-sec sim, retention 118%
     3e-5 (1.5×)      ❌ hangs at step ~7 (PCISPH didn't converge)
     4e-5 (2×)        ❌ hangs
     5e-5 (2.5×)      ❌ hangs
   To open dt up, the realistic options are bumping PCISPH iterations
   (cheapest, code change), switching to a different pressure solver
   like IISPH/DFSPH/WCSPH (substantial), or testing a smaller
   scenario. The <3-min target on a 5-sec sim is NOT reachable via dt
   alone given current solver settings.

2. Taichi-CUDA pancake check at 1-sec sim: the bug is algorithmic, not
   Apple-Silicon-specific. On L4 CUDA, Taichi's cube didn't move at all
   (mean_y unchanged from initial 44.42, vs OpenCL/PR222 both reaching
   ~10 in same sim time). Same root-cause "forces too weak" as the
   Apple Silicon Taichi-Metal pancake, just at a different magnitude.
   Fixing taichi_solver.py is a single change that benefits both Metal
   and CUDA simultaneously — the README's documented 3-step
   coordinate-scale fix is the starting hypothesis.

Both findings update the "Concrete next steps" checklist; dt audit and
Taichi-CUDA verification are now done. Native CUDA backend and the
cross-backend cube-stability regression remain open.
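
The conclusion about the <3-min target in the commit message above can be checked with a short calculation. This sketch assumes wall-clock time scales linearly with simulated time and inversely with dt, which the log does not state; the function name `requiredTimestep` is hypothetical.

```cpp
// Hypothetical helper: the timestep needed to hit a wall-clock target,
// assuming wall time is proportional to (simulated seconds / dt).
inline double requiredTimestep(double baselineDt,
                               double baselineWallPerSimSecond,
                               double simSeconds,
                               double targetWallSeconds) {
  // Wall time at the baseline dt for the full simulated duration.
  const double baselineWall = baselineWallPerSimSecond * simSeconds;
  // Speedup needed to reach the target, then scale dt by that factor.
  const double neededSpeedup = baselineWall / targetWallSeconds;
  return baselineDt * neededSpeedup;
}
```

With the figures from the log (dt = 2e-5, 75 sec of wall time per simulated second, a 5-sec simulation, a 180-sec target), the baseline run would take 375 sec, so the required speedup is 375/180, about 2.08, and the required dt is roughly 4.2e-5. That lies above the 3e-5 step at which the sweep already hangs.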
slarson added a commit that referenced this pull request May 3, 2026
Lays out the structure for a native CUDA backend that mirrors PR #222's
native Metal port. The actual kernel ports (translating sphFluid.cl's
1515 lines to sphFluid.cu CUDA C++) are deferred — they're ~2 weeks of
focused work and depend on PR #222's owSolver abstract base landing
first to avoid a refactor.

What this commit provides:

  src/cuda/README.md        Implementation plan (5 phases, ~2 weeks
                            estimated, with file-by-file mapping to
                            PR #222's Metal port for structural cribbing)
  src/cuda/sphFluid.cu      Skeleton with __global__ kernel signatures
                            (TODO bodies). One per OpenCL kernel in
                            sphFluid.cl. Includes an explicit reminder
                            that pcisphComputeElasticForces must keep
                            elastic forces in WORLD coordinates (this
                            is where the Taichi pancake bug originates;
                            CUDA port should match OpenCL exactly).
  inc/owCudaSolver.h        Public C++ interface mirroring
                            owOpenCLSolver.h. Method signatures
                            commented out so this header doesn't try
                            to declare functions whose definitions
                            don't exist yet.

What this commit does NOT do:
  - Touch makefile, makefile.OSX — they don't reference these files,
    existing builds are unaffected.
  - Add backend=cuda to owConfigProperty — would require a real
    implementation to dispatch to.
  - Attempt actual CUDA kernel implementation — that's Phase 1 of the
    work plan in src/cuda/README.md.

Reasoning: this gives the next developer (or a future LLM session) a
clear target architecture without committing to a half-baked port. When
PR #222 lands, the CUDA work can proceed against the same owSolver
abstract base + src/kernels/ descriptor pattern that PR #222 introduces,
rather than being structurally divergent from the Metal half.
slarson added a commit that referenced this pull request May 3, 2026
Updates the next-steps checklist to reflect this session's progress:
both the cross-backend regression script and the CUDA backend scaffold
have landed. CUDA kernel implementation remains a separate ~2-week task,
gated on PR #222 merging so we build against its owSolver abstract base
and src/kernels/ descriptor pattern rather than diverging.