feat: Make Sibernetic build work for ARM Mac by porting GPU kernels to Metal by weng271190436 · Pull Request #222 · openworm/sibernetic

weng271190436 · 2026-04-11T23:50:42Z

makefile.OSX references an old Python version that is not available on modern Macs.

Port the GPU kernels to Metal, because the GitHub CI runner has no GPU and therefore cannot find an OpenCL device on ARM Mac.

weng271190436 · 2026-04-11T23:53:11Z


 CPP_DEPS = $(OBJECTS:.o=.d)

-# Change these to set the different python directories


The old Python 2 paths don't exist on modern Macs.

weng271190436 · 2026-04-11T23:55:20Z

+			echo "Error: no Python interpreter found (expected python3 or python in PATH)."; \
+			exit 1; \
+		fi; \
+		echo "Creating virtual environment at $(VENV_DIR)"; \


Create a venv under projectroot/venv if it is missing, to keep the Python environment separate from the host.

weng271190436 · 2026-04-11T23:58:06Z

-PYTHONHEADERDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/
-PYTHONLIBDIR = /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
-PYTHONFRAMEWORKDIR = /Library/Frameworks/
+PYTHON_CFLAGS := $(shell if [ -n "$(PYTHON)" ]; then "$(PYTHON)" -c "import sysconfig; print('-I'+sysconfig.get_paths()['include'])"; fi)


Set these variables from the Python interpreter actually in use rather than hardcoding 2.7.

weng271190436 · 2026-04-12T00:01:00Z

+    {
+#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
+    defined(_M_X64)
+        _mm_mfence();


_mm_mfence is only available on x86 (Intel/AMD) machines, so the call is guarded by an architecture check.
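A minimal sketch of the guard pattern in this hunk. The non-x86 branch is not shown in the diff, so the fallback below (a standard C++ sequentially consistent fence) is an assumption, and `fullMemoryFence` and `HAS_X86_MFENCE` are hypothetical names, not identifiers from the PR.

```cpp
#include <atomic>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || \
    defined(_M_X64)
#include <emmintrin.h> // declares _mm_mfence (SSE2)
#define HAS_X86_MFENCE 1
#else
#define HAS_X86_MFENCE 0
#endif

// Hypothetical helper: issue a full memory fence on any architecture.
inline void fullMemoryFence() {
#if HAS_X86_MFENCE
    _mm_mfence();
#else
    // Assumption: a sequentially consistent fence is an acceptable
    // substitute on ARM; the PR's actual fallback is not shown here.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

Wrapping the intrinsic in one helper keeps the architecture check in a single place instead of at every call site.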

weng271190436 · 2026-04-12T00:01:58Z

+HOST_PYTHON := $(shell command -v python3 2>/dev/null || command -v python 2>/dev/null)
+PYTHON ?= $(if $(wildcard $(VENV_PYTHON)),$(VENV_PYTHON),$(HOST_PYTHON))
+
+CXX ?= $(shell command -v g++ 2>/dev/null || command -v clang++ 2>/dev/null)


If g++ is not available, fall back to clang++.

weng271190436 · 2026-04-12T00:03:15Z

 #include <iostream>
 #include <stdexcept>
 #include <iomanip>
+#include <sys/time.h>


gettimeofday is declared in this header.
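For illustration, a small sketch of why the include is needed: `gettimeofday` and `struct timeval` are POSIX declarations from `<sys/time.h>`. The helper name `wallClockSeconds` is hypothetical and is not taken from the Sibernetic sources.

```cpp
#include <sys/time.h> // gettimeofday, struct timeval (POSIX)

// Hypothetical helper: current wall-clock time in seconds since the epoch.
inline double wallClockSeconds() {
    timeval tv{};
    gettimeofday(&tv, nullptr); // second argument (timezone) is obsolete
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) * 1e-6;
}
```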

weng271190436 · 2026-04-13T12:01:16Z

 	@echo 'Invoking: clang C++ Compiler'
 #### use this to compile against homebrew installed python
 #### change version number as necessary (2.7.n)
-	g++ -std=c++14 -O3 -Wall -c -I$(PYTHONHEADERDIR) -I$(INCDIR) -framework OpenCL -fmessage-length=0 -MMD -MP -MF"$(@:%.o=%.d)" -MT"$(@:%.o=%.d)" -o "$@" "$<"


-framework OpenCL is removed from the compile step to address this warning:

clang++: warning: -framework OpenCL: 'linker' input unused [-Wunused-command-line-argument]

weng271190436 · 2026-04-22T18:53:22Z

@pgleeson can you review this please?

pgleeson · 2026-04-23T08:10:55Z

Thanks @weng271190436. We had previously thought it wouldn't be possible to run on ARM Mac due to lack of support for OpenCL, and that a switch to Metal would be required. So you have it building and running on your Mac?

Could you add a step in here https://github.com/openworm/sibernetic/blob/ow-0.9.9/.github/workflows/ci-build.yml for testing on macos-latest?

weng271190436 · 2026-04-24T01:18:36Z

Thanks @pgleeson. Added a macos-latest step in ci-build.yml

pgleeson · 2026-04-24T15:53:04Z

Thanks @weng271190436, however the macos test is failing here with:

Unfortunately OpenCL couldn't find device ALL
OpenCL try to init existing device 
Error in owOpenCLSolver.cpp!!
ERROR: Sibernetic can't find any OpenCL devices. Please check you're environment configuration.

weng271190436 · 2026-04-26T00:58:39Z

@@ -0,0 +1,107 @@
+#pragma once


This is a header similar to owOpenCLSolver.h

weng271190436 · 2026-04-26T01:00:19Z


 // OpenCL solver class
-class owOpenCLSolver {
+class owOpenCLSolver : public owSolver {


Impose a common structure on both owOpenCLSolver and owMetalSolver via owSolver, so that owPhysicsFluidSimulator.cpp can use the OpenCL implementation on Linux and the Metal implementation on Mac.
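A reduced sketch of the idea, under stated assumptions: the class names with a `Sketch` suffix, the `step`/`backendName` methods, and the `makeSolver` factory are all hypothetical stand-ins, not the PR's actual interface. It only shows how a shared abstract base lets the caller stay backend-agnostic.

```cpp
#include <memory>
#include <string>

// Hypothetical, reduced interface; the real owSolver declares other methods.
class SolverSketch {
public:
    virtual ~SolverSketch() = default;
    virtual std::string backendName() const = 0;
    virtual void step() = 0; // placeholder for one simulation step
    int stepCount() const { return steps_; }
protected:
    int steps_ = 0;
};

class OpenCLSolverSketch : public SolverSketch {
public:
    std::string backendName() const override { return "opencl"; }
    void step() override { ++steps_; } // real code would enqueue kernels
};

class MetalSolverSketch : public SolverSketch {
public:
    std::string backendName() const override { return "metal"; }
    void step() override { ++steps_; } // real code would encode kernels
};

// Assumption: the backend is chosen at compile time by platform.
inline std::unique_ptr<SolverSketch> makeSolver() {
#if defined(__APPLE__)
    return std::unique_ptr<SolverSketch>(new MetalSolverSketch());
#else
    return std::unique_ptr<SolverSketch>(new OpenCLSolverSketch());
#endif
}
```

The simulator would then hold only a base-class pointer and never name a concrete backend.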

weng271190436 · 2026-04-26T05:34:52Z

+}
+} // namespace
+
+MetalBackend::MetalBackend(const char *libraryPath) {


MetalBackend handles common logic needed for every Metal kernel invocation.

With MetalBackend, each kernel invocation becomes a backend_->dispatch(kernelName, ...) call in owMetalSolver.cpp
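The name-keyed dispatch pattern can be sketched without any Metal calls. Everything here is hypothetical: `BackendSketch`, `registerKernel`, and the callable signature are illustrative only, and the real `MetalBackend::dispatch` arguments are not shown in this thread.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for MetalBackend; it makes no Metal API calls.
class BackendSketch {
public:
    using Kernel = std::function<void(std::size_t gridSize)>;

    void registerKernel(const std::string& name, Kernel kernel) {
        kernels_[name] = std::move(kernel);
    }

    // One entry point for every kernel: look it up by name, then run it.
    void dispatch(const std::string& name, std::size_t gridSize) {
        auto it = kernels_.find(name);
        if (it == kernels_.end()) {
            throw std::runtime_error("unknown kernel: " + name);
        }
        // Assumption: a real backend would do its shared per-invocation
        // work (pipeline lookup, encoding, commit) around this call.
        it->second(gridSize);
        ++dispatchCount_;
    }

    std::size_t dispatchCount() const { return dispatchCount_; }

private:
    std::unordered_map<std::string, Kernel> kernels_;
    std::size_t dispatchCount_ = 0;
};
```

Centralizing the shared steps in one method is what lets each call site shrink to a single line.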

weng271190436 · 2026-04-26T05:51:14Z

@@ -0,0 +1,36 @@
+#pragma once
+
+// Kernel argument abstraction for the `clearBuffers` kernel.


Each of these *Kernel.h files captures the Metal argument struct and kernel name constant for one GPU kernel. In subsequent PRs I plan to define the OpenCL argument struct here as well, along with a GPU-backend-agnostic input struct. With a library-agnostic input struct, I can write kernel unit tests that work with any GPU backend and verify that the Metal kernel and the OpenCL kernel produce the same output given the same input. After that, I (or any contributor) can refactor the kernels for better readability or performance without worrying about breaking correctness.

I am already doing this in my fork https://github.com/weng271190436/sibernetic/blob/main/tests/clear_buffers/clear_buffers_test_common.h
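A rough sketch of what such a backend-agnostic parity check could look like. All names are hypothetical, and the assumed kernel behavior (resetting every slot to a -1 sentinel) is for illustration only; the two functions stand in for a Metal run and an OpenCL run and execute on the CPU here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend-agnostic input for one kernel.
struct ClearBuffersInputSketch {
    std::size_t particleCount;
    std::size_t neighborsPerParticle;
};

// Any backend is reduced to "input in, output buffer out".
using ClearBuffersFn = std::vector<float> (*)(const ClearBuffersInputSketch&);

// Stand-in for backend A (assumed behavior: fill with sentinel -1).
inline std::vector<float> backendAClear(const ClearBuffersInputSketch& in) {
    return std::vector<float>(in.particleCount * in.neighborsPerParticle,
                              -1.0f);
}

// Stand-in for backend B: same result, written per particle like a kernel.
inline std::vector<float> backendBClear(const ClearBuffersInputSketch& in) {
    std::vector<float> out(in.particleCount * in.neighborsPerParticle, 0.0f);
    for (std::size_t p = 0; p < in.particleCount; ++p) {
        for (std::size_t n = 0; n < in.neighborsPerParticle; ++n) {
            out[p * in.neighborsPerParticle + n] = -1.0f;
        }
    }
    return out;
}

// The parity test: identical input must give identical output.
inline bool backendsAgree(ClearBuffersFn a, ClearBuffersFn b,
                          const ClearBuffersInputSketch& in) {
    return a(in) == b(in);
}
```

For kernels that do floating-point arithmetic, an exact comparison would likely need to be relaxed to a tolerance.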

weng271190436 · 2026-04-26T05:54:43Z

+
+// ============ Metal helpers ============
+
+inline void bindBuffer(MTL::ComputeCommandEncoder *enc, MTL::Buffer *buf,


I place kernel arg binding helpers in this file.

weng271190436 · 2026-04-26T06:07:28Z

@@ -0,0 +1,1479 @@
+#include <metal_stdlib>


Rewrite sphFluid.cl in Metal. Each OpenCL kernel has a Metal equivalent. I wrote the Metal kernels a little differently so that they are easier for me to understand, and hopefully easier for other contributors too.

For example, I try to name all variables intuitively and consistently. I make an effort to distinguish serialId (the original particle ID defined by the input) from sortedParticleId (the particle ID after sorting by cell ID). I also pull all physical constants to the top of the file and give them meaningful names, e.g.:

// Reduced viscosity coefficient for worm-fluid interface interactions.
// 10x lower than default to allow worm to slide through the medium.
constant float kViscosityCoeffWormFluid = 1.0e-5f;

I also add comments in many places where I find the code unintuitive.
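To make the serialId versus sortedParticleId distinction concrete, here is a small host-side sketch. It is not code from the PR: the function name and the use of a stable CPU sort are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Input is indexed by serialId (original particle order).
// Output is indexed by sortedParticleId and stores the serialId at that slot.
inline std::vector<std::size_t>
serialIdsSortedByCell(const std::vector<int>& cellIdBySerialId) {
    std::vector<std::size_t> serialIdBySortedId(cellIdBySerialId.size());
    // Start with the identity mapping: slot i holds serialId i.
    std::iota(serialIdBySortedId.begin(), serialIdBySortedId.end(),
              std::size_t{0});
    // Reorder slots by cell ID; ties keep their original relative order.
    std::stable_sort(serialIdBySortedId.begin(), serialIdBySortedId.end(),
                     [&](std::size_t a, std::size_t b) {
                         return cellIdBySerialId[a] < cellIdBySerialId[b];
                     });
    return serialIdBySortedId;
}
```

With cell IDs {5, 2, 5, 1} for serialIds 0 to 3, sortedParticleId 0 maps back to serialId 3, and the full mapping is {3, 1, 0, 2}.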

weng271190436 · 2026-04-26T06:08:36Z

@@ -0,0 +1,551 @@
+#include "owMetalSolver.h"


Most of the methods in owMetalSolver.cpp have an OpenCL equivalent.

weng271190436 · 2026-04-26T06:13:40Z

@@ -0,0 +1,47 @@
+//-------------------------------------------------------------------------------------------------------------------------------------------------------------


Everything under inc/Metal is C++ headers provided by Apple. In README.md I added instructions on how to download them from Apple's website so that people can upgrade them when Apple releases a new version.

weng271190436 · 2026-04-26T06:29:15Z

Hi @pgleeson, thank you for testing.

I realized that OpenCL worked for me locally because my MacBook has a GPU, whereas the GitHub Actions VM does not. OpenCL on ARM Mac can find a GPU device but not a CPU device.

As you pointed out, for ARM Mac CI to work, I have to port the GPU kernels to Metal. I have been doing that in my personal fork anyway, so I added those changes to this PR to make CI work. I tested in my personal fork that build-macos works: https://github.com/weng271190436/sibernetic/actions/runs/24949167863/job/73056091793 (let me know if you don't have permission to access my CI result).

However, this makes the PR big, especially since I am pulling in the Metal cpp library from Apple. Let me know if you want to discuss how to break this Metal port into multiple smaller PRs to make review easier. If you would rather merge something that works end to end in one go, we can keep this PR.

Mirrors the build-macos job from PR #222 (Wei Weng's Metal port against ow-0.9.9), adapted for this branch:

- Drives the build through ./setup.sh instead of bare `make -f makefile.OSX`. Our binary embeds Python and imports torch/taichi/numpy at runtime, so the venv must exist before any backend=taichi* or backend=torch run. Also doubles as a meta-test that setup.sh works on a clean machine, which is exactly the silent-failure mode we hit when Homebrew rolled `python` forward to 3.14.
- Runs an explicit backend=taichi-cpu smoke test (deterministic, fast, proves the embedded interpreter can find taichi in the venv).
- Adds a continue-on-error backend=taichi (Metal) probe so we get evidence of whether Taichi-Metal works on macos-latest without blocking the gate while we find out.
- Adds workflow_dispatch: trigger to allow manual re-runs from the Actions UI while iterating.

Out of scope: Linux numpy include path bug (separate PR), the legacy Intel workflow's quarantine, and reference-log/parity tests on macOS (Phase 3 of the broader test-harness plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rk Python

CI surfaced this on macos-latest after the previous numpy fix unblocked compilation: linking died with `ld: framework 'Python' not found` even though the include path under Python.framework/Versions/3.13/include clearly resolved during the compile phase.

Root cause: Homebrew's python@3.13 ships the framework headers and the versioned dylib under Python.framework/Versions/3.13/, but deliberately omits the top-level Python.framework/Python symlink that `-framework Python` looks up at link time (so it doesn't conflict with other Python framework installs). PR #222's macOS job only worked because it didn't override PYTHONFRAMEWORKDIR; it picked up the runner's pre-installed Python, which has the symlink. Pinning to Homebrew's Python (which we need for venv + ABI consistency with torch/taichi) puts us in the broken-symlink case.

- makefile.OSX: parameterize the Python linker flags as PYTHON_LIB_FLAGS with `?=` so it can be overridden from the env, defaulting to the existing `-framework Python` for callers who already had a working framework.
- setup.sh: derive PYTHON_LIB_FLAGS from `<BUILD_PY>-config --embed --ldflags` (e.g. `-L<libdir> -lpython3.13 -ldl -framework CoreFoundation`), which links against libpython3.13.dylib by name and doesn't depend on the framework symlink.

Verified locally: build links cleanly with the new flags and the binary runs `backend=taichi-cpu` on demo1 successfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two empirical findings from today that update the open questions in the previous DEVELOPMENT_LOG entry:

1. dt audit on PR #222 native Metal: the lever is essentially closed with default PCISPH max_iteration=3. Sweep:

   - 2e-5 (baseline) ✅ 75 sec for 1-sec sim, retention 118%
   - 3e-5 (1.5×) ❌ hangs at step ~7 (PCISPH didn't converge)
   - 4e-5 (2×) ❌ hangs
   - 5e-5 (2.5×) ❌ hangs

   To open dt up, the realistic options are bumping PCISPH iterations (cheapest, code change), switching to a different pressure solver like IISPH/DFSPH/WCSPH (substantial), or testing a smaller scenario. The <3-min target on a 5-sec sim is NOT reachable via dt alone given current solver settings.

2. Taichi-CUDA pancake check at 1-sec sim: the bug is algorithmic, not Apple-Silicon-specific. On L4 CUDA, Taichi's cube didn't move at all (mean_y unchanged from initial 44.42, vs OpenCL/PR222 both reaching ~10 in same sim time). Same root-cause "forces too weak" as the Apple Silicon Taichi-Metal pancake, just at a different magnitude. Fixing taichi_solver.py is a single change that benefits both Metal and CUDA simultaneously; the README's documented 3-step coordinate-scale fix is the starting hypothesis.

Both findings update the "Concrete next steps" checklist; dt audit and Taichi-CUDA verification are now done. Native CUDA backend and the cross-backend cube-stability regression remain open.

Lays out the structure for a native CUDA backend that mirrors PR #222's native Metal port. The actual kernel ports (translating sphFluid.cl's 1515 lines to sphFluid.cu CUDA C++) are deferred: they're ~2 weeks of focused work and depend on PR #222's owSolver abstract base landing first to avoid a refactor.

What this commit provides:

- src/cuda/README.md: Implementation plan (5 phases, ~2 weeks estimated, with file-by-file mapping to PR #222's Metal port for structural cribbing).
- src/cuda/sphFluid.cu: Skeleton with __global__ kernel signatures (TODO bodies). One per OpenCL kernel in sphFluid.cl. Includes an explicit reminder that pcisphComputeElasticForces must keep elastic forces in WORLD coordinates (this is where the Taichi pancake bug originates; CUDA port should match OpenCL exactly).
- inc/owCudaSolver.h: Public C++ interface mirroring owOpenCLSolver.h. Method signatures commented out so this header doesn't try to declare functions whose definitions don't exist yet.

What this commit does NOT do:

- Touch makefile, makefile.OSX; they don't reference these files, so existing builds are unaffected.
- Add backend=cuda to owConfigProperty; that would require a real implementation to dispatch to.
- Attempt actual CUDA kernel implementation; that's Phase 1 of the work plan in src/cuda/README.md.

Reasoning: this gives the next developer (or a future LLM session) a clear target architecture without committing to a half-baked port. When PR #222 lands, the CUDA work can proceed against the same owSolver abstract base + src/kernels/ descriptor pattern that PR #222 introduces, rather than being structurally divergent from the Metal half.

Updates the next-steps checklist to reflect this session's progress: both the cross-backend regression script and the CUDA backend scaffold have landed. CUDA kernel implementation remains a separate ~2-week task, gated on PR #222 merging so we build against its owSolver abstract base and src/kernels/ descriptor pattern rather than diverging.

Make Sibernetic build work for modern Mac

6848086

weng271190436 marked this pull request as ready for review April 12, 2026 02:00

Address warning -framework OpenCL linker input unused

d2a221e

weng271190436 mentioned this pull request Apr 13, 2026

Can make ./Release/Sibernetic -f worm run 20% by turning QUEUE_EACH_KERNEL to 0 #224

Open

weng271190436 force-pushed the weiweng/modernize-makefile-osx branch 2 times, most recently from 848f9e7 to 7f5c74b Compare April 25, 2026 23:47

Include metal cpp headers

6ab1f16

weng271190436 force-pushed the weiweng/modernize-makefile-osx branch from c9d2b00 to 12f6f6c Compare April 26, 2026 00:57

Use Metal GPU library for MacOS

8234fdb

weng271190436 force-pushed the weiweng/modernize-makefile-osx branch from 12f6f6c to 8234fdb Compare April 26, 2026 05:27

weng271190436 changed the title ~~feat: Make Sibernetic build work for ARM Mac and update README~~ feat: Make Sibernetic build work for ARM Mac by porting GPU kernels to Metal Apr 26, 2026


2 participants
