Changes from all 19 commits:

* `0f48675` Update config (eSlider, Jan 11, 2026)
* `2680ff6` Merge branch 'main' of github.com:eleiton/ollama-intel-arc (eSlider, Jan 11, 2026)
* `8f8c402` feat: add custom IPEX-LLM Ollama Dockerfile and tune Intel GPU config (eSlider, Feb 16, 2026)
* `53fe333` feat: make Intel GPU config env-var driven and restore defaults (eSlider, Feb 16, 2026)
* `e01d7e3` feat: BuildKit cache mounts, Dockerfile cleanup, and VRAM/context docs (eSlider, Feb 16, 2026)
* `48a31bf` fix: restore open-webui service and fix SYCL_CACHE_PERSISTENT typo (eSlider, Feb 16, 2026)
* `c57d59e` docs: expand VRAM guide with build/config sections and update README … (eSlider, Feb 16, 2026)
* `f1b336d` feat: add shm_size 16G for SYCL/Level Zero shared memory (eSlider, Feb 16, 2026)
* `e0ce029` docs: add SYCL vs Vulkan comparison and troubleshooting (eSlider, Feb 16, 2026)
* `a1823e5` docs: detail SYCL source build with patch-sycl.py and upgrade steps (eSlider, Feb 16, 2026)
* `29c5e82` docs: add concrete Dockerfile examples for SYCL source build (eSlider, Feb 16, 2026)
* `8bc14d9` docs: add SYCL source build Dockerfile and patch-sycl.py links to README (eSlider, Feb 16, 2026)
* `901d8d6` feat: add SYCL source build files and project structure to README (eSlider, Feb 16, 2026)
* `7d6f3af` refactor: rename tmp/ to ollama-sycl/ and align with project structure (eSlider, Feb 16, 2026)
* `c8c2a02` rename: ollama-sycl → sycl-ollama (eSlider, Feb 16, 2026)
* `521e667` feat: add no_proxy to both compose files (eSlider, Feb 16, 2026)
* `ac0c1e0` feat: upgrade sycl-ollama to Ollama v0.16.1 (eSlider, Feb 16, 2026)
* `0f7fce1` docs: fix broken Whisper link, update patch-sycl.py descriptions, add… (eSlider, Feb 16, 2026)
* `8a4870d` docs: add image size comparison, update sizes from actual builds (eSlider, Feb 16, 2026)
**README.md** (65 changes: 64 additions & 1 deletion)
@@ -14,6 +14,22 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux systems

![screenshot](resources/open-webui.png)

## Tested Hardware

| Intel GPU | Status |
|---|---|
| Core Ultra 7 155H integrated Arc (Meteor Lake) | Verified |
| Arc A-series (A770, A750, A380) | Expected compatible |
| Data Center Flex / Max | Expected compatible |
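
Not sure which Intel GPU your host has? A generic check (not specific to this repo) is to list the PCI display controllers:

```bash
# Intel Arc devices appear as 'Intel Corporation ...' entries
$ lspci -nn | grep -Ei 'vga|display'
```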

## Documentation

* **[SYCL vs Vulkan — GPU Backend Comparison](docs/sycl-vs-vulkan.md)** — performance benchmarks (SYCL is 40–100% faster), three backend options (IPEX-LLM bundle, SYCL from source, upstream Vulkan), how `patch-sycl.py` works, and troubleshooting.
* **[Intel Arc A770 Context Length & VRAM Guide](docs/intel-arc-a770-context-limits.md)** — how to choose context length, KV cache quantization, and model size for 16 GB Intel Arc GPUs. Includes VRAM budget tables, per-model recommendations, and environment variable reference.
* **[Custom IPEX-LLM Dockerfile](ipex-ollama/Dockerfile)** — build your own Ollama image from scratch with pinned Intel GPU runtimes (Level Zero, IGC, compute-runtime) and the IPEX-LLM portable bundle. Uses BuildKit cache mounts for fast rebuilds.
* **[SYCL Source Build Dockerfile](sycl-ollama/Dockerfile)** — multi-stage build that compiles `ggml-sycl` from source with Intel oneAPI, paired with the official Ollama v0.16.1 binary. Includes [`patch-sycl.py`](sycl-ollama/patch-sycl.py) for backward-compatible API patching (no patches needed as of v0.16.1).
* **[docker-compose.yml](docker-compose.yml)** — fully documented Compose file with env-var driven configuration. All Intel GPU tuning knobs (SYCL, XeTLA, SDP fusion, KV cache, flash attention) are configurable via `${VAR:-default}` syntax and a `.env` file.
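
As a quick illustration of that `${VAR:-default}` pattern, a `.env` file next to the compose file overrides any of these knobs. The variable names below are taken from [docker-compose.sycl-ollama.yml](docker-compose.sycl-ollama.yml); the values are illustrative, not recommendations:

```bash
# .env (read automatically by docker compose / podman compose)
OLLAMA_CONTEXT_LENGTH=8192   # smaller context window, lower VRAM use
OLLAMA_KV_CACHE_TYPE=q8_0    # higher-precision KV cache than the q4_0 default
OLLAMA_WEBUI_PORT=3000       # host port for Open WebUI (compose default: 4040)
```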

## Services
1. Ollama
* Runs llama.cpp and Ollama with IPEX-LLM on your Linux computer with an Intel Arc GPU.
@@ -40,7 +56,7 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux systems

5. OpenAI Whisper
* Robust Speech Recognition via Large-Scale Weak Supervision
* Uses the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu) as the base container

## Setup
Run the following commands to start your Ollama instance with Open WebUI
@@ -50,6 +66,11 @@ $ cd ollama-intel-arc
$ podman compose up
```

Alternatively, to use the **SYCL-from-source** build (newer Ollama, faster inference — see [SYCL vs Vulkan](docs/sycl-vs-vulkan.md)):
```bash
$ podman compose -f docker-compose.sycl-ollama.yml up --build
```

Additionally, if you want to run one or more of the image generation tools, run these commands in a different terminal:

For ComfyUI
@@ -83,6 +104,12 @@ When using Open WebUI, you should see this partial output in your console, indic…
[ollama-intel-arc] | | 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 62400M| 1.6.32224+14|
```

For the **SYCL-from-source** build (`docker-compose.sycl-ollama.yml`), you should see:
```bash
[sycl-ollama] | Listening on [::]:11434 (version 0.16.1)
[sycl-ollama] | inference compute id="" library="" name=SYCL0 description="Intel(R) Arc(TM) Graphics" type=discrete total="28.0 GiB"
```

## Using Image Generation
* Open your web browser to http://localhost:7860 to access the SD.Next web page.
* For the purposes of this demonstration, we'll use the [DreamShaper](https://civitai.com/models/4384/dreamshaper) model.
@@ -168,6 +195,42 @@ $ podman exec -it ollama-intel-arc /bin/bash
$ /llm/ollama/ollama -v
```

## Project Structure

```
.
├── docker-compose.yml # Main stack: Ollama (IPEX-LLM) + Open WebUI
├── docker-compose.sycl-ollama.yml # SYCL-from-source Ollama + Open WebUI (alternative)
├── docker-compose.comfyui.yml # ComfyUI image generation
├── docker-compose.sdnext.yml # SD.Next image generation
├── docker-compose.whisper.yml # OpenAI Whisper speech recognition
├── docker-compose.ramalama.yml # RamaLama support
├── ipex-ollama/
│ └── Dockerfile # IPEX-LLM bundle build (Ollama v0.9.3, SYCL)
├── sycl-ollama/ # SYCL-from-source build (Ollama v0.16.1)
│ ├── Dockerfile # Multi-stage: oneAPI build → minimal runtime
│ ├── patch-sycl.py # API compat patches (no-op since v0.16.1)
│ ├── start-ollama.sh # Legacy entrypoint (from IPEX-LLM era)
│ └── test-glm-ocr.sh # Vision model test script (glm-ocr)
├── comfyui/
│ └── Dockerfile # ComfyUI with Intel Extension for PyTorch
├── sdnext/
│ └── Dockerfile # SD.Next with Intel Extension for PyTorch
├── whisper/
│ └── Dockerfile # OpenAI Whisper with Intel Extension for PyTorch
├── ramalama/
│ └── Dockerfile # RamaLama container
├── docs/
│ ├── sycl-vs-vulkan.md # SYCL vs Vulkan backend comparison
│ └── intel-arc-a770-context-limits.md # VRAM & context length guide
└── resources/ # Screenshots for README
```

## My development environment:
* Core Ultra 7 155H
* Intel® Arc™ Graphics (Meteor Lake-P)
**docker-compose.sycl-ollama.yml** (100 changes: 100 additions & 0 deletions)

@@ -0,0 +1,100 @@
services:
sycl-ollama:
build:
context: sycl-ollama
dockerfile: Dockerfile
args:
OLLAMA_VERSION: "0.16.1"
image: sycl-ollama:local
container_name: sycl-ollama
restart: unless-stopped

shm_size: "16G" # Shared memory limit (/dev/shm). Docker defaults to 64 MB which is too small
# for SYCL kernel caches, Level Zero buffers, and memory-mapped model loading.

devices:
- /dev/dri:/dev/dri # Required: maps Intel GPU render & card nodes for SYCL / Level Zero access

volumes:
- ollama-volume:/root/.ollama # Persistent storage for downloaded models (shared with main stack)

ports:
- 11434:11434 # Exposes Ollama API (default port)

environment:
# ───────────────────────────────────────────────────────────────
# Proxy bypass — prevents corporate/system HTTP proxies from
# intercepting container-to-container and localhost traffic.
# Without this, model downloads may work but inter-service
# calls (Open WebUI → Ollama) can silently fail or time out.
# ───────────────────────────────────────────────────────────────
- no_proxy=${no_proxy:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}
- NO_PROXY=${NO_PROXY:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}

# ───────────────────────────────────────────────────────────────
# Ollama server & runtime behavior
# ───────────────────────────────────────────────────────────────
- OLLAMA_HOST=0.0.0.0
- OLLAMA_NUM_PARALLEL=${OLLAMA_NUM_PARALLEL:-1}
- OLLAMA_DEFAULT_KEEPALIVE=${OLLAMA_DEFAULT_KEEPALIVE:-6h}
- OLLAMA_KEEP_ALIVE=${OLLAMA_KEEP_ALIVE:-24h}
- OLLAMA_MAX_LOADED_MODELS=${OLLAMA_MAX_LOADED_MODELS:-1}
- OLLAMA_MAX_QUEUE=${OLLAMA_MAX_QUEUE:-512}
- OLLAMA_MAX_VRAM=${OLLAMA_MAX_VRAM:-0}
- OLLAMA_DEBUG=${OLLAMA_DEBUG:-1}

# ───────────────────────────────────────────────────────────────
# Context length & KV cache quantization
# ───────────────────────────────────────────────────────────────
- OLLAMA_CONTEXT_LENGTH=${OLLAMA_CONTEXT_LENGTH:-16384}
- OLLAMA_KV_CACHE_TYPE=${OLLAMA_KV_CACHE_TYPE:-q4_0}
- OLLAMA_FLASH_ATTENTION=${OLLAMA_FLASH_ATTENTION:-1}
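# Rough KV-cache sizing (illustrative sketch, not taken from this repo's docs):
#   bytes ≈ 2 (K and V) × n_layers × n_kv_heads × head_dim × context × bytes_per_element
#   e.g. a typical 8B GQA model (32 layers, 8 KV heads, head_dim 128) at 16384 context:
#   f16 ≈ 2.0 GiB; q4_0 (~4.5 bits/element) ≈ 0.6 GiB, hence the q4_0 default above.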

# ───────────────────────────────────────────────────────────────
# Intel SYCL / Level Zero GPU tuning
# ───────────────────────────────────────────────────────────────
- ONEAPI_DEVICE_SELECTOR=${ONEAPI_DEVICE_SELECTOR:-level_zero:0}
- ZES_ENABLE_SYSMAN=${ZES_ENABLE_SYSMAN:-1}
- SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=${SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS:-1}
- SYCL_CACHE_PERSISTENT=${SYCL_CACHE_PERSISTENT:-1}
- ENABLE_SDP_FUSION=${ENABLE_SDP_FUSION:-1}

# ───────────────────────────────────────────────────────────────
# GPU layer offloading
# ───────────────────────────────────────────────────────────────
- OLLAMA_NUM_GPU=${OLLAMA_NUM_GPU:-999}

open-webui:
image: ghcr.io/open-webui/open-webui:latest
container_name: open-webui-sycl
volumes:
- open-webui-volume:/app/backend/data
depends_on:
- sycl-ollama
ports:
- ${OLLAMA_WEBUI_PORT:-4040}:8080
environment:
- OLLAMA_BASE_URL=http://sycl-ollama:11434

# Proxy bypass (see Ollama service for explanation)
- no_proxy=${no_proxy:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}
- NO_PROXY=${NO_PROXY:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}

- WEBUI_AUTH=False
- ENABLE_OPENAI_API=False
- ENABLE_OLLAMA_API=True

# Web search for RAG
- ENABLE_RAG_WEB_SEARCH=True

# Telemetry opt-out
- SCARF_NO_ANALYTICS=true
- DO_NOT_TRACK=true
- ANONYMIZED_TELEMETRY=false
extra_hosts:
- host.docker.internal:host-gateway
restart: unless-stopped

volumes:
ollama-volume: {}
open-webui-volume: {}
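
Once the stack is up, a quick sanity check (a sketch; the container name `sycl-ollama` and port `11434` come from the compose file above, and exact outputs will vary by host):

```bash
# Ollama answers on the published port with its version
$ curl http://localhost:11434/api/version
{"version":"0.16.1"}

# The Intel render node is mapped into the container (device names vary)
$ podman exec -it sycl-ollama ls /dev/dri
card0  renderD128
```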