Changes from all 19 commits:

* `0f48675` Update config (eSlider, Jan 11, 2026)
* `2680ff6` Merge branch 'main' of github.com:eleiton/ollama-intel-arc (eSlider, Jan 11, 2026)
* `8f8c402` feat: add custom IPEX-LLM Ollama Dockerfile and tune Intel GPU config (eSlider, Feb 16, 2026)
* `53fe333` feat: make Intel GPU config env-var driven and restore defaults (eSlider, Feb 16, 2026)
* `e01d7e3` feat: BuildKit cache mounts, Dockerfile cleanup, and VRAM/context docs (eSlider, Feb 16, 2026)
* `48a31bf` fix: restore open-webui service and fix SYCL_CACHE_PERSISTENT typo (eSlider, Feb 16, 2026)
* `c57d59e` docs: expand VRAM guide with build/config sections and update README … (eSlider, Feb 16, 2026)
* `f1b336d` feat: add shm_size 16G for SYCL/Level Zero shared memory (eSlider, Feb 16, 2026)
* `e0ce029` docs: add SYCL vs Vulkan comparison and troubleshooting (eSlider, Feb 16, 2026)
* `a1823e5` docs: detail SYCL source build with patch-sycl.py and upgrade steps (eSlider, Feb 16, 2026)
* `29c5e82` docs: add concrete Dockerfile examples for SYCL source build (eSlider, Feb 16, 2026)
* `8bc14d9` docs: add SYCL source build Dockerfile and patch-sycl.py links to README (eSlider, Feb 16, 2026)
* `901d8d6` feat: add SYCL source build files and project structure to README (eSlider, Feb 16, 2026)
* `7d6f3af` refactor: rename tmp/ to ollama-sycl/ and align with project structure (eSlider, Feb 16, 2026)
* `c8c2a02` rename: ollama-sycl → sycl-ollama (eSlider, Feb 16, 2026)
* `521e667` feat: add no_proxy to both compose files (eSlider, Feb 16, 2026)
* `ac0c1e0` feat: upgrade sycl-ollama to Ollama v0.16.1 (eSlider, Feb 16, 2026)
* `0f7fce1` docs: fix broken Whisper link, update patch-sycl.py descriptions, add… (eSlider, Feb 16, 2026)
* `8a4870d` docs: add image size comparison, update sizes from actual builds (eSlider, Feb 16, 2026)
**README.md** (65 changes: 64 additions & 1 deletion)
@@ -14,6 +14,22 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux systems

![screenshot](resources/open-webui.png)

## Tested Hardware

| Intel GPU | Status |
|---|---|
| Core Ultra 7 155H integrated Arc (Meteor Lake) | Verified |
| Arc A-series (A770, A750, A380) | Expected compatible |
| Data Center Flex / Max | Expected compatible |
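
Not sure which Intel GPU your host has? A generic check (not specific to this repo) is to list the PCI display controllers:

```bash
# Intel Arc devices appear as 'Intel Corporation ...' entries
$ lspci -nn | grep -Ei 'vga|display'
```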

## Documentation

* **[SYCL vs Vulkan — GPU Backend Comparison](docs/sycl-vs-vulkan.md)** — performance benchmarks (SYCL is 40–100% faster), three backend options (IPEX-LLM bundle, SYCL from source, upstream Vulkan), how `patch-sycl.py` works, and troubleshooting.
* **[Intel Arc A770 Context Length & VRAM Guide](docs/intel-arc-a770-context-limits.md)** — how to choose context length, KV cache quantization, and model size for 16 GB Intel Arc GPUs. Includes VRAM budget tables, per-model recommendations, and environment variable reference.
* **[Custom IPEX-LLM Dockerfile](ipex-ollama/Dockerfile)** — build your own Ollama image from scratch with pinned Intel GPU runtimes (Level Zero, IGC, compute-runtime) and the IPEX-LLM portable bundle. Uses BuildKit cache mounts for fast rebuilds.
* **[SYCL Source Build Dockerfile](sycl-ollama/Dockerfile)** — multi-stage build that compiles `ggml-sycl` from source with Intel oneAPI, paired with the official Ollama v0.16.1 binary. Includes [`patch-sycl.py`](sycl-ollama/patch-sycl.py) for backward-compatible API patching (no patches needed as of v0.16.1).
* **[docker-compose.yml](docker-compose.yml)** — fully documented Compose file with env-var driven configuration. All Intel GPU tuning knobs (SYCL, XeTLA, SDP fusion, KV cache, flash attention) are configurable via `${VAR:-default}` syntax and a `.env` file.
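
As a quick illustration of that `${VAR:-default}` pattern, a `.env` file next to the compose file overrides any of these knobs. The variable names below are taken from [docker-compose.sycl-ollama.yml](docker-compose.sycl-ollama.yml); the values are illustrative, not recommendations:

```bash
# .env (read automatically by docker compose / podman compose)
OLLAMA_CONTEXT_LENGTH=8192   # smaller context window, lower VRAM use
OLLAMA_KV_CACHE_TYPE=q8_0    # higher-precision KV cache than the q4_0 default
OLLAMA_WEBUI_PORT=3000       # host port for Open WebUI (compose default: 4040)
```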

## Services
1. Ollama
* Runs llama.cpp and Ollama with IPEX-LLM on your Linux computer with an Intel Arc GPU.
@@ -40,7 +56,7 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux systems

5. OpenAI Whisper
* Robust Speech Recognition via Large-Scale Weak Supervision
* Uses the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu) as the base container

## Setup
Run the following commands to start your Ollama instance with Open WebUI
@@ -50,6 +66,11 @@ $ cd ollama-intel-arc
$ podman compose up
```

Alternatively, to use the **SYCL-from-source** build (newer Ollama, faster inference — see [SYCL vs Vulkan](docs/sycl-vs-vulkan.md)):
```bash
$ podman compose -f docker-compose.sycl-ollama.yml up --build
```

Additionally, if you want to run one or more of the image generation tools, run these commands in a different terminal:

For ComfyUI
@@ -83,6 +104,12 @@ When using Open WebUI, you should see this partial output in your console, indic…
[ollama-intel-arc] | | 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 62400M| 1.6.32224+14|
```

For the **SYCL-from-source** build (`docker-compose.sycl-ollama.yml`), you should see:
```bash
[sycl-ollama] | Listening on [::]:11434 (version 0.16.1)
[sycl-ollama] | inference compute id="" library="" name=SYCL0 description="Intel(R) Arc(TM) Graphics" type=discrete total="28.0 GiB"
```

## Using Image Generation
* Open your web browser to http://localhost:7860 to access the SD.Next web page.
* For the purposes of this demonstration, we'll use the [DreamShaper](https://civitai.com/models/4384/dreamshaper) model.
@@ -168,6 +195,42 @@ $ podman exec -it ollama-intel-arc /bin/bash
$ /llm/ollama/ollama -v
```

## Project Structure

```
.
├── docker-compose.yml # Main stack: Ollama (IPEX-LLM) + Open WebUI
├── docker-compose.sycl-ollama.yml # SYCL-from-source Ollama + Open WebUI (alternative)
├── docker-compose.comfyui.yml # ComfyUI image generation
├── docker-compose.sdnext.yml # SD.Next image generation
├── docker-compose.whisper.yml # OpenAI Whisper speech recognition
├── docker-compose.ramalama.yml # RamaLama support
├── ipex-ollama/
│ └── Dockerfile # IPEX-LLM bundle build (Ollama v0.9.3, SYCL)
├── sycl-ollama/ # SYCL-from-source build (Ollama v0.16.1)
│ ├── Dockerfile # Multi-stage: oneAPI build → minimal runtime
│ ├── patch-sycl.py # API compat patches (no-op since v0.16.1)
│ ├── start-ollama.sh # Legacy entrypoint (from IPEX-LLM era)
│ └── test-glm-ocr.sh # Vision model test script (glm-ocr)
├── comfyui/
│ └── Dockerfile # ComfyUI with Intel Extension for PyTorch
├── sdnext/
│ └── Dockerfile # SD.Next with Intel Extension for PyTorch
├── whisper/
│ └── Dockerfile # OpenAI Whisper with Intel Extension for PyTorch
├── ramalama/
│ └── Dockerfile # RamaLama container
├── docs/
│ ├── sycl-vs-vulkan.md # SYCL vs Vulkan backend comparison
│ └── intel-arc-a770-context-limits.md # VRAM & context length guide
└── resources/ # Screenshots for README
```

## My development environment:
* Core Ultra 7 155H
* Intel® Arc™ Graphics (Meteor Lake-P)
**docker-compose.sycl-ollama.yml** (100 changes: 100 additions & 0 deletions)

@@ -0,0 +1,100 @@
services:
sycl-ollama:
build:
context: sycl-ollama
dockerfile: Dockerfile
args:
OLLAMA_VERSION: "0.16.1"
image: sycl-ollama:local
container_name: sycl-ollama
restart: unless-stopped

shm_size: "16G" # Shared memory limit (/dev/shm). Docker defaults to 64 MB which is too small
# for SYCL kernel caches, Level Zero buffers, and memory-mapped model loading.

devices:
- /dev/dri:/dev/dri # Required: maps Intel GPU render & card nodes for SYCL / Level Zero access

volumes:
- ollama-volume:/root/.ollama # Persistent storage for downloaded models (shared with main stack)

ports:
- 11434:11434 # Exposes Ollama API (default port)

environment:
# ───────────────────────────────────────────────────────────────
# Proxy bypass — prevents corporate/system HTTP proxies from
# intercepting container-to-container and localhost traffic.
# Without this, model downloads may work but inter-service
# calls (Open WebUI → Ollama) can silently fail or time out.
# ───────────────────────────────────────────────────────────────
- no_proxy=${no_proxy:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}
- NO_PROXY=${NO_PROXY:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}

# ───────────────────────────────────────────────────────────────
# Ollama server & runtime behavior
# ───────────────────────────────────────────────────────────────
- OLLAMA_HOST=0.0.0.0
- OLLAMA_NUM_PARALLEL=${OLLAMA_NUM_PARALLEL:-1}
- OLLAMA_DEFAULT_KEEPALIVE=${OLLAMA_DEFAULT_KEEPALIVE:-6h}
- OLLAMA_KEEP_ALIVE=${OLLAMA_KEEP_ALIVE:-24h}
- OLLAMA_MAX_LOADED_MODELS=${OLLAMA_MAX_LOADED_MODELS:-1}
- OLLAMA_MAX_QUEUE=${OLLAMA_MAX_QUEUE:-512}
- OLLAMA_MAX_VRAM=${OLLAMA_MAX_VRAM:-0}
- OLLAMA_DEBUG=${OLLAMA_DEBUG:-1}

# ───────────────────────────────────────────────────────────────
# Context length & KV cache quantization
# ───────────────────────────────────────────────────────────────
- OLLAMA_CONTEXT_LENGTH=${OLLAMA_CONTEXT_LENGTH:-16384}
- OLLAMA_KV_CACHE_TYPE=${OLLAMA_KV_CACHE_TYPE:-q4_0}
- OLLAMA_FLASH_ATTENTION=${OLLAMA_FLASH_ATTENTION:-1}
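# Rough KV-cache sizing (illustrative sketch, not taken from this repo's docs):
#   bytes ≈ 2 (K and V) × n_layers × n_kv_heads × head_dim × context × bytes_per_element
#   e.g. a typical 8B GQA model (32 layers, 8 KV heads, head_dim 128) at 16384 context:
#   f16 ≈ 2.0 GiB; q4_0 (~4.5 bits/element) ≈ 0.6 GiB, hence the q4_0 default above.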

# ───────────────────────────────────────────────────────────────
# Intel SYCL / Level Zero GPU tuning
# ───────────────────────────────────────────────────────────────
- ONEAPI_DEVICE_SELECTOR=${ONEAPI_DEVICE_SELECTOR:-level_zero:0}
- ZES_ENABLE_SYSMAN=${ZES_ENABLE_SYSMAN:-1}
- SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=${SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS:-1}
- SYCL_CACHE_PERSISTENT=${SYCL_CACHE_PERSISTENT:-1}
- ENABLE_SDP_FUSION=${ENABLE_SDP_FUSION:-1}

# ───────────────────────────────────────────────────────────────
# GPU layer offloading
# ───────────────────────────────────────────────────────────────
- OLLAMA_NUM_GPU=${OLLAMA_NUM_GPU:-999}

open-webui:
image: ghcr.io/open-webui/open-webui:latest
container_name: open-webui-sycl
volumes:
- open-webui-volume:/app/backend/data
depends_on:
- sycl-ollama
ports:
- ${OLLAMA_WEBUI_PORT:-4040}:8080
environment:
- OLLAMA_BASE_URL=http://sycl-ollama:11434

# Proxy bypass (see Ollama service for explanation)
- no_proxy=${no_proxy:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}
- NO_PROXY=${NO_PROXY:-localhost,127.0.0.1,sycl-ollama,open-webui-sycl}

- WEBUI_AUTH=False
- ENABLE_OPENAI_API=False
- ENABLE_OLLAMA_API=True

# Web search for RAG
- ENABLE_RAG_WEB_SEARCH=True

# Telemetry opt-out
- SCARF_NO_ANALYTICS=true
- DO_NOT_TRACK=true
- ANONYMIZED_TELEMETRY=false
extra_hosts:
- host.docker.internal:host-gateway
restart: unless-stopped

volumes:
ollama-volume: {}
open-webui-volume: {}
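
Once the stack is up, a quick sanity check (a sketch; the container name `sycl-ollama` and port `11434` come from the compose file above, and exact outputs will vary by host):

```bash
# Ollama answers on the published port with its version
$ curl http://localhost:11434/api/version
{"version":"0.16.1"}

# The Intel render node is mapped into the container (device names vary)
$ podman exec -it sycl-ollama ls /dev/dri
card0  renderD128
```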