IgnisCore is an experimental local LLM inference engine written in C#/.NET and Vulkan Compute. It focuses on running Gemma 4 GGUF models on Windows with a fully local GPU pipeline: model loading, tokenization, prefill/decode, FlashAttention, Cooperative Matrix acceleration, and TurboQuant KV-cache compression experiments.
Status: active research and engineering prototype. APIs, kernels, and model compatibility can change quickly.
- C# / .NET 10 implementation with NativeAOT-friendly project settings.
- Vulkan Compute backend through Silk.NET Vulkan.
- Gemma 4 GGUF loading with Q8_0-oriented optimized paths.
- FlashAttention and NVIDIA Cooperative Matrix 2 prefill paths.
- TurboQuant KV-cache compression experiments for long-context VRAM efficiency.
- Interactive chat, single-prompt mode, benchmark mode, and system-prompt support.
- 8GB-friendly Gemma 4 E2B Q8_0 launcher and 12GB-oriented Gemma 4 E4B Q8_0 launcher.
- Windows.
- .NET 10 SDK.
- Vulkan 1.3-capable GPU and driver.
- Vulkan SDK is recommended for shader development.
- Hugging Face access for gated Gemma model metadata/weights when downloading models.
Optional local Hugging Face token:
```
# .env
HF_TOKEN=hf_your_token_here
```

The .env file is intentionally ignored by Git.
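The .env file holds simple KEY=VALUE lines. As an illustration only (IgnisCore's actual .env handling is not shown here, and this uses a POSIX shell rather than the project's PowerShell launchers), such a file can be exported into the environment like this:

```shell
# Illustrative sketch: export KEY=VALUE lines from a .env file into the
# environment of a POSIX shell. IgnisCore's real .env loading may differ.
printf 'HF_TOKEN=hf_example_token\n' > .env   # example file; never commit a real token
set -a        # auto-export every variable assigned while sourcing
. ./.env
set +a
echo "HF_TOKEN set: ${HF_TOKEN:+yes}"
```

Child processes started from this shell (such as a model downloader) would then see HF_TOKEN in their environment.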
Clone and build:
```powershell
git clone https://github.com/dimohy/IgnisCore.git
cd IgnisCore
dotnet build .\src\IgnisCore.csproj -c Release
```

Run the 8GB-friendly model launcher:
```powershell
.\run-chat-gemma4-e2b-it-q8-8g.ps1
```

Run the larger 12GB-oriented model launcher:
```powershell
.\run-chat-gemma4-e4b-it-q8-12g.ps1
```

Both launchers forward extra arguments to IgnisCore, so you can override settings:
```powershell
.\run-chat-gemma4-e2b-it-q8-8g.ps1 --prompt "Who are you?" --max-tokens 64
.\run-chat-gemma4-e2b-it-q8-8g.ps1 --max-seq-len 4096
```

Downloaded models are stored under models/, which is ignored by Git.
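The launchers' forwarding behavior amounts to a thin wrapper that places its fixed defaults first and the caller's extra arguments after them. A POSIX-shell analogue (the function name is hypothetical and `echo` stands in for execution, so the sketch runs without the .NET toolchain; the real `.ps1` launchers differ):

```shell
# Hypothetical POSIX analogue of the PowerShell launchers: fixed defaults
# first, then whatever the caller passed, so later flags can override them
# if the program parses arguments last-wins.
run_e2b() {
  echo dotnet run -c Release --project ./src/IgnisCore.csproj -- \
    --model gemma-4-e2b-it --gguf-type q8_0 "$@"
}
run_e2b --prompt "Who are you?" --max-tokens 64
```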
Show help:
```powershell
dotnet run -c Release --project .\src\IgnisCore.csproj -- --help
```

Download/verify a known model without running inference:
```powershell
dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --download-only
```

Run a single prompt:
```powershell
dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --prompt "Introduce IgnisCore" --max-tokens 128
```

Run a synthetic benchmark:
```powershell
dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e4b-it --gguf-type q8_0 --benchmark --bench-pp 512 --bench-tg 64
```

| Alias | Weight repository | Metadata repository | Default GGUF | Suggested GPU |
|---|---|---|---|---|
| gemma-4-e2b-it | unsloth/gemma-4-E2B-it-GGUF | google/gemma-4-E2B-it | q8_0 | 8GB+ |
| gemma-4-e4b-it | unsloth/gemma-4-E4B-it-GGUF | google/gemma-4-e4b-it | q8_0 | 12GB+ |
| Path | Purpose |
|---|---|
| src/ | IgnisCore C# project |
| src/Engine/ | Transformer, chat, sampling, and vision pipeline orchestration |
| src/Gpu/ | Vulkan context, buffer management, and tensor operations |
| src/Model/ | GGUF/SafeTensors/tokenizer/config/model download support |
| src/Shaders/ | GLSL compute shaders and embedded SPIR-V artifacts |
| src/TurboQuant/ | TurboQuant KV-cache compression components |
| run-chat-gemma4-e2b-it-q8-8g.ps1 | 8GB-friendly Gemma 4 E2B Q8_0 chat launcher |
| run-chat-gemma4-e4b-it-q8-12g.ps1 | Gemma 4 E4B Q8_0 chat launcher for larger GPUs |
- IgnisCore is optimized around GGUF Q8_0 paths today. Other quantization names may exist upstream but are not necessarily supported by the current kernels.
- Cooperative Matrix paths require compatible NVIDIA Vulkan driver/device support. Use `--no-coopmat` when diagnosing portability issues.
- Model files are large and are not committed to this repository.
Apache-2.0. See LICENSE.