IgnisCore

Korean documentation (한국어 문서)

IgnisCore is an experimental local LLM inference engine written in C#/.NET and Vulkan Compute. It focuses on running Gemma 4 GGUF models on Windows with a fully local GPU pipeline: model loading, tokenization, prefill/decode, FlashAttention, Cooperative Matrix acceleration, and TurboQuant KV-cache compression experiments.

Status: active research and engineering prototype. APIs, kernels, and model compatibility can change quickly.

Highlights

  • C# / .NET 10 implementation with NativeAOT-friendly project settings.
  • Vulkan Compute backend through Silk.NET Vulkan.
  • Gemma 4 GGUF loading with Q8_0-oriented optimized paths.
  • FlashAttention and NVIDIA Cooperative Matrix 2 prefill paths.
  • TurboQuant KV-cache compression experiments for long-context VRAM efficiency.
  • Interactive chat, single-prompt mode, benchmark mode, and system-prompt support.
  • 8GB-friendly Gemma 4 E2B Q8_0 launcher and 12GB-oriented Gemma 4 E4B Q8_0 launcher.

Requirements

  • Windows.
  • .NET 10 SDK.
  • Vulkan 1.3-capable GPU and driver (a quick way to verify both prerequisites is shown after this list).
  • Vulkan SDK is recommended for shader development.
  • Hugging Face access for gated Gemma model metadata/weights when downloading models.
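
To sanity-check the .NET SDK and Vulkan setup before building, the following commands should be enough; this assumes the Vulkan SDK's vulkaninfo tool is on PATH, and the exact output depends on your driver:

# List installed .NET SDKs (expect a 10.x entry)
dotnet --list-sdks

# Summarize Vulkan devices and API versions (expect apiVersion 1.3 or newer)
vulkaninfo --summary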

Optional local Hugging Face token:

# .env
HF_TOKEN=hf_your_token_here

The .env file is intentionally ignored by Git.

Quick start

Clone and build:

git clone https://github.com/dimohy/IgnisCore.git
cd IgnisCore
dotnet build .\src\IgnisCore.csproj -c Release
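
The highlights mention NativeAOT-friendly project settings. If you want a self-contained native binary rather than dotnet run, a publish along these lines should work; passing PublishAot explicitly is an assumption here, since the csproj contents are not shown in this README:

# Native AOT publish for 64-bit Windows (assumes the project builds cleanly under AOT)
dotnet publish .\src\IgnisCore.csproj -c Release -r win-x64 -p:PublishAot=true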

Run the 8GB-friendly model launcher:

.\run-chat-gemma4-e2b-it-q8-8g.ps1

Run the larger 12GB-oriented model launcher:

.\run-chat-gemma4-e4b-it-q8-12g.ps1

Both launchers forward extra arguments to IgnisCore, so you can override settings:

.\run-chat-gemma4-e2b-it-q8-8g.ps1 --prompt "Who are you?" --max-tokens 64
.\run-chat-gemma4-e2b-it-q8-8g.ps1 --max-seq-len 4096

Downloaded models are stored under models/, which is ignored by Git.

CLI examples

Show help:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --help

Download/verify a known model without running inference:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --download-only

Run a single prompt:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --prompt "Introduce IgnisCore" --max-tokens 128

Run a synthetic benchmark:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e4b-it --gguf-type q8_0 --benchmark --bench-pp 512 --bench-tg 64

Known model aliases

| Alias | Weight repository | Metadata repository | Default GGUF | Suggested GPU |
|-------|-------------------|---------------------|--------------|---------------|
| gemma-4-e2b-it | unsloth/gemma-4-E2B-it-GGUF | google/gemma-4-E2B-it | q8_0 | 8GB+ |
| gemma-4-e4b-it | unsloth/gemma-4-E4B-it-GGUF | google/gemma-4-e4b-it | q8_0 | 12GB+ |

Repository layout

| Path | Purpose |
|------|---------|
| src/ | IgnisCore C# project |
| src/Engine/ | Transformer, chat, sampling, and vision pipeline orchestration |
| src/Gpu/ | Vulkan context, buffer management, and tensor operations |
| src/Model/ | GGUF/SafeTensors/tokenizer/config/model download support |
| src/Shaders/ | GLSL compute shaders and embedded SPIR-V artifacts |
| src/TurboQuant/ | TurboQuant KV-cache compression components |
| run-chat-gemma4-e2b-it-q8-8g.ps1 | 8GB-friendly Gemma 4 E2B Q8_0 chat launcher |
| run-chat-gemma4-e4b-it-q8-12g.ps1 | Gemma 4 E4B Q8_0 chat launcher for larger GPUs |

Notes

  • IgnisCore is optimized around GGUF Q8_0 paths today. Other quantization names may exist upstream but are not necessarily supported by the current kernels.
  • Cooperative Matrix paths require compatible NVIDIA Vulkan driver/device support. Use --no-coopmat when diagnosing portability issues (an example follows this list).
  • Model files are large and are not committed to this repository.
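
For instance, to rule out Cooperative Matrix issues on a non-NVIDIA or older GPU, the single-prompt example from above can be rerun with that path disabled:

# Same single-prompt run, but with the Cooperative Matrix path turned off
dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --prompt "Introduce IgnisCore" --max-tokens 128 --no-coopmat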

License

Apache-2.0. See LICENSE.
