Add HIP Backend #166

Open
Nintorac wants to merge 10 commits into Ai00-X:main from Nintorac:hip

Nintorac commented Feb 4, 2026

Hey,

Some initial work to support a HIP backend. I am planning on creating a new repo, hip-rwkv, that will actually implement the model code. It's heavily based on your web-rwkv; thanks for the kick-off point!

The code for the HIP backend currently lives here: Nintorac/web-rwkv#1

I'm just posting this here as a preliminary idea. Is this something you're interested in merging, or would you prefer I fork and maintain it myself?

- Workspace root: add hip-rwkv workspace dep (path ../../hip-rwkv),
  update web-rwkv to 0.10.19, add [patch.crates-io] web-rwkv path
  to ensure single source of truth
- ai00-core: add hip feature gating dep:hip-rwkv (optional)
- ai00-server: forward hip feature to ai00-core/hip

Add Backend enum (WebGpu/Hip) to reload.rs with Default impl
defaulting to WebGpu. Add backend field with #[serde(default)] to
Model struct and ReloadRequest, wire through TryFrom<Config>, and
add commented-out example to Config.toml. Existing config files
without a backend field continue to work via serde default.
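A minimal sketch of how the new `backend` field falls back to `WebGpu` when a config omits it. The real code derives serde's `Deserialize` and marks the field `#[serde(default)]`; to stay std-only, the hypothetical `model_from_field` helper below emulates that by treating `None` as a missing field. `Model` here is a stand-in that shows only the new field.

```rust
use std::str::FromStr;

/// Stand-in for the Backend enum in reload.rs (real one is serde-deserialized).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Backend {
    #[default]
    WebGpu, // used whenever the config has no `backend` field
    Hip,
}

impl FromStr for Backend {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "WebGpu" => Ok(Backend::WebGpu),
            "Hip" => Ok(Backend::Hip),
            other => Err(format!("unknown backend: {other}")),
        }
    }
}

/// Hypothetical stand-in for the Model config struct; only `backend` is shown.
#[derive(Debug)]
struct Model {
    backend: Backend,
}

/// Emulates `#[serde(default)]`: a missing field becomes `Backend::default()`
/// instead of a parse error, so older config files keep working.
fn model_from_field(backend: Option<&str>) -> Result<Model, String> {
    let backend = match backend {
        Some(s) => s.parse::<Backend>()?,
        None => Backend::default(),
    };
    Ok(Model { backend })
}
```

An explicit but unknown value such as `"Cuda"` is still rejected; only an absent field takes the default.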

Implements web_rwkv::runtime::model::State for the HIP backend,
bridging ai00's TensorCpu<f32> state format with hip-rwkv's native
HipState format. Includes state format conversion between the v7
WebGPU layout [n_embd, head_size+2, n_layer, 1] and HipState's
per-layer PinnedBuffer components with f32<->f16 conversion for
shift states. The att/ffn/write/read methods return errors since
HIP doesn't support per-layer GPU state manipulation.
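The layout conversion can be sketched as below. It assumes the flat buffer stores the first dimension fastest, and that within each layer row 0 is the attention token-shift, rows 1..=head_size are the wkv matrix state and row head_size + 1 is the FFN token-shift. That row assignment and the `LayerState`, `split_state` and `merge_state` names are illustrative assumptions, not hip-rwkv's actual types.

```rust
/// Hypothetical per-layer view of the state (the PR keeps these in PinnedBuffers).
#[derive(Debug, Clone, PartialEq)]
struct LayerState {
    att_shift: Vec<f32>, // assumed row 0: attention token-shift, n_embd values
    wkv: Vec<f32>,       // assumed rows 1..=head_size: wkv matrix state
    ffn_shift: Vec<f32>, // assumed row head_size + 1: FFN token-shift
}

/// Splits a flat state of shape [n_embd, head_size + 2, n_layer, 1]
/// (first dimension fastest) into per-layer components.
fn split_state(flat: &[f32], n_embd: usize, head_size: usize, n_layer: usize) -> Vec<LayerState> {
    let rows = head_size + 2;
    assert_eq!(flat.len(), n_embd * rows * n_layer, "unexpected state length");
    (0..n_layer)
        .map(|layer| {
            let base = layer * rows * n_embd;
            let at = |r: usize| base + r * n_embd; // offset of row r in this layer
            LayerState {
                att_shift: flat[at(0)..at(1)].to_vec(),
                wkv: flat[at(1)..at(head_size + 1)].to_vec(),
                ffn_shift: flat[at(head_size + 1)..at(head_size + 2)].to_vec(),
            }
        })
        .collect()
}

/// Inverse of `split_state`: rebuilds the flat layout layer by layer.
fn merge_state(layers: &[LayerState]) -> Vec<f32> {
    let mut flat = Vec::new();
    for l in layers {
        flat.extend_from_slice(&l.att_shift);
        flat.extend_from_slice(&l.wkv);
        flat.extend_from_slice(&l.ffn_shift);
    }
    flat
}
```

The f32<->f16 step for the shift rows is omitted here; one option (an assumption, not necessarily what the PR uses) is the `half` crate's `f16::from_f32` and `f16::to_f32`.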

Wire up the HIP backend loading in ai00-core:
- Add load_runtime_hip() (cfg-gated) that validates V7 model,
  loads via Rwkv7Hip::load on spawn_blocking, creates HipRuntime
  and HipStateAdapter
- Add Backend dispatch in ThreadRequest::Reload handler: WebGpu
  takes the existing path, Hip calls load_runtime_hip()
- Introduce SoftmaxBackend enum in run.rs to decouple softmax
  computation from wgpu Context (WebGpu variant uses wgpu,
  Hip variant uses hip_rwkv::softmax_hip_batch)
- Change run() to accept SoftmaxBackend instead of Context
- Make CoreRuntime.context optional (None for HIP backend)
- Add HipModelStub for ModelSerialize (HIP doesn't support save)
- Fix pre-existing enumerate_adapters async API mismatch
- Extend list_adapters() with HIP device enumeration via
  hip_rwkv::hip::get_device_count/get_device_name behind #[cfg(feature = "hip")]
- Add pub hip_to_model_info() converting Rwkv7ModelInfo + LoraDims to ModelInfo
  with ModelCustomInfo::V7 populated from LoRA dimensions
- Make Environment::Loaded.model Option<Arc<dyn ModelSerialize>> so the HIP
  backend (which cannot serialize) passes None instead of a stub
- Update Save handler to gracefully return false when model is None
- Remove HipModelStub (no longer needed)
- Update load_runtime_hip return type to exclude model component
- Panic at startup if config requests backend = "Hip" but the binary
  was compiled without --features hip, instead of silently failing in
  a fire-and-forget background task.
- Log progress through the reload path (env lock, tokenizer, backend
  dispatch, model load) so hangs are diagnosable.
- Monitor the fire-and-forget initial load task and log errors/panics
  instead of silently dropping them.
- Use per-batch state methods (load_state_batch/get_state_batch) in
  HipStateAdapter so save/restore targets a single slot.
- Relax fastembed version constraint from =4.4.0 to 4 to fix ort
  compilation errors (ort v2.0.0-rc.9 API incompatibilities)
- Add BGELargeZHV15 and ModernBertEmbedLarge variants to EmbeddingModel
  enum to match fastembed 4.9.1
- Apply cargo fmt formatting fixes
- Add tests/smoke.rs with ignored smoke tests that spawn the server as a
  subprocess and verify completion responses
- Add assets/models symlink to /workspace/models for test models
- Update configs to use assets/models path and consistent model name
- Add reqwest dev-dependency for HTTP client in tests

Run with: cargo test --features hip smoke_hip -- --ignored
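A minimal sketch of the `SoftmaxBackend` idea from the commit list: `run()` receives an enum instead of a wgpu `Context`, so it no longer depends on which GPU API computes the softmax. In the PR the `WebGpu` variant uses wgpu and the `Hip` variant calls `hip_rwkv::softmax_hip_batch`; here both arms call a CPU stand-in so the example runs anywhere. The greedy `run` below is a hypothetical toy, not ai00's sampler.

```rust
/// Stand-in for the SoftmaxBackend enum in run.rs.
enum SoftmaxBackend {
    WebGpu, // real code: wraps the wgpu Context
    Hip,    // real code: calls hip_rwkv::softmax_hip_batch
}

/// Numerically stable softmax over one row of logits (CPU stand-in).
fn softmax_cpu(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

impl SoftmaxBackend {
    /// The only softmax entry point `run()` sees.
    fn softmax_batch(&self, batch: &[Vec<f32>]) -> Vec<Vec<f32>> {
        match self {
            SoftmaxBackend::WebGpu => batch.iter().map(|row| softmax_cpu(row)).collect(),
            SoftmaxBackend::Hip => batch.iter().map(|row| softmax_cpu(row)).collect(),
        }
    }
}

/// Hypothetical slice of `run()`: greedy pick of the most likely token per slot.
fn run(backend: &SoftmaxBackend, logits: &[Vec<f32>]) -> Vec<usize> {
    backend
        .softmax_batch(logits)
        .iter()
        .map(|probs| {
            probs
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.partial_cmp(b.1).expect("NaN probability"))
                .map(|(i, _)| i)
                .expect("empty logits row")
        })
        .collect()
}
```

Because the caller only matches on the enum, `CoreRuntime.context` can be `None` for HIP without `run()` needing to know.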

Nintorac commented Feb 6, 2026

AI summary of the changes:

Summary

  • HIP backend integration: Uses hip-rwkv for RWKV
    v7 inference via rocBLAS GEMM and custom HIP kernels
  • Backend selection: New backend config option ("WebGpu" or "Hip") with WebGpu as default
  • Feature-gated: Build with --features hip to include HIP support
  • Documentation: Added HIP setup guide to README with prerequisites, build instructions,
    and sample config
  • Smoke tests: Subprocess-based integration tests for both WebGPU and HIP backends
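The feature gating, combined with the startup check from the commit list (panic if the config asks for `Hip` but the binary was built without `--features hip`), can be sketched as follows. `check_backend` and `validate_at_startup` are hypothetical names; passing `hip_compiled` as a parameter keeps the check testable, while `cfg!(feature = "hip")` supplies the real value at startup.

```rust
/// Stand-in for the Backend enum from reload.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    WebGpu,
    Hip,
}

/// Rejects a HIP request when HIP support was not compiled in.
fn check_backend(requested: Backend, hip_compiled: bool) -> Result<Backend, String> {
    match requested {
        Backend::Hip if !hip_compiled => Err(
            "config requests backend = \"Hip\" but the binary was built without --features hip"
                .to_string(),
        ),
        other => Ok(other),
    }
}

/// Called once at startup, before the background load task is spawned, so a
/// misconfiguration fails loudly instead of inside a fire-and-forget task.
fn validate_at_startup(requested: Backend) -> Backend {
    // cfg! evaluates to true only when compiled with the `hip` feature enabled.
    match check_backend(requested, cfg!(feature = "hip")) {
        Ok(backend) => backend,
        Err(msg) => panic!("{msg}"),
    }
}
```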

Changes

| Area | Files |
| --- | --- |
| Core | ai00-core/src/lib.rs, hip_state.rs, run.rs |
| Config | ai00-server/src/config.rs, Config.hip.toml |
| Dependencies | Cargo.toml (hip-rwkv from upstream) |
| Tests | tests/smoke.rs with smoke_webgpu and smoke_hip |
| Docs | README.md HIP section |

Test Plan

# Build with HIP
cargo build --release --features hip

# Run smoke tests (requires GPU + model)
cargo test --features hip smoke_hip -- --ignored
cargo test smoke_webgpu -- --ignored

Notes

  • HIP backend currently supports RWKV v7 models only
