- Workspace root: add hip-rwkv workspace dep (path ../../hip-rwkv), update web-rwkv to 0.10.19, add [patch.crates-io] web-rwkv path to ensure single source of truth
- ai00-core: add hip feature gating dep:hip-rwkv (optional)
- ai00-server: forward hip feature to ai00-core/hip

Add Backend enum (WebGpu/Hip) to reload.rs with Default impl defaulting to WebGpu. Add backend field with #[serde(default)] to Model struct and ReloadRequest, wire through TryFrom<Config>, and add commented-out example to Config.toml. Existing config files without a backend field continue to work via serde default.
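
For reviewers, a minimal std-only sketch of the defaulting behaviour described above. The enum and variant names come from the commit message; `from_config` is a hypothetical helper that only mimics what `#[serde(default)]` does when the `backend` key is absent, and is not the code in reload.rs.

```rust
/// Which runtime executes the model (names taken from the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    WebGpu,
    Hip,
}

impl Default for Backend {
    /// Existing configs have no `backend` key, so they must keep using WebGPU.
    fn default() -> Self {
        Backend::WebGpu
    }
}

impl Backend {
    /// Hypothetical helper: a missing key falls back to `Backend::default()`,
    /// which is the effect `#[serde(default)]` has on the real field.
    pub fn from_config(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Backend::default()),
            Some("WebGpu") => Ok(Backend::WebGpu),
            Some("Hip") => Ok(Backend::Hip),
            Some(other) => Err(format!("unknown backend: {other}")),
        }
    }
}
```

Calling `Backend::from_config(None)` yields `Backend::WebGpu`, which is why older config files keep working unchanged.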
Implements web_rwkv::runtime::model::State for the HIP backend, bridging ai00's TensorCpu<f32> state format with hip-rwkv's native HipState format. Includes state format conversion between the v7 WebGPU layout [n_embd, head_size+2, n_layer, 1] and HipState's per-layer PinnedBuffer components with f32<->f16 conversion for shift states. The att/ffn/write/read methods return errors since HIP doesn't support per-layer GPU state manipulation.
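
To make the layout conversion easier to review, here is a rough sketch of splitting a flat `[n_embd, head_size+2, n_layer, 1]` buffer into per-layer pieces. It assumes the first axis is the fastest-varying one and that the `head_size+2` rows of each layer are ordered as attention shift, recurrent state, FFN shift; both are assumptions made for illustration, and `StateDims`, `LayerView` and `split_layers` are hypothetical names rather than anything in hip-rwkv or web-rwkv. The f32<->f16 conversion is left out.

```rust
/// Dimensions of the flat state, shaped [n_embd, head_size + 2, n_layer, 1].
pub struct StateDims {
    pub n_embd: usize,
    pub head_size: usize,
    pub n_layer: usize,
}

/// Borrowed view of one layer's slice of the flat state.
/// The row ordering is an assumption, not confirmed by the PR.
pub struct LayerView<'a> {
    pub att_shift: &'a [f32], // assumed row 0
    pub wkv: &'a [f32],       // assumed rows 1..=head_size
    pub ffn_shift: &'a [f32], // assumed last row
}

/// Split the flat buffer into one view per layer, validating its length first.
pub fn split_layers<'a>(
    data: &'a [f32],
    dims: &StateDims,
) -> Result<Vec<LayerView<'a>>, String> {
    let layer_len = dims.n_embd * (dims.head_size + 2);
    if layer_len == 0 {
        return Err("n_embd must be non-zero".to_string());
    }
    if data.len() != layer_len * dims.n_layer {
        return Err(format!(
            "expected {} elements, got {}",
            layer_len * dims.n_layer,
            data.len()
        ));
    }
    Ok(data
        .chunks_exact(layer_len)
        .map(|layer| {
            // First row of the layer, then head_size rows, then the final row.
            let (att_shift, rest) = layer.split_at(dims.n_embd);
            let (wkv, ffn_shift) = rest.split_at(dims.n_embd * dims.head_size);
            LayerView { att_shift, wkv, ffn_shift }
        })
        .collect())
}
```

Validating the length up front means a state saved with different model dimensions is rejected with an error instead of being silently misread.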
Wire up the HIP backend loading in ai00-core:
- Add load_runtime_hip() (cfg-gated) that validates V7 model, loads via Rwkv7Hip::load on spawn_blocking, creates HipRuntime and HipStateAdapter
- Add Backend dispatch in ThreadRequest::Reload handler: WebGpu takes the existing path, Hip calls load_runtime_hip()
- Introduce SoftmaxBackend enum in run.rs to decouple softmax computation from wgpu Context (WebGpu variant uses wgpu, Hip variant uses hip_rwkv::softmax_hip_batch)
- Change run() to accept SoftmaxBackend instead of Context
- Make CoreRuntime.context optional (None for HIP backend)
- Add HipModelStub for ModelSerialize (HIP doesn't support save)
- Fix pre-existing enumerate_adapters async API mismatch
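
The SoftmaxBackend idea can be illustrated with a small std-only sketch. The enum and variant names follow the commit message, but everything else is a stand-in: the real WebGpu variant presumably carries a wgpu `Context` and the real Hip variant calls `hip_rwkv::softmax_hip_batch`, whereas both stubs here (`softmax_webgpu_stub`, `softmax_hip_stub`) are hypothetical and simply run a CPU softmax.

```rust
/// Decouples softmax from any particular GPU context (names from the commit).
pub enum SoftmaxBackend {
    WebGpu,
    Hip,
}

/// Numerically stable softmax on the CPU, used here as a stand-in kernel.
fn softmax_cpu(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    // Subtract the maximum so exp() cannot overflow.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

// Hypothetical stubs: the real code would dispatch to wgpu and HIP kernels.
fn softmax_webgpu_stub(batch: &[Vec<f32>]) -> Vec<Vec<f32>> {
    batch.iter().map(|logits| softmax_cpu(logits)).collect()
}

fn softmax_hip_stub(batch: &[Vec<f32>]) -> Vec<Vec<f32>> {
    batch.iter().map(|logits| softmax_cpu(logits)).collect()
}

impl SoftmaxBackend {
    /// The caller (run() in the PR) only sees this method, never a GPU context.
    pub fn softmax_batch(&self, batch: &[Vec<f32>]) -> Vec<Vec<f32>> {
        match self {
            SoftmaxBackend::WebGpu => softmax_webgpu_stub(batch),
            SoftmaxBackend::Hip => softmax_hip_stub(batch),
        }
    }
}
```

With this shape, `run()` can take a `SoftmaxBackend` value and stay unaware of which GPU API sits behind it.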

- Extend list_adapters() with HIP device enumeration via hip_rwkv::hip::get_device_count/get_device_name behind #[cfg(feature = "hip")]
- Add pub hip_to_model_info() converting Rwkv7ModelInfo + LoraDims to ModelInfo with ModelCustomInfo::V7 populated from LoRA dimensions
- Make Environment::Loaded.model Option<Arc<dyn ModelSerialize>> so the HIP backend (which cannot serialize) passes None instead of a stub
- Update Save handler to gracefully return false when model is None
- Remove HipModelStub (no longer needed)
- Update load_runtime_hip return type to exclude model component
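
A rough sketch of the optional-model pattern from this commit, for illustration only. The `ModelSerialize` trait signature, the `Unloaded` variant, `DummyModel` and `handle_save` are all hypothetical simplifications; only the `Option<Arc<dyn ModelSerialize>>` field and the "return false when model is None" behaviour come from the commit message.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real trait; the method signature is assumed.
pub trait ModelSerialize: Send + Sync {
    fn serialize(&self) -> Vec<u8>;
}

pub enum Environment {
    /// `model` is None for backends that cannot serialize (HIP in this PR).
    Loaded { model: Option<Arc<dyn ModelSerialize>> },
    Unloaded,
}

/// Hypothetical save handler: reports false instead of failing
/// when there is nothing that can be serialized.
pub fn handle_save(env: &Environment, out: &mut Vec<u8>) -> bool {
    match env {
        Environment::Loaded { model: Some(model) } => {
            out.extend(model.serialize());
            true
        }
        // Loaded without a serializable model, or nothing loaded at all.
        _ => false,
    }
}

/// Example implementor standing in for a WebGPU model.
pub struct DummyModel;

impl ModelSerialize for DummyModel {
    fn serialize(&self) -> Vec<u8> {
        vec![1, 2, 3]
    }
}
```

Using `Option` here removes the need for a stub type whose only job is to fail at save time.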

- Panic at startup if config requests backend = "Hip" but the binary was compiled without --features hip, instead of silently failing in a fire-and-forget background task.
- Log progress through the reload path (env lock, tokenizer, backend dispatch, model load) so hangs are diagnosable.
- Monitor the fire-and-forget initial load task and log errors/panics instead of silently dropping them.
- Use per-batch state methods (load_state_batch/get_state_batch) in HipStateAdapter so save/restore targets a single slot.
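
The task-monitoring point can be sketched with plain `std::thread`, as an analogue of what the PR does for its background load task. This is an illustration under assumptions: the actual code presumably awaits an async task handle, and `monitor` is a hypothetical name.

```rust
use std::thread;

/// Wait for a background load and surface both errors and panics,
/// instead of dropping the handle and losing the outcome.
pub fn monitor<T>(handle: thread::JoinHandle<Result<T, String>>) -> Result<T, String> {
    match handle.join() {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(err)) => {
            eprintln!("initial load failed: {err}");
            Err(err)
        }
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| (*s).to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            eprintln!("initial load panicked: {msg}");
            Err(format!("panicked: {msg}"))
        }
    }
}
```

A caller would spawn the load, hand the `JoinHandle` to `monitor` from a supervising thread, and rely on the log line rather than on the load task reporting its own failure.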

- Relax fastembed version constraint from =4.4.0 to 4 to fix ort compilation errors (ort v2.0.0-rc.9 API incompatibilities)
- Add BGELargeZHV15 and ModernBertEmbedLarge variants to EmbeddingModel enum to match fastembed 4.9.1
- Apply cargo fmt formatting fixes

- Add tests/smoke.rs with ignored smoke tests that spawn the server as a subprocess and verify completion responses
- Add assets/models symlink to /workspace/models for test models
- Update configs to use assets/models path and consistent model name
- Add reqwest dev-dependency for HTTP client in tests

Run with: cargo test --features hip smoke_hip -- --ignored
AI summary of the changes: Summary
Changes
Test Plan
# Build with HIP
cargo build --release --features hip
# Run smoke tests (requires GPU + model)
cargo test --features hip smoke_hip -- --ignored
cargo test smoke_webgpu -- --ignored
Notes
Hey,
Some initial work to support a HIP backend. I am planning on creating a new repo, `hip-rwkv`, that will actually implement the model code. It's heavily based on your `web-rwkv`; thanks for the kick-off point! The code for the HIP backend currently lives here: Nintorac/web-rwkv#1
Just posting this here as a preliminary idea. Is this something you're interested in merging, or would you prefer that I fork and maintain it myself?