Add GetTensorRaw and ToGGMLQuantType for GGML quantized weights #18
Open
ajroetker wants to merge 5 commits into gomlx:main
- Add Model.GetTensorRaw() to load raw tensor bytes without dequantization, enabling quantized weights to stay in native format
- Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)
- Update go.mod dependencies

Depends on: ajroetker/gomlx#6
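As a rough illustration of the type mapping described in this commit, the sketch below maps GGUF tensor type IDs to a quantization enum. It is not the PR's code: `QuantType`, its constants, and the `ToQuantType` method are hypothetical stand-ins for GoMLX's `GGMLQuantType` and `TensorType.ToGGMLQuantType()`, whose real signatures are not shown here. The numeric IDs follow the `ggml_type` enum as I understand it and should be checked against the GGUF specification.

```go
package main

import "fmt"

// TensorType mirrors the ggml_type ID stored in a GGUF tensor header.
// The numeric values are an assumption based on the ggml_type enum.
type TensorType uint32

const (
	TensorTypeF32    TensorType = 0
	TensorTypeF16    TensorType = 1
	TensorTypeQ4_0   TensorType = 2
	TensorTypeQ8_0   TensorType = 8
	TensorTypeQ2_K   TensorType = 10
	TensorTypeQ3_K   TensorType = 11
	TensorTypeQ4_K   TensorType = 12
	TensorTypeQ5_K   TensorType = 13
	TensorTypeQ6_K   TensorType = 14
	TensorTypeIQ4_NL TensorType = 20
)

// QuantType is a hypothetical stand-in for GoMLX's GGMLQuantType.
type QuantType int

const (
	QuantNone QuantType = iota
	QuantQ4_0
	QuantQ8_0
	QuantIQ4_NL
	QuantQ2_K
	QuantQ3_K
	QuantQ4_K
	QuantQ5_K
	QuantQ6_K
)

// ToQuantType reports the quantization format for t. The boolean is false
// for types that are not block-quantized (F32, F16) or not covered here.
func (t TensorType) ToQuantType() (QuantType, bool) {
	switch t {
	case TensorTypeQ4_0:
		return QuantQ4_0, true
	case TensorTypeQ8_0:
		return QuantQ8_0, true
	case TensorTypeIQ4_NL:
		return QuantIQ4_NL, true
	case TensorTypeQ2_K:
		return QuantQ2_K, true
	case TensorTypeQ3_K:
		return QuantQ3_K, true
	case TensorTypeQ4_K:
		return QuantQ4_K, true
	case TensorTypeQ5_K:
		return QuantQ5_K, true
	case TensorTypeQ6_K:
		return QuantQ6_K, true
	default:
		return QuantNone, false
	}
}

func main() {
	for _, t := range []TensorType{TensorTypeQ4_0, TensorTypeF32} {
		q, ok := t.ToQuantType()
		fmt.Printf("tensor type %d: quant=%d supported=%v\n", t, q, ok)
	}
}
```

Returning an explicit `ok` flag lets a caller fall back to the dequantizing path for unsupported types instead of failing outright.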
- Multi-file GGUF: support multimodal models (e.g. LLaVA) that split tensors across multiple GGUF files via LoadAll/LoadFiles/ListGGUFFiles. Tensors are looked up across all files transparently.
- Pair extra files with their readers in a single extraEntry struct to prevent parallel-slice divergence and enable per-file lazy reader init.
- Fix Close() to close all readers instead of leaking on first error.
- Deduplicate .gguf filename filtering into a shared ggufFileNames helper.
- Add FlexToken type to handle HuggingFace token config fields that can be either a plain string or an object with a "content" field.
- Update go-xla, golang.org/x/sys, k8s.io/klog dependencies.
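The Close() fix above can be sketched roughly as follows: every reader is closed even when an earlier one fails, and the failures are combined with `errors.Join` (Go 1.20+). This is a minimal sketch, not the PR's implementation; `closeAll` and `fakeReader` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeReader is a hypothetical stand-in for a per-file GGUF reader.
type fakeReader struct {
	name   string
	fail   bool
	closed bool
}

func (r *fakeReader) Close() error {
	r.closed = true
	if r.fail {
		return fmt.Errorf("close %s: simulated failure", r.name)
	}
	return nil
}

// closeAll closes every reader and keeps going after a failure, so a
// single bad file cannot leak the remaining handles. It returns nil when
// every Close succeeds, otherwise all failures joined into one error.
func closeAll(readers ...io.Closer) error {
	var errs []error
	for _, r := range readers {
		if r == nil {
			continue // e.g. a lazily initialized reader that was never opened
		}
		if err := r.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	a := &fakeReader{name: "model.gguf", fail: true}
	b := &fakeReader{name: "mmproj.gguf"}
	err := closeAll(a, b)
	fmt.Println("error:", err)
	fmt.Println("both closed:", a.closed && b.closed)
}
```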
janpfeifer requested changes on Mar 13, 2026
…p FlexToken

- Rename GetTensor→ReadTensor, GetTensorRaw→ReadTensorBytes per reviewer convention (Read prefix for I/O methods)
- Merge findTensorFile+fileForIndex into findTensor to avoid double map lookup in GetTensorInfo and readerForTensor
- Return error instead of silently swallowing invalid JSON in FlexToken.UnmarshalJSON
- Remove unused FlexToken.String() method
- Replace 7 copy-pasted config fallback blocks in resolveSpecialTokens with table-driven loop
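To illustrate the FlexToken behavior described in these commits (a field that is either a plain string or an object with a "content" field, with invalid input reported rather than swallowed), here is a minimal sketch. The struct layout and the `parseToken` helper are assumptions made for the example; the PR's actual type may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// FlexToken holds a special token that a HuggingFace config may encode
// either as "<s>" or as {"content": "<s>", ...}. Layout is hypothetical.
type FlexToken struct {
	Content string
}

// UnmarshalJSON accepts a JSON string or an object with a "content" field.
// Any other shape returns an error instead of being silently ignored.
func (t *FlexToken) UnmarshalJSON(data []byte) error {
	data = bytes.TrimSpace(data)
	if bytes.Equal(data, []byte("null")) {
		return nil // follow the encoding/json convention: null is a no-op
	}
	if len(data) > 0 && data[0] == '"' {
		return json.Unmarshal(data, &t.Content)
	}
	var obj struct {
		Content string `json:"content"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return fmt.Errorf("FlexToken: expected string or object: %w", err)
	}
	t.Content = obj.Content
	return nil
}

// parseToken is a small helper for trying out individual JSON values.
func parseToken(raw string) (string, error) {
	var t FlexToken
	if err := json.Unmarshal([]byte(raw), &t); err != nil {
		return "", err
	}
	return t.Content, nil
}

func main() {
	for _, raw := range []string{`"<s>"`, `{"content": "</s>", "lstrip": false}`, `42`} {
		content, err := parseToken(raw)
		fmt.Printf("input=%s content=%q err=%v\n", raw, content, err)
	}
}
```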
- Add Model.GetTensorRaw() to load raw tensor bytes without dequantization, keeping quantized weights in their native GGML block format
- Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)

Dependencies

Depends on ajroetker/gomlx#6; the go.mod replace directive will be updated to a proper module version once that PR merges.
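For context on what "raw tensor bytes" in native block format means in practice, the sketch below estimates the byte length of a quantized tensor from its element count. It is an illustration only: `rawByteSize` and the `layouts` table are hypothetical, the block sizes reflect my understanding of ggml's block definitions and should be verified against ggml, and the divisibility check is simplified (ggml constrains the row length, not just the total element count).

```go
package main

import "fmt"

// blockLayout describes one quantization block: how many weights it packs
// and how many bytes it occupies on disk.
type blockLayout struct {
	elems int
	bytes int
}

// layouts lists a few formats; values are assumptions taken from ggml's
// block structs (e.g. Q4_0: one fp16 scale plus 16 bytes of 4-bit values).
var layouts = map[string]blockLayout{
	"Q4_0": {elems: 32, bytes: 18},
	"Q8_0": {elems: 32, bytes: 34},
	"Q4_K": {elems: 256, bytes: 144},
	"Q6_K": {elems: 256, bytes: 210},
}

// rawByteSize returns the number of bytes a tensor with numElements
// weights occupies when stored in the given block format.
func rawByteSize(quant string, numElements int) (int, error) {
	l, ok := layouts[quant]
	if !ok {
		return 0, fmt.Errorf("unknown quant type %q", quant)
	}
	if numElements < 0 || numElements%l.elems != 0 {
		return 0, fmt.Errorf("%d elements is not a multiple of block size %d", numElements, l.elems)
	}
	return numElements / l.elems * l.bytes, nil
}

func main() {
	n := 4096 * 4096
	for _, q := range []string{"Q4_0", "Q8_0", "Q4_K", "Q6_K"} {
		size, err := rawByteSize(q, n)
		fmt.Printf("%s: %d bytes (err=%v), float32 would be %d bytes\n", q, size, err, n*4)
	}
}
```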