Add GetTensorRaw and ToGGMLQuantType for GGML quantized weights #18

Open
ajroetker wants to merge 5 commits into gomlx:main from ajroetker:gguf-quantized-types

@ajroetker
Contributor

  • Add Model.GetTensorRaw() to load raw tensor bytes without dequantization, keeping quantized weights in their native GGML block format
  • Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)
  • Update go.mod dependencies
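
To make the two additions above concrete, here is a minimal sketch of a type mapping and of the size of a tensor's raw bytes when they are kept in block form. It is not the code from this PR: `TensorType`, `QuantType`, `toQuantType`, and `rawSize` are hypothetical names for illustration, and the real `GGMLQuantType` lives in GoMLX. The numeric type IDs and block sizes follow the ggml layouts as commonly documented and should be checked against the ggml headers.

```go
package main

import "fmt"

// TensorType mirrors the numeric tensor type IDs stored in a GGUF file
// (values follow the ggml_type enum). The Go names here are hypothetical.
type TensorType uint32

const (
	TypeF32    TensorType = 0
	TypeQ4_0   TensorType = 2
	TypeQ8_0   TensorType = 8
	TypeQ2_K   TensorType = 10
	TypeQ3_K   TensorType = 11
	TypeQ4_K   TensorType = 12
	TypeQ5_K   TensorType = 13
	TypeQ6_K   TensorType = 14
	TypeIQ4_NL TensorType = 20
)

// QuantType is a stand-in for GoMLX's GGMLQuantType (assumption: the real
// type is defined elsewhere and may look different).
type QuantType string

// blockInfo describes one GGML block format: how many weights a block holds
// and how many bytes that block occupies on disk.
type blockInfo struct {
	quant      QuantType
	elems      int
	blockBytes int
}

// quantBlocks lists the quantized types named in the PR description.
var quantBlocks = map[TensorType]blockInfo{
	TypeQ4_0:   {"Q4_0", 32, 18},
	TypeQ8_0:   {"Q8_0", 32, 34},
	TypeIQ4_NL: {"IQ4_NL", 32, 18},
	TypeQ2_K:   {"Q2_K", 256, 84},
	TypeQ3_K:   {"Q3_K", 256, 110},
	TypeQ4_K:   {"Q4_K", 256, 144},
	TypeQ5_K:   {"Q5_K", 256, 176},
	TypeQ6_K:   {"Q6_K", 256, 210},
}

// toQuantType reports the quantized type for t, or false for types such as
// F32 that are not block-quantized.
func toQuantType(t TensorType) (QuantType, bool) {
	info, ok := quantBlocks[t]
	return info.quant, ok
}

// rawSize returns how many bytes numElems weights occupy in native block
// form, which is the length a raw (non-dequantized) read would return.
func rawSize(t TensorType, numElems int) (int, error) {
	info, ok := quantBlocks[t]
	if !ok {
		return 0, fmt.Errorf("tensor type %d is not a supported quantized type", t)
	}
	if numElems%info.elems != 0 {
		return 0, fmt.Errorf("%d elements is not a multiple of block size %d", numElems, info.elems)
	}
	return numElems / info.elems * info.blockBytes, nil
}

func main() {
	q, _ := toQuantType(TypeQ4_0)
	n, err := rawSize(TypeQ4_0, 4096)
	fmt.Println(q, n, err)
}
```

Returning a boolean from the mapping lets callers fall back to the dequantizing path for unquantized types instead of treating them as an error.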

Dependencies

Depends on ajroetker/gomlx#6. The go.mod replace directive will be updated to a proper module version once that PR merges.

- Add Model.GetTensorRaw() to load raw tensor bytes without
  dequantization, enabling quantized weights to stay in native format
- Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to
  GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)
- Update go.mod dependencies

Depends on: ajroetker/gomlx#6
- Multi-file GGUF: support multimodal models (e.g. LLaVA) that split
  tensors across multiple GGUF files via LoadAll/LoadFiles/ListGGUFFiles.
  Tensors are looked up across all files transparently.
- Pair extra files with their readers in a single extraEntry struct to
  prevent parallel-slice divergence and enable per-file lazy reader init.
- Fix Close() to close all readers instead of leaking on first error.
- Deduplicate .gguf filename filtering into a shared ggufFileNames helper.
- Add FlexToken type to handle HuggingFace token config fields that can
  be either a plain string or an object with a "content" field.
- Update go-xla, golang.org/x/sys, k8s.io/klog dependencies.
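
The Close() fix and the paired-entry struct described in this commit message can be sketched as follows. This is an illustration under assumptions, not the PR's code: the fields of `extraEntry`, the `closeAll` helper, and `fakeReader` are hypothetical, and `errors.Join` (Go 1.20+) is just one way to report every failure.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// extraEntry pairs an extra GGUF file with its reader so the two cannot
// drift apart the way parallel slices can. Field names are hypothetical.
type extraEntry struct {
	path   string
	reader io.Closer // nil until the file is first read (lazy init)
}

// closeAll closes every opened reader, even after a failure, and returns
// all errors joined together (nil when every Close succeeded).
func closeAll(entries []*extraEntry) error {
	var errs []error
	for _, e := range entries {
		if e.reader == nil {
			continue // never opened, nothing to release
		}
		if err := e.reader.Close(); err != nil {
			errs = append(errs, fmt.Errorf("closing %s: %w", e.path, err))
		}
		e.reader = nil
	}
	return errors.Join(errs...)
}

// fakeReader is a test double that counts Close calls and can fail.
type fakeReader struct {
	fail   bool
	closed *int
}

func (f fakeReader) Close() error {
	*f.closed++
	if f.fail {
		return errors.New("simulated close failure")
	}
	return nil
}

func main() {
	closed := 0
	entries := []*extraEntry{
		{path: "a.gguf", reader: fakeReader{fail: true, closed: &closed}},
		{path: "b.gguf", reader: fakeReader{closed: &closed}},
		{path: "c.gguf"}, // lazily initialized, never opened
	}
	err := closeAll(entries)
	fmt.Println(closed, err)
}
```

An early return on the first error would have left `b.gguf` open in this example, which is the leak the commit describes.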
…p FlexToken

- Rename GetTensor→ReadTensor, GetTensorRaw→ReadTensorBytes per reviewer
  convention (Read prefix for I/O methods)
- Merge findTensorFile+fileForIndex into findTensor to avoid double map
  lookup in GetTensorInfo and readerForTensor
- Return error instead of silently swallowing invalid JSON in
  FlexToken.UnmarshalJSON
- Remove unused FlexToken.String() method
- Replace 7 copy-pasted config fallback blocks in resolveSpecialTokens
  with table-driven loop
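
For readers unfamiliar with the FlexToken pattern mentioned above, here is a minimal sketch of a type that accepts either a plain string or an object with a "content" field and returns an error for anything else. It is an assumption-laden illustration: the `Content` field name and the `parseToken` helper are hypothetical, and the PR's actual implementation may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlexToken holds a token that a HuggingFace config may encode either as
// "<s>" or as {"content": "<s>", ...}. The Content field is hypothetical.
type FlexToken struct {
	Content string
}

// UnmarshalJSON tries the plain-string form first, then the object form,
// and reports an error instead of silently ignoring other JSON values.
func (t *FlexToken) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		t.Content = s
		return nil
	}
	var obj struct {
		Content string `json:"content"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return fmt.Errorf("token must be a string or an object with a \"content\" field: %w", err)
	}
	t.Content = obj.Content
	return nil
}

// parseToken decodes a single JSON value into a FlexToken.
func parseToken(raw string) (FlexToken, error) {
	var t FlexToken
	err := json.Unmarshal([]byte(raw), &t)
	return t, err
}

func main() {
	for _, raw := range []string{`"<s>"`, `{"content": "</s>"}`, `42`} {
		tok, err := parseToken(raw)
		fmt.Printf("%s -> %q, err=%v\n", raw, tok.Content, err)
	}
}
```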
ajroetker requested a review from janpfeifer on March 16, 2026, 18:40