Add GetTensorRaw and ToGGMLQuantType for GGML quantized weights by ajroetker · Pull Request #18 · gomlx/go-huggingface

ajroetker · 2026-03-10T04:28:38Z

Add Model.GetTensorRaw() to load raw tensor bytes without dequantization, keeping quantized weights in their native GGML block format
Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)
Update go.mod dependencies
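
To make the two additions above concrete, here is a minimal, self-contained sketch of the idea: map a GGUF tensor type ID to a quantization kind, and compute how many raw bytes a tensor occupies when it is kept in block format. The names `TensorType`, `QuantType`, `toQuantType`, and `rawByteSize` are hypothetical stand-ins, not the actual go-huggingface or GoMLX API. The numeric IDs and block sizes follow the upstream GGML type table as I understand it; check them against the GGML sources before relying on them.

```go
package main

import "fmt"

// TensorType holds the numeric ggml type ID stored in a GGUF tensor header.
// Only the types named in this PR are listed; the type itself is illustrative.
type TensorType uint32

const (
	TensorTypeF32    TensorType = 0
	TensorTypeQ4_0   TensorType = 2
	TensorTypeQ8_0   TensorType = 8
	TensorTypeQ2_K   TensorType = 10
	TensorTypeQ3_K   TensorType = 11
	TensorTypeQ4_K   TensorType = 12
	TensorTypeQ5_K   TensorType = 13
	TensorTypeQ6_K   TensorType = 14
	TensorTypeIQ4_NL TensorType = 20
)

// QuantType is a hypothetical stand-in for GoMLX's GGMLQuantType.
type QuantType string

// quantInfo describes one block format: how many elements a block encodes
// and how many bytes that block takes on disk.
type quantInfo struct {
	quant      QuantType
	blockElems int
	blockBytes int
}

var quantTable = map[TensorType]quantInfo{
	TensorTypeQ4_0:   {"Q4_0", 32, 18},
	TensorTypeQ8_0:   {"Q8_0", 32, 34},
	TensorTypeIQ4_NL: {"IQ4_NL", 32, 18},
	TensorTypeQ2_K:   {"Q2_K", 256, 84},
	TensorTypeQ3_K:   {"Q3_K", 256, 110},
	TensorTypeQ4_K:   {"Q4_K", 256, 144},
	TensorTypeQ5_K:   {"Q5_K", 256, 176},
	TensorTypeQ6_K:   {"Q6_K", 256, 210},
}

// toQuantType reports the quantization kind for t, or false if t is not
// one of the quantized types in the table (for example F32).
func toQuantType(t TensorType) (QuantType, bool) {
	info, ok := quantTable[t]
	return info.quant, ok
}

// rawByteSize returns the number of bytes a tensor with numElements values
// occupies when left in its native block format.
func rawByteSize(t TensorType, numElements int) (int, error) {
	info, ok := quantTable[t]
	if !ok {
		return 0, fmt.Errorf("tensor type %d is not a supported quantized type", t)
	}
	if numElements < 0 || numElements%info.blockElems != 0 {
		return 0, fmt.Errorf("element count %d is not a multiple of block size %d",
			numElements, info.blockElems)
	}
	return numElements / info.blockElems * info.blockBytes, nil
}

func main() {
	n, err := rawByteSize(TensorTypeQ4_K, 4096*4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	q, _ := toQuantType(TensorTypeQ4_K)
	fmt.Printf("%s: %d raw bytes\n", q, n)
}
```

A loader that returns raw bytes can use a size computation like this to validate the slice it reads before handing it to a backend that understands the block format.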

Dependencies

Depends on ajroetker/gomlx#6 — the go.mod replace directive will be updated to a proper module version once that PR merges.

- Add Model.GetTensorRaw() to load raw tensor bytes without dequantization, enabling quantized weights to stay in native format
- Add TensorType.ToGGMLQuantType() mapping GGUF tensor types to GoMLX GGMLQuantType (Q4_0, Q8_0, IQ4_NL, Q2_K–Q6_K)
- Update go.mod dependencies

Depends on: ajroetker/gomlx#6

- Multi-file GGUF: support multimodal models (e.g. LLaVA) that split tensors across multiple GGUF files via LoadAll/LoadFiles/ListGGUFFiles. Tensors are looked up across all files transparently.
- Pair extra files with their readers in a single extraEntry struct to prevent parallel-slice divergence and enable per-file lazy reader init.
- Fix Close() to close all readers instead of leaking on first error.
- Deduplicate .gguf filename filtering into a shared ggufFileNames helper.
- Add FlexToken type to handle HuggingFace token config fields that can be either a plain string or an object with a "content" field.
- Update go-xla, golang.org/x/sys, k8s.io/klog dependencies.

models/gguf/model.go

…p FlexToken

- Rename GetTensor→ReadTensor, GetTensorRaw→ReadTensorBytes per reviewer convention (Read prefix for I/O methods)
- Merge findTensorFile+fileForIndex into findTensor to avoid double map lookup in GetTensorInfo and readerForTensor
- Return error instead of silently swallowing invalid JSON in FlexToken.UnmarshalJSON
- Remove unused FlexToken.String() method
- Replace 7 copy-pasted config fallback blocks in resolveSpecialTokens with table-driven loop
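
As an illustration of the FlexToken behavior described in these commits, here is a small sketch of a type that accepts either a plain JSON string or an object with a "content" field, and that returns an error for anything else. The struct layout, the `tokenizerConfig` type, and the `parseConfig` helper are assumptions made for the example; only the described behavior comes from the commit messages.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlexToken holds a special-token string that may appear in config JSON
// either as "<s>" or as {"content": "<s>", ...}.
type FlexToken struct {
	Content string
}

// UnmarshalJSON tries the plain-string form first, then the object form.
// Input matching neither form is reported as an error, not ignored.
func (t *FlexToken) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		t.Content = s
		return nil
	}
	var obj struct {
		Content string `json:"content"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return fmt.Errorf("token must be a string or an object with a \"content\" field: %w", err)
	}
	t.Content = obj.Content
	return nil
}

// tokenizerConfig is a hypothetical subset of a tokenizer config file.
type tokenizerConfig struct {
	BosToken FlexToken `json:"bos_token"`
	EosToken FlexToken `json:"eos_token"`
}

// parseConfig decodes a config document into tokenizerConfig.
func parseConfig(data []byte) (tokenizerConfig, error) {
	var cfg tokenizerConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]byte(`{"bos_token": "<s>", "eos_token": {"content": "</s>"}}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.BosToken.Content, cfg.EosToken.Content)

	// A number is neither accepted form, so decoding fails loudly.
	_, err = parseConfig([]byte(`{"bos_token": 42}`))
	fmt.Println("invalid input rejected:", err != nil)
}
```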

ajroetker added 3 commits March 9, 2026 21:26

Add ExtraFiles accessor for additional GGUF files

d8d20f2

janpfeifer requested changes Mar 13, 2026

models/gguf/model.go (outdated, resolved)

ajroetker added 2 commits March 16, 2026 11:33

Merge remote-tracking branch 'upstream/main' into gguf-quantized-types

c532fa4

ajroetker requested a review from janpfeifer March 16, 2026 18:40

ajroetker wants to merge 5 commits into gomlx:main from ajroetker:gguf-quantized-types

2 participants