MatGPTQ is a one-shot quantization technique that quantizes a model to multiple bit-widths, which can be served in different environments by leveraging custom kernel support.


MatGPTQ


MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization

Official implementation of MatGPTQ (Matryoshka GPTQ), a new post-training quantization (PTQ) pipeline that produces a single parent model jointly optimized for multiple target precisions in one shot, using only a small calibration set.

Abstract

Matryoshka Quantization (MatQuant) is a recent quantization approach showing that a single integer-quantized model can be served at multiple precisions by slicing the most significant bits (MSBs) at inference time. This enables a single checkpoint to cover a wide range of memory and latency budgets, but renders quantization much more challenging. In particular, the original MatQuant relies on expensive quantization-aware training (QAT) variants, rather than fast one-shot post-training quantization (PTQ), and lacks open-source and kernel support. We address all of these limitations by introducing Post-Training Matryoshka Quantization (MatGPTQ), a new PTQ pipeline that produces a single parent model jointly optimized for multiple target precisions in one shot, based on a small calibration set. MatGPTQ casts Matryoshka quantization as a multi-precision objective with bit-slicing and cross-bit error compensation, resulting in an algorithm that produces a multi-bit-width, "sliceable" model in a single pass. We also incorporate a new budget-aware search for heterogeneous per-layer bit-widths and provide efficient kernels that implement slicing and mixed-precision execution. Across standard LLMs and benchmarks, MatGPTQ preserves high-bit accuracy while substantially improving performance at low bit-widths. Overall, we establish a new state of the art for Matryoshka-style post-training quantization and make single-checkpoint, multi-precision deployment open and practical.
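The bit-slicing idea above can be illustrated with a small toy sketch (this is a conceptual example, not the repository's quantization code): a weight vector is quantized to 8-bit integer codes, and the same codes are served at 4 bits simply by dropping the four least significant bits, at the cost of a coarser effective step size.

```python
import numpy as np

def slice_to_bits(q, src_bits, dst_bits):
    """Keep only the most significant dst_bits of src_bits-wide codes."""
    return q >> (src_bits - dst_bits)

def dequantize(q, scale, bits):
    # symmetric toy scheme: codes are centered around the mid-point
    zero = (1 << bits) // 2
    return scale * (q.astype(np.float32) - zero)

rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)

# 8-bit quantization of w
bits = 8
scale = np.abs(w).max() / (1 << (bits - 1))
q8 = np.clip(np.round(w / scale) + (1 << (bits - 1)), 0, 255).astype(np.uint8)

# the same codes served at 4 bits: drop the 4 LSBs, step size grows 16x
q4 = slice_to_bits(q8, 8, 4)
w8 = dequantize(q8, scale, 8)
w4 = dequantize(q4, scale * 16, 4)

# the sliced 4-bit view is coarser, so its reconstruction error is larger
print(np.abs(w - w8).max() <= np.abs(w - w4).max())
```

The point of MatGPTQ is to optimize the parent 8-bit codes so that such sliced low-bit views stay accurate, rather than quantizing each precision independently.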

Repository structure

  • scripts/ — contains bash scripts with the required arguments to run the method
  • src/ — directory for helper methods and utility functions
  • evo_quant_search.py — evolutionary quantization bitwidth allocation
  • quant.py — MatGPTQ/GPTQ quantization
  • lmeval.py — LM Eval Harness evaluation script
  • eval_ppl.py — perplexity evaluation script

Installation

Create a virtual environment and install dependencies (we recommend Python 3.12):

uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

Note: The code has been tested with CUDA 12.4 and PyTorch 2.7.1.

Quantization

We provide quant.py for producing the MatGPTQ/GPTQ models. See scripts/run_gptq.sh or scripts/run_matgptq.sh for examples of how to run quantization:

bash scripts/run_matgptq.sh

Mix'n'Match

We provide evo_quant_search.py for producing the Mix'n'Match MatGPTQ models. See scripts/run_quant_search.sh for an example of how to run EvoPress for MatGPTQ:

bash scripts/run_quant_search.sh
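Conceptually, the search assigns each layer a bit-width from a discrete set so that a proxy error is minimized subject to a memory budget. A minimal (1+1)-style evolutionary sketch, where a synthetic per-layer sensitivity stands in for the real calibration loss (all names and the error model here are illustrative, not the repository's implementation):

```python
import random

BITS = [2, 4, 8]     # candidate per-layer bit-widths
N_LAYERS = 12
BUDGET = 5.0         # maximum average bits per layer

# synthetic sensitivity: a larger value means the layer suffers more at low bits
random.seed(0)
sensitivity = [random.uniform(0.5, 2.0) for _ in range(N_LAYERS)]

def error(cfg):
    # toy proxy loss: sensitivity shrinks exponentially with bit-width
    return sum(s / (2 ** b) for s, b in zip(sensitivity, cfg))

def mean_bits(cfg):
    return sum(cfg) / len(cfg)

def mutate(cfg):
    # flip one layer to a random candidate bit-width
    child = cfg[:]
    child[random.randrange(len(child))] = random.choice(BITS)
    return child

# start from a feasible uniform configuration
best = [4] * N_LAYERS
best_err = error(best)

for _ in range(2000):
    child = mutate(best)
    if mean_bits(child) <= BUDGET and error(child) < best_err:
        best, best_err = child, error(child)

print(mean_bits(best) <= BUDGET)          # stays within budget
print(best_err <= error([4] * N_LAYERS))  # never worse than uniform 4-bit
```

The real search operates on calibration-set loss and richer mutation/selection operators, but the structure is the same: mutate per-layer bit-widths, reject infeasible candidates, keep improvements.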

Evaluations

We provide the lmeval.py and eval_ppl.py scripts for evaluation on the Language Model Evaluation Harness benchmarks and for perplexity measurements. The interface of lmeval.py mostly follows the original harness. In addition, one should specify the path to the quantized weights via --quant_weights_path, and either the default uniform quantization bit-width --quant_uniform_bitwidth together with the master bit-width --quant_master_bitwidth, or a path to a .txt file with the chosen compression levels via --quant_non_uniform_config_path. Finally, --method selects whether to evaluate MatGPTQ or GPTQ.
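An evaluation invocation might look like the following sketch (illustrative only: the model name, task list, file paths, and method value are placeholders, and the --model/--model_args/--tasks flags are assumed to match the upstream LM Eval Harness interface):

```shell
python lmeval.py \
    --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B \
    --tasks arc_easy,hellaswag \
    --quant_weights_path ./quantized_weights \
    --quant_non_uniform_config_path ./configs/mixnmatch.txt \
    --method matgptq
```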

Deployment

Work In Progress

Citation

If you use MatGPTQ in your research, please cite:

@misc{kleinegger2026matgptqaccurateefficientposttraining,
      title={MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization}, 
      author={Maximilian Kleinegger and Elvir Crnčević and Dan Alistarh},
      year={2026},
      eprint={2602.03537},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.03537}, 
}
