This is the official implementation of AHD (Anchor-based History-stable Decoding), a training-free, plug-and-play dynamic decoding strategy for Diffusion Large Language Models (dLLMs).
- Project Structure
- Evaluation of AHD on LLaDA & LLaDA-1.5
- Evaluation of AHD on MMADA
- Evaluation of AHD on DIFFA
- Todo List
- Citation
- Acknowledgement
## Project Structure

```
open-dLLM-compress/
├── llada/                      # AHD on LLaDA-8B-Instruct
│   ├── generate_AHD_acc.py     # AHD decoding implementation
│   ├── generate.py             # Baseline (Semi-AR) decoding
│   ├── eval_llada.py           # Evaluation harness wrapper
│   ├── eval_*.sh               # Evaluation scripts for each benchmark
│   └── model/                  # LLaDA model definition
├── llada1.5/                   # AHD on LLaDA-1.5 (same structure as llada/)
├── MMADA/                      # AHD on MMaDA (vision-language)
│   ├── models/                 # MMaDA model with AHD integration
│   ├── scripts/                # Evaluation scripts
│   ├── lmms_eval/              # lmms-eval framework
│   └── generate_demo.py        # Quick demo
├── DIFFA/                      # AHD on DIFFA (audio-language)
│   ├── src/                    # DIFFA model and AHD audio decoding
│   ├── inference_voicebench.py
│   └── voicebench/             # VoiceBench evaluation
├── assets/                     # Figures
├── LICENSE
└── README.md
```
## Evaluation of AHD on LLaDA & LLaDA-1.5

| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./Models/LLaDA-8B-Instruct/ |
| LLaDA-1.5 | GSAI-ML/LLaDA-1.5 | ./Models/LLaDA-1.5/ |
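If you fetch the weights with `huggingface_hub` (one option among many; this helper is illustrative and not part of the repository), the model table above maps to:

```python
# Sketch: mirror the model table into the expected local paths.
# Assumes `huggingface_hub` is installed (`pip install huggingface_hub`).
MODELS = {
    "GSAI-ML/LLaDA-8B-Instruct": "./Models/LLaDA-8B-Instruct/",
    "GSAI-ML/LLaDA-1.5": "./Models/LLaDA-1.5/",
}

def download_all():
    # snapshot_download accepts repo_id and local_dir keyword arguments.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in MODELS.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```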
```shell
cd llada  # or cd llada1.5
conda create -n llada python=3.12
conda activate llada
pip install -r requirements.txt
```

Please make sure to set the correct model path in `generate_AHD_acc.py`.

```shell
python generate_AHD_acc.py
```

| Benchmark | Script | Few-shot |
|---|---|---|
| BBH | eval_bbh.sh | 3 |
| MMLU-Pro | eval_mmlu_pro.sh | 0 |
| HumanEval | eval_humaneval.sh | 0 |
| MBPP | eval_mbpp.sh | 3 |
| MATH | eval_math.sh | 3 |
| ASDiv | eval_asdiv.sh | 0 |
| TruthfulQA | eval_truthqa.sh | 0 |
```shell
sh eval_bbh.sh
sh eval_mmlu_pro.sh
sh eval_humaneval.sh
sh eval_mbpp.sh
sh eval_math.sh
sh eval_asdiv.sh
sh eval_truthqa.sh
```

> [!TIP]
> Each script contains both the Baseline and AHD runs. You can configure `length`, `block_length`, `num_fewshot`, and the AHD-specific hyperparameters (`kl_threshold_AHD`, `history_length_AHD`, etc.) directly in the scripts.
> [!NOTE]
> HumanEval requires post-processing:

```shell
python postprocess_code.py {samples_xxx.jsonl}
```

## Evaluation of AHD on MMADA

| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| MMaDA-8B-MixCoT | Gen-Verse/MMaDA-8B-MixCoT | ./Models/MMaDA-8B-MixCoT/ |
```shell
cd MMADA
conda create -n mmada python=3.11
conda activate mmada
pip install -r requirements.txt
```

```shell
cd lmms_eval
uv pip install -e .
```

Please make sure to set the correct model path in `generate_demo.py`.
```shell
python generate_demo.py
```

Some evaluation tasks use an LLM as a judge (e.g., GPT). Please configure the following environment variables before running evaluation:

```shell
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export API_TYPE="openai"
export OPENAI_API_URL="https://api.openai.com/v1/chat/completions"
```

| Benchmark | Task Name |
|---|---|
| MathVista-mini | mathvista_testmini_mmada |
| MathVision | mathvision_test_mmada |
| ScienceQA-Img | scienceqa_img_mmada |
| GQA | gqa |
| MME | mme |
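Before launching the judge-based tasks, a quick standalone sanity check (not part of the repo) that the environment variables above are actually set:

```python
import os

# Names taken from the `export` lines above.
REQUIRED = ("OPENAI_API_KEY", "API_TYPE", "OPENAI_API_URL")

def missing_judge_env(environ=os.environ):
    """Return the judge-related variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]

missing = missing_judge_env()
if missing:
    print("Set these before running evaluation:", ", ".join(missing))
```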
```shell
cd ..
bash scripts/eval_baseline.sh
bash scripts/eval_AHD.sh
```

> [!TIP]
> You can configure the following hyperparameters in the scripts above: `GEN_LENGTH`, `DIFF_STEP`, `BLOCK_LENGTH`, `NGPU`.

> [!NOTE]
> The default LLM-judge model used in this paper is gpt-4.1-mini.
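For intuition on how `GEN_LENGTH`, `BLOCK_LENGTH`, and `DIFF_STEP` interact, here is a small sketch assuming LLaDA-style semi-autoregressive decoding, where the generation length is split into equal blocks and the denoising steps are divided evenly across them (an assumption about these scripts, not a guarantee):

```python
def steps_per_block(gen_length: int, diff_step: int, block_length: int) -> int:
    """Denoising steps each block receives under even splitting (assumed scheme)."""
    assert gen_length % block_length == 0, "GEN_LENGTH must be a multiple of BLOCK_LENGTH"
    num_blocks = gen_length // block_length
    assert diff_step % num_blocks == 0, "DIFF_STEP must divide evenly across blocks"
    return diff_step // num_blocks

# E.g., GEN_LENGTH=256, DIFF_STEP=128, BLOCK_LENGTH=32 -> 8 blocks.
print(steps_per_block(256, 128, 32))  # → 16
```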
## Evaluation of AHD on DIFFA

| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| Whisper-Small | openai/whisper-small | ./DIFFA/whisper/ |
| DIFFA | zhoujiaming777/DIFFA | ./DIFFA/checkpoint-diffa/ |
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./DIFFA/LLaDA-8B-Instruct/ |
```shell
cd DIFFA
conda create -n diffa python=3.10
conda activate diffa
pip install -r requirements.txt
```

```shell
python inference_voicebench.py \
    --model_path path/to/DIFFA/checkpoint-diffa \
    --whisper_path path/to/DIFFA/whisper \
    --llm_path path/to/DIFFA/LLaDA-8B-Instruct \
    --data openbookqa \
    --generation_method AHD
```

> [!TIP]
> - Datasets: `openbookqa`, `bbh`, `alpacaeval`, `wildvoice`, `commoneval`
> - Methods: `Vanilla`, `AHD`
> - Key arguments: `--steps`, `--block_length`, `--max_new_tokens`
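To sweep every dataset/method combination listed in the tip, one could generate the corresponding commands (a convenience sketch, not a script shipped with the repo; the paths are the placeholders from above):

```python
# Builds one inference command per (dataset, method) pair from the tip above.
DATASETS = ["openbookqa", "bbh", "alpacaeval", "wildvoice", "commoneval"]
METHODS = ["Vanilla", "AHD"]

def build_commands(model="path/to/DIFFA/checkpoint-diffa",
                   whisper="path/to/DIFFA/whisper",
                   llm="path/to/DIFFA/LLaDA-8B-Instruct"):
    return [
        f"python inference_voicebench.py --model_path {model} "
        f"--whisper_path {whisper} --llm_path {llm} "
        f"--data {data} --generation_method {method}"
        for data in DATASETS
        for method in METHODS
    ]

for cmd in build_commands():
    print(cmd)
```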
Follow the evaluation method from VoiceBench:

```shell
cd voicebench
python evaluate.py --src_file {result.jsonl} --evaluator xx
```

## Todo List

- Re-architect the codebase
- Support multi-batch-size inference
- Support implementations of other methods
## Citation

If you find this work helpful for your research, please consider citing:

```bibtex
@article{zou2026ahd,
  title={Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models},
  author={Shun Zou and Yong Wang and Zehui Chen and Lin Chen and Chongyang Tao and Feng Zhao and Xiangxiang Chu},
  journal={arXiv preprint arXiv:2604.08964},
  year={2026}
}
```

## Acknowledgement

We would like to thank the authors of the following projects for their excellent work and open-source contributions:
