
Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models


This is the official implementation of AHD (Anchor-based History-stable Decoding), a training-free, plug-and-play dynamic decoding strategy for Diffusion Large Language Models (dLLMs).


πŸ“ Project Structure

open-dLLM-compress/
β”œβ”€β”€ llada/                  # AHD on LLaDA-8B-Instruct
β”‚   β”œβ”€β”€ generate_AHD_acc.py # AHD decoding implementation
β”‚   β”œβ”€β”€ generate.py         # Baseline (Semi-AR) decoding
β”‚   β”œβ”€β”€ eval_llada.py       # Evaluation harness wrapper
β”‚   β”œβ”€β”€ eval_*.sh           # Evaluation scripts for each benchmark
β”‚   └── model/              # LLaDA model definition
β”œβ”€β”€ llada1.5/               # AHD on LLaDA-1.5 (same structure as llada/)
β”œβ”€β”€ MMADA/                  # AHD on MMaDA (vision-language)
β”‚   β”œβ”€β”€ models/             # MMaDA model with AHD integration
β”‚   β”œβ”€β”€ scripts/            # Evaluation scripts
β”‚   β”œβ”€β”€ lmms_eval/          # lmms-eval framework
β”‚   └── generate_demo.py    # Quick demo
β”œβ”€β”€ DIFFA/                  # AHD on DIFFA (audio-language)
β”‚   β”œβ”€β”€ src/                # DIFFA model and AHD audio decoding
β”‚   β”œβ”€β”€ inference_voicebench.py
β”‚   └── voicebench/         # VoiceBench evaluation
β”œβ”€β”€ assets/                 # Figures
β”œβ”€β”€ LICENSE
└── README.md

πŸ§ͺ Evaluation of AHD on LLaDA & LLaDA-1.5

βš™οΈ Models

| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./Models/LLaDA-8B-Instruct/ |
| LLaDA-1.5 | GSAI-ML/LLaDA-1.5 | ./Models/LLaDA-1.5/ |

πŸ“¦ Dependencies

cd llada  # or cd llada1.5
conda create -n llada python=3.12
conda activate llada
pip install -r requirements.txt

πŸ”§ Quick Demo

Please make sure to set the correct model path in generate_AHD_acc.py.

python generate_AHD_acc.py

πŸ”¨ Evaluation

Supported Benchmarks

| Benchmark | Script | Few-shot |
|---|---|---|
| BBH | eval_bbh.sh | 3 |
| MMLU-Pro | eval_mmlu_pro.sh | 0 |
| HumanEval | eval_humaneval.sh | 0 |
| MBPP | eval_mbpp.sh | 3 |
| MATH | eval_math.sh | 3 |
| ASDiv | eval_asdiv.sh | 0 |
| TruthfulQA | eval_truthqa.sh | 0 |

sh eval_bbh.sh
sh eval_mmlu_pro.sh
sh eval_humaneval.sh
sh eval_mbpp.sh
sh eval_math.sh
sh eval_asdiv.sh
sh eval_truthqa.sh

Tip

Each script contains both Baseline and AHD. You can configure length, block_length, num_fewshot, and AHD-specific hyperparameters (kl_threshold_AHD, history_length_AHD, etc.) directly in the scripts.
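
To build intuition for what `kl_threshold_AHD` and `history_length_AHD` control, here is a hypothetical numpy sketch (not the repository's implementation): a token position is treated as history-stable once the KL divergence between its predicted distributions on consecutive denoising steps stays below the threshold for `history_length` consecutive steps. All names and the exact criterion are illustrative assumptions.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def stable_positions(history, kl_threshold=0.01, history_length=3):
    """Illustrative stability check (not the repo's actual code).

    history: list of [num_positions, vocab] arrays, one per denoising step.
    Returns indices of positions whose predicted distribution changed by
    less than kl_threshold over the last history_length step transitions.
    """
    if len(history) < history_length + 1:
        return []  # not enough history yet to judge stability
    recent = history[-(history_length + 1):]
    num_positions = recent[0].shape[0]
    stable = []
    for pos in range(num_positions):
        if all(kl_div(recent[t + 1][pos], recent[t][pos]) < kl_threshold
               for t in range(history_length)):
            stable.append(pos)
    return stable
```

Under this reading, raising `kl_threshold_AHD` or lowering `history_length_AHD` makes the decoder commit to tokens earlier; the scripts expose both so the speed/quality trade-off can be tuned per benchmark.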

Note

HumanEval requires post-processing:

python postprocess_code.py {samples_xxx.jsonl}

πŸ§ͺ Evaluation of AHD on MMADA

βš™οΈ Models

Model Name Hugging Face Repo Local Path
MMaDA-8B-MixCoT Gen-Verse/MMaDA-8B-MixCoT ./Models/MMaDA-8B-MixCoT/

πŸ“¦ Dependencies

cd MMADA
conda create -n mmada python=3.11
conda activate mmada
pip install -r requirements.txt
cd lmms_eval
uv pip install -e .

πŸ”§ Quick Demo

Please make sure to set the correct model path in generate_demo.py.

python generate_demo.py

πŸ”‘ Environment Variable Configuration

Some evaluation tasks use an LLM as a judge (e.g., GPT). Please configure the following environment variables before running evaluation:

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export API_TYPE="openai"
export OPENAI_API_URL="https://api.openai.com/v1/chat/completions"
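
As a sanity check before launching evaluation, the variables above can be read back in Python. The snippet below is a hypothetical sketch of the chat-completions request shape the judge would receive; nothing is sent, and the grading prompt is a placeholder, not the one used by lmms-eval.

```python
import json
import os

# Read the environment variables exported above (empty defaults if unset).
api_key = os.environ.get("OPENAI_API_KEY", "")
api_url = os.environ.get("OPENAI_API_URL",
                         "https://api.openai.com/v1/chat/completions")

# Illustrative request body in the OpenAI chat-completions format.
payload = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a strict grader."},
        {"role": "user", "content": "Question: ...\nAnswer: ...\nScore 1-5."},
    ],
}
headers = {"Authorization": f"Bearer {api_key}",
           "Content-Type": "application/json"}
body = json.dumps(payload)
```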

πŸ”¨ Evaluation

Supported Benchmarks

| Benchmark | Task Name |
|---|---|
| MathVista-mini | mathvista_testmini_mmada |
| MathVision | mathvision_test_mmada |
| ScienceQA-Img | scienceqa_img_mmada |
| GQA | gqa |
| MME | mme |

cd ..
bash scripts/eval_baseline.sh
bash scripts/eval_AHD.sh

Tip

You can configure the following hyperparameters in the scripts above: GEN_LENGTH, DIFF_STEP, BLOCK_LENGTH, NGPU.
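
A minimal illustration of the shell-variable pattern used in such scripts; the values below are placeholders, not the paper's settings, and should be edited in the scripts themselves rather than exported externally.

```shell
# Placeholder values; edit the corresponding lines in scripts/eval_AHD.sh
GEN_LENGTH=512      # total generation length
DIFF_STEP=512       # number of diffusion steps
BLOCK_LENGTH=64     # block size for decoding
NGPU=4              # GPUs used by the launcher
echo "GEN_LENGTH=${GEN_LENGTH} DIFF_STEP=${DIFF_STEP} BLOCK_LENGTH=${BLOCK_LENGTH} NGPU=${NGPU}"
```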

Note

The default LLM-judge model used in this paper is gpt-4.1-mini.


πŸ§ͺ Evaluation of AHD on DIFFA

βš™οΈ Models

| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| Whisper-Small | openai/whisper-small | ./DIFFA/whisper/ |
| DIFFA | zhoujiaming777/DIFFA | ./DIFFA/checkpoint-diffa/ |
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./DIFFA/LLaDA-8B-Instruct/ |

πŸ“¦ Dependencies

cd DIFFA
conda create -n diffa python=3.10
conda activate diffa
pip install -r requirements.txt

πŸ” Inference

python inference_voicebench.py \
    --model_path path/to/DIFFA/checkpoint-diffa \
    --whisper_path path/to/DIFFA/whisper \
    --llm_path path/to/DIFFA/LLaDA-8B-Instruct \
    --data openbookqa \
    --generation_method AHD

Tip

  • Datasets: openbookqa, bbh, alpacaeval, wildvoice, commoneval
  • Methods: Vanilla, AHD
  • Key arguments: --steps, --block_length, --max_new_tokens
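
When comparing settings, a simple shell loop can sweep one of the key arguments. The sketch below only `echo`s the commands rather than running them, since the paths are placeholders from the example above.

```shell
# Hypothetical sweep over diffusion steps; prints each command instead of
# executing it (model paths are the placeholders used above).
for STEPS in 64 128 256; do
  echo python inference_voicebench.py \
      --model_path path/to/DIFFA/checkpoint-diffa \
      --data openbookqa \
      --generation_method AHD \
      --steps "$STEPS"
done
```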

πŸ”¨ Evaluation

Follow the evaluation method from VoiceBench:

cd voicebench
python evaluate.py --src_file {result.jsonl} --evaluator xx

πŸ“‘ Todo List

  • Re-architect the codebase
  • Support batch sizes larger than one
  • Support implementations of other methods

πŸŽ“ Citation

If you find this work helpful for your research, please consider citing:

@article{zou2026ahd,
      title={Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models}, 
      author={Shun Zou and Yong Wang and Zehui Chen and Lin Chen and Chongyang Tao and Feng Zhao and Xiangxiang Chu},
      journal={arXiv preprint arXiv:2604.08964},
      year={2026}
}

πŸ™ Acknowledgement

We would like to thank the authors of the following projects for their excellent work and open-source contributions:

About

[ACL 2026] Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models
