torchtitan provides direct and indirect support for validation to support user's training goals. Direct support is provided by the Validator class which interacts directly with the training loop, and indirect support is provided through HuggingFace checkpoint conversion for users who want to do evaluation using external tools such as ELeutherAI's lm_eval.
For users who want to perform validation directly during the training loop, we provide the Validator class which can be conveniently overloaded through the TrainSpec or configured in JobConfig. The validator class has access to and reuses many of the trainer's functions such as its parallelization, including pipelining.
Below is an example validation config:
[validation]
enable = true
dataset = "c4_validation"
freq = 500
steps = -1 # consumes the entire validation setWith ./scripts/checkpoint_conversion/convert_to_hf.py, torchtitan offers support for converting checkpoints from DCP to safetensors format. Using this script, users can perform efficient evaluation separate from their training using external libraries that support HuggingFace e.g. lm_eval with vllm backend.
To use this specific setup make sure to include a HuggingFace config.json file which is not provided by conversion script or last_save_in_hf option. The HF config file can be downloaded by running python ./scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.1-8B --assets config.
Note that pip installing lm-eval may result in breaking torchtitan dev environment so we recommend creating a separate env.
pip install "lm-eval[vllm]"
lm_eval --model vllm \
--model_args pretrained=./outputs/checkpoint/step-1000,tensor_parallel_size=8,dtype=auto,gpu_memory_utilization=0.8, \
--tasks mmlu \
--batch_size auto| Groups | Version | Filter | n-shot | Metric | Value | Stderr | ||
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | acc | ↑ | 0.6209 | ± | 0.0038 | |
| - humanities | 2 | none | acc | ↑ | 0.5481 | ± | 0.0066 | |
| - other | 2 | none | acc | ↑ | 0.7045 | ± | 0.0078 | |
| - social sciences | 2 | none | acc | ↑ | 0.7351 | ± | 0.0078 | |
| - stem | 2 | none | acc | ↑ | 0.5357 | ± | 0.0085 |