Validation and Evaluation

torchtitan provides direct and indirect support for validation to support user's training goals. Direct support is provided by the Validator class which interacts directly with the training loop, and indirect support is provided through HuggingFace checkpoint conversion for users who want to do evaluation using external tools such as ELeutherAI's lm_eval.

Validation

For users who want to perform validation directly during the training loop, we provide the Validator class which can be conveniently overloaded through the TrainSpec or configured in JobConfig. The validator class has access to and reuses many of the trainer's functions such as its parallelization, including pipelining.

Below is an example validation config:

[validation]
enable = true
dataset = "c4_validation"
freq = 500
steps = -1 # consumes the entire validation set

Third-Party Evaluation

With ./scripts/checkpoint_conversion/convert_to_hf.py, torchtitan offers support for converting checkpoints from DCP to safetensors format. Using this script, users can perform efficient evaluation separate from their training using external libraries that support HuggingFace e.g. lm_eval with vllm backend.

Example usage of `lm_eval` with `vllm`:

To use this specific setup make sure to include a HuggingFace config.json file which is not provided by conversion script or last_save_in_hf option. The HF config file can be downloaded by running python ./scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.1-8B --assets config.

Note that pip installing lm-eval may result in breaking torchtitan dev environment so we recommend creating a separate env.

pip install "lm-eval[vllm]"
lm_eval --model vllm \
    --model_args pretrained=./outputs/checkpoint/step-1000,tensor_parallel_size=8,dtype=auto,gpu_memory_utilization=0.8, \
    --tasks mmlu \
    --batch_size auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.6209	±	0.0038
- humanities	2	none	acc	↑	0.5481	±	0.0066
- other	2	none	acc	↑	0.7045	±	0.0078
- social sciences	2	none	acc	↑	0.7351	±	0.0078
- stem	2	none	acc	↑	0.5357	±	0.0085

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validation and Evaluation

Validation

Third-Party Evaluation

Example usage of `lm_eval` with `vllm`:

FilesExpand file tree

evaluation.md

Latest commit

History

evaluation.md

File metadata and controls

Validation and Evaluation

Validation

Third-Party Evaluation

Example usage of lm_eval with vllm:

Example usage of `lm_eval` with `vllm`: