This codebase is built around HuggingFace's (HF) transformers and datasets libraries. It is designed to be modular and loosely coupled, so that it is easy to extend for different kinds of experiments with LLMs.
The three basic components are models, trainers, and evaluators, all of which are meant to be very loosely coupled: for example, there is no back-and-forth communication between them. Some projects may require implementing a new type of model, trainer, or evaluator, each of which can range from a thin wrapper around an external resource (e.g. a HF trainer) to a component written from scratch in PyTorch. Aside from these basic components, logging and config-file management are also supported and should not require any changes for new projects.
This is a work in progress.
We strongly recommend using uv for all things Python. If you do, installing this code is as simple as:
uv sync
There are a few example config files for training and evaluating models in the examples folder. For example, here's how to train a model with an HF trainer using adapters:
python run.py --config examples/examples_train_hf_peft.yaml
Each job creates a directory to store model checkpoints, logs, etc., and is managed by a YAML config file in which all job settings are defined. The main components of every job appear as top-level keys in the config file.
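As an illustration, a job config might look like the following sketch. The key names and values here are hypothetical, not the repo's actual schema; see the files in the examples folder for the real settings:

```yaml
# Hypothetical job config; key names are illustrative only.
output_dir: runs/my_experiment   # where checkpoints and logs are stored
model:
  type: hf_pretrained            # thin wrapper around a Hub model
  name_or_path: gpt2
trainer:
  type: hf                       # which trainer backend to use
  epochs: 3
  learning_rate: 2.0e-5
evaluator:
  type: lm_eval_harness
  tasks: [hellaswag]
```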
All models inherit from HF's PreTrainedModel, so they can be used with any library that supports HF models. Using a pre-trained model from the Hub requires only a very thin wrapper, and custom models can also be created with the same broad support that any HuggingFace model enjoys. For an example of a custom model, see the tuned lens model.
All trainers are meant to take as input a model and some configuration settings for how to train the given model. A trainer can be implemented based on any external library. For example, see LightningTrainer for a trainer based on Lightning.
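As a minimal sketch of the trainer contract described above (the class and field names are hypothetical, not the repo's actual API), a trainer takes a model plus settings and exposes a single entry point; which backend does the actual optimization is an implementation detail:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TrainerConfig:
    """Hypothetical settings object; real settings come from the YAML job config."""
    epochs: int = 1
    learning_rate: float = 1e-4
    extra: dict[str, Any] = field(default_factory=dict)

class BaseTrainer:
    """Takes a model and its training settings; subclasses delegate to any
    external library (HF Trainer, Lightning, ...) or a hand-written loop."""
    def __init__(self, model: Any, config: TrainerConfig):
        self.model = model
        self.config = config

    def train(self) -> Any:
        raise NotImplementedError

class ToyTrainer(BaseTrainer):
    """Illustrative backend that just counts the epochs it 'ran'."""
    def train(self) -> int:
        ran = 0
        for _ in range(self.config.epochs):
            ran += 1  # a real trainer would run an optimization step here
        return ran
```

Because the trainer only receives the model and settings, and never calls back into an evaluator or config machinery, backends can be swapped without touching the rest of a job.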
Similar to trainers, all evaluators are meant to take in a model and some configuration settings for how to evaluate the given model. These too can be implemented using any external library, e.g. the Language Model Evaluation Harness.
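The evaluator contract can be sketched the same way (again, names here are hypothetical and a real evaluator might wrap e.g. the Language Model Evaluation Harness): an evaluator consumes a model and settings and produces metrics, with no channel back to the trainer:

```python
from typing import Any

class BaseEvaluator:
    """Takes a model and its evaluation settings; returns metric name -> value.
    Hypothetical sketch, not the repo's actual class."""
    def __init__(self, model: Any, settings: dict[str, Any]):
        self.model = model
        self.settings = settings

    def evaluate(self) -> dict[str, float]:
        raise NotImplementedError

class ExactMatchEvaluator(BaseEvaluator):
    """Toy evaluator: treats the model as a callable and scores exact matches
    against (input, expected_output) pairs provided in the settings."""
    def evaluate(self) -> dict[str, float]:
        pairs = self.settings["examples"]
        hits = sum(1 for x, y in pairs if self.model(x) == y)
        return {"exact_match": hits / len(pairs)}
```

For instance, `ExactMatchEvaluator(model=str.upper, settings={"examples": [("a", "A"), ("b", "b")]}).evaluate()` scores one hit out of two examples.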
- For any bugs or feature requests, please open an issue.