GitHub - basetenlabs/ml-cookbook: Ready-to-use ML training recipes to help you build and deploy models on Baseten.

A curated collection of ready-to-use training recipes for machine learning on Baseten. Whether you’re starting from scratch or fine-tuning an existing model, these recipes provide practical, copy-paste solutions for every stage of your ML pipeline.

What's inside

Training recipes - End-to-end examples for training models from scratch
Fine-tuning workflows - Adapt pre-trained models to your specific use case
Best practices - Optimized configurations and common patterns

From data preprocessing to checkpointed and trained models, these recipes cover the complete ML lifecycle on Baseten's platform.

Prerequisites

Before getting started, ensure you have the following:

- A Baseten account. Sign up here if you don't have one.
  - Add any access tokens, API keys (for example, a Hugging Face access token), or passwords your models need to your Baseten secrets so they can be accessed securely.
  - A Hugging Face access token is required to access models on Hugging Face that have gated access. More information on setting up Hugging Face access tokens can be found here.
- Python 3.8 to 3.11 installed. A Conda environment is recommended.
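
As a quick sanity check of the Python requirement, a minimal sketch like the following can confirm that the active interpreter falls within the 3.8 to 3.11 range. The helper name `is_supported_python` is an illustrative assumption and is not part of truss.

```python
import sys

# Supported range taken from the prerequisites (inclusive on both ends).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration only; not part of truss.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {major}.{minor} is {status}")
```

Running this inside your Conda environment before installing `truss` shows whether the environment was created with a suitable interpreter.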

Run the examples

Install `truss`

Use the appropriate command for your package manager:

# pip
pip install -U truss
# uv
uv add truss && uv sync --upgrade-package truss

Create the workspace for your training project

# for any example (replace with the specific example name)
truss train init --examples <example-name> && cd <example-name>

Kick off the job

Make sure you've added the required secrets (e.g. a Hugging Face token) via Baseten Secrets and Environment Variables, then kick off your job:

truss train push config.py

For more details, take a look at the docs.

If you'd like to fire off jobs from within this repository directly, you can clone the repository and navigate to the appropriate workspaces.

git clone https://github.com/basetenlabs/ml-cookbook.git

Usage

Examples vs Recipes

examples/ are runnable, model/framework-specific projects you can launch directly with truss train push config.py.

recipes/ are reusable implementation guides and patterns that help you choose an approach and adapt it to your own project.

Programmatic Training API

The Programmatic Training API lets you launch and manage machine learning training jobs directly from your Python code, rather than relying solely on CLI commands or configuration files.

For detailed instructions and code, see recipes/programmatic-training-api/README.md.

Long-context SFT

Long-context supervised fine-tuning (SFT) adapts large language models to handle and learn from sequences much longer than standard context windows. This enables models to process, reason about, and generate long-form documents, conversations, or codebases in a single pass.

This recipe demonstrates how to set up a supervised fine-tuning project targeting long-context models.

For detailed instructions and code, see recipes/sft/long_context/README.md.


Fine-tune GPT OSS 20B with LoRA and trl

If using a model with gated access, make sure you have access to the model on Hugging Face and that your API token is uploaded to your secrets. This example requires an HF access token.

Training

examples/oss-gpt-20b-lora/training/train.py contains all training code.

examples/oss-gpt-20b-lora/training/config.py is the entry point for training, where you define your training configuration, including the start commands that launch your training job. Make sure these commands include any file permission changes needed to run shell scripts (for example, chmod +x on the script), because we do not change any file system permissions.
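
To illustrate the permission point, here is a minimal sketch of assembling start commands so that the script is made executable before it runs. The helper `build_start_commands` is hypothetical, and how the resulting list is passed to the training job depends on the example's config.py, which is not reproduced here.

```python
import shlex


def build_start_commands(script="run.sh"):
    """Return shell commands that make a script executable, then run it.

    Hypothetical helper for illustration; the actual config.py in each
    example defines its own start commands.
    """
    # Quote the path so names containing spaces survive the shell.
    quoted = shlex.quote(f"./{script}")
    return [f"chmod +x {quoted}", quoted]


if __name__ == "__main__":
    for command in build_start_commands():
        print(command)
```

Keeping the `chmod` step in the start commands means the job does not rely on executable bits being preserved when the workspace is uploaded.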

Make sure to update hf_access_token in config.py with the name under which this access token is saved in your secrets. In this example, trained checkpoints are written directly to Hugging Face. The Hub IDs for models and datasets are configured in examples/oss-gpt-20b-lora/training/run.sh; update run.sh with a repo you have write access to.

cd examples/oss-gpt-20b-lora/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Keep the Job ID handy, as you’ll use it for managing and monitoring your job.
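
If you script your submissions, one way to keep the Job ID handy is to pull it out of the captured CLI output. The sketch below assumes the output contains a `--job-id <id>` hint like the sample shown here; the exact wording may differ between truss versions, and `extract_job_id` is a hypothetical helper rather than part of truss.

```python
import re

# Matches the "--job-id <id>" hint printed after a successful submission.
JOB_ID_PATTERN = re.compile(r"--job-id\s+([A-Za-z0-9_-]+)")


def extract_job_id(cli_output):
    """Return the first job ID found in the CLI output, or None."""
    match = JOB_ID_PATTERN.search(cli_output)
    return match.group(1) if match else None


# Text adapted from the sample output in this README (emoji omitted).
sample = """Training job successfully created!
View logs for your job via `truss train logs --job-id e3m512w [--tail]`
View metrics for your job via `truss train metrics --job-id e3m512w`
"""

if __name__ == "__main__":
    print(extract_job_id(sample))  # e3m512w
```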

Alternatively, you can view all your training jobs at [https://app.baseten.co/training/](https://app.baseten.co/training/).

As checkpoints are generated, you can access them on Hugging Face at the same location defined in run.sh.

Fine-tune Qwen3 8B with LoRA and trl

If using a model with gated access, make sure you have access to the model on Hugging Face and that your API token is uploaded to your secrets.

Training

examples/qwen3-8b-lora-dpo-trl/training/train.py contains the training code.

examples/qwen3-8b-lora-dpo-trl/training/config.py is the entry point for training, where you define your training configuration, including the start commands that launch your training job. Make sure these commands include any file permission changes needed to run shell scripts, because we do not change any file system permissions.

cd examples/qwen3-8b-lora-dpo-trl/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Alternatively, you can view all your training jobs at [https://app.baseten.co/training/](https://app.baseten.co/training/).

Because checkpointing is enabled in config.py for this example, checkpoints are stored in cloud storage and can be accessed with:

truss train get_checkpoint_urls --job-id $JOB_ID
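
The job-management commands shown in this README all take the same `--job-id` argument, so they are easy to wrap in a small script. The following is a sketch only: `job_commands` and `run_job_command` are hypothetical helpers, and actually running the commands requires `truss` to be installed and authenticated.

```python
import subprocess


def job_commands(job_id, tail=False):
    """Build argument lists for the truss CLI commands shown in this README."""
    logs = ["truss", "train", "logs", "--job-id", job_id]
    if tail:
        logs.append("--tail")
    return {
        "logs": logs,
        "metrics": ["truss", "train", "metrics", "--job-id", job_id],
        "checkpoints": ["truss", "train", "get_checkpoint_urls", "--job-id", job_id],
    }


def run_job_command(name, job_id, tail=False):
    """Run one of the commands above; raises if the CLI exits with an error."""
    return subprocess.run(job_commands(job_id, tail)[name], check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without truss.
    for name, command in job_commands("e3m512w").items():
        print(name, "->", " ".join(command))
```

Passing the arguments as a list avoids shell quoting issues if the job ID comes from another script.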

Train and deploy an MNIST digit classifier with PyTorch

Training

examples/mnist-single-gpu/training/train_mnist.py contains a PyTorch example of an MNIST classifier built with a CNN.

examples/mnist-single-gpu/training/config.py is the entry point for training, where you define your training configuration, including the start commands that launch your training job. Make sure these commands include any file permission changes needed to run shell scripts, because we do not change any file system permissions.

cd examples/mnist-single-gpu/training
truss train push config.py

Upon successful submission, the CLI will output helpful information about your job:

✨ Training job successfully created!
🪵 View logs for your job via `truss train logs --job-id e3m512w [--tail]`
🔍 View metrics for your job via `truss train metrics --job-id e3m512w`

Keep the Job ID handy, as you’ll use it for managing and monitoring your job.

Because checkpointing is enabled in config.py for this example, checkpoints are stored in cloud storage and can be accessed with:

truss train get_checkpoint_urls --job-id $JOB_ID

Contributing

Contributions are welcome! Please open issues or submit pull requests.

License

MIT License

