This repository contains a Jupyter notebook for fine-tuning language models using LoRA (Low-Rank Adaptation) and 4-bit quantization techniques optimized for Kaggle's free tier GPU environment. The implementation focuses on memory-efficient training while maintaining model performance.
## Features

- **4-bit Quantization**: uses bitsandbytes for memory-efficient model loading
- **LoRA Fine-tuning**: parameter-efficient fine-tuning with the PEFT library (see the sketch after this list)
- **Kaggle Optimized**: training arguments tuned for Kaggle's free tier (T4 GPU)
- **Memory Efficient**: gradient accumulation and batch size optimization
- **Clean Implementation**: well-documented notebook with a step-by-step process
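A minimal sketch of how these pieces fit together. The model id is a placeholder and the LoRA hyperparameters are illustrative, not necessarily the notebook's exact values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "YOUR MODEL ID"  # placeholder, as in the notebook

# Load the base model in 4-bit NF4 with fp16 compute (the T4 has no bf16 support)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Freeze the 4-bit base model and attach small trainable low-rank adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                        # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear", # adapt all linear layers; adjust per model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```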
## Requirements

- Python 3.8+
- CUDA-compatible GPU (T4 recommended, as available on Kaggle)
- All dependencies listed in `requirements.txt`
## Quick Start on Kaggle

1. **Upload to Kaggle**
   - Upload `FineTunning.ipynb` to a new Kaggle notebook
   - Or fork this repository directly in Kaggle
2. **Setup Dataset**
   - Upload your training dataset to Kaggle Datasets
   - Update the dataset path in the first cell:

   ```python
   import json

   with open("/kaggle/input/YOUR-DATASET-NAME/your-file.json", "r") as f:
       data = json.load(f)
   ```
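   The loaded pairs can then be wrapped in a Hugging Face `Dataset`. A hedged sketch; the field names follow the format shown in the Dataset Format section below:

   ```python
   from datasets import Dataset

   dataset = Dataset.from_list(data)  # one row per {"instruction", "response"} dict
   print(dataset[0])
   ```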
3. **Configure Model**
   - Replace `"YOUR MODEL ID"` with your desired model (e.g., `"microsoft/DialoGPT-medium"`)
   - Add your Hugging Face token:

   ```python
   from huggingface_hub import login

   login(token="your_actual_huggingface_token")
   ```
4. **Run the Notebook**
   - Execute the cells sequentially
   - Training takes roughly 30-40 minutes on Kaggle's free tier
   - The model is saved as LoRA adapters for lightweight storage (see the sketch below)
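What "saved as LoRA adapters" means in practice: calling `save_pretrained()` on a PEFT model writes only the small adapter weights, not the multi-gigabyte base model. A sketch continuing the setup above; the paths are illustrative:

```python
# Writes only the adapter weights (typically a few MB), not the base model
model.save_pretrained("/kaggle/working/lora-adapters")

# To reuse the adapters later: reload the 4-bit base model, then attach them
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, "/kaggle/working/lora-adapters")
```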
## Local Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/Abhijeet-ist/FineTunning.git
   cd FineTunning
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Run the Jupyter notebook:

   ```bash
   jupyter notebook FineTunning.ipynb
   ```
## Training Configuration

The notebook is tuned for Kaggle's constraints (sketched in code below):

- Batch size: 1, with 4 gradient accumulation steps (effective batch size 4)
- Max steps: 500, to finish before the session times out
- Mixed precision: FP16 enabled for memory savings
- Model saving: only the LoRA adapters are saved (lightweight)
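A sketch of `TrainingArguments` matching the constraints above; `output_dir` and the logging and saving values are illustrative assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/kaggle/working/output",
    per_device_train_batch_size=1,   # batch size 1 ...
    gradient_accumulation_steps=4,   # ... with an effective batch size of 4
    max_steps=500,                   # hard cap to stay inside the session limit
    fp16=True,                       # mixed precision for memory savings
    logging_steps=50,
    save_strategy="no",              # adapters are saved manually at the end
)
```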
## Dataset Format

Your dataset should be a JSON array of instruction-response pairs:

```json
[
  {
    "instruction": "Your training instruction here",
    "response": "Expected model response here"
  },
  ...
]
```

## Dependencies

- `transformers`: Model loading and training
- `peft`: LoRA implementation
- `bitsandbytes`: 4-bit quantization
- `datasets`: Data handling and preprocessing
- `accelerate`: Training optimization
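Given the dataset format above, a typical preprocessing step uses `datasets` to map each instruction-response pair into a single training string. A sketch; the prompt template is an assumption, not necessarily the notebook's exact one:

```python
# Hypothetical prompt template; adjust to match your model's expected format
def format_example(example):
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['response']}"
    }

dataset = dataset.map(format_example)  # `dataset` from the loading sketch above
```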
## Kaggle Tips

- Enable the GPU in the notebook settings
- Use the Datasets feature to upload training data
- Monitor GPU memory usage during training (see the snippet below)
- Save outputs regularly to prevent data loss
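A quick way to check GPU memory from a notebook cell (assumes CUDA is available):

```python
import torch

# Current and peak GPU memory allocated by PyTorch, in GiB
used = torch.cuda.memory_allocated() / 1024**3
peak = torch.cuda.max_memory_allocated() / 1024**3
print(f"GPU memory in use: {used:.2f} GiB (peak: {peak:.2f} GiB)")
```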
## Contributing

Feel free to submit issues and enhancement requests!