20x Faster TRL Fine-tuning with RapidFire AI
Join Discord if you need help + ⭐ Star us on GitHub ⭐
👉 Note: This Colab notebook illustrates simplified usage of
rapidfireai. For the full RapidFire AI experience with advanced experiment manager, UI, and production features, see our Install and Get Started guide.
🎬 Watch our intro video to get started!
⚠️ Important: Avoid leaving this Colab tab idle for more than 5 minutes—Colab may disconnect. To stay connected, periodically refresh TensorBoard or run a cell.
Authored by: RapidFire AI Team
This cookbook demonstrates how to fine-tune LLMs using Supervised Fine-Tuning (SFT) with RapidFire AI, enabling you to train and compare multiple configurations concurrently—even on a single GPU. We’ll build a customer support chatbot and explore how RapidFire AI’s chunk-based scheduling delivers 16-24× faster experimentation throughput.
What You’ll Learn:
- Concurrent LLM Experimentation: How to define and run multiple SFT experiments concurrently
- LoRA Fine-tuning: Using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters of different capacities
- Experiment Tracking: TensorBoard logging and real-time dashboard monitoring
- Interactive Control Operations (IC Ops): Using Stop, Resume, Clone-Modify, and Delete to manage runs mid-training
Key Benefits of RapidFire AI:
- ⚡ 16-24× Speedup: Compare multiple configurations in the time it takes to run one sequentially
- 🎯 Early Signals: Get comparative metrics after the first data chunk instead of waiting for full training
- 🔧 Drop-in Integration: Uses familiar TRL/Transformers APIs with minimal code changes
- 📊 Real-time Monitoring and Control: Live dashboard with IC Ops (Stop, Resume, Clone-Modify, and Delete) on active runs
What We’re Building
In this tutorial, we’ll fine-tune a customer support chatbot that can answer user queries in a helpful and friendly manner. We’ll use the Bitext Customer Support dataset, which contains instruction-response pairs covering common customer support scenarios—each example includes a user question and an ideal assistant response.
Our Approach
We’ll use Supervised Fine-Tuning (SFT) with LoRA (Low-Rank Adaptation) to efficiently adapt a pre-trained LLM (GPT-2) for customer support tasks. To find the best hyperparameters, we’ll compare 4 configurations simultaneously:
- 2 LoRA adapter sizes: Small (rank 8) vs. Large (rank 32)
- 2 learning rates: 5e-4 vs. 2e-4
RapidFire AI’s chunk-based scheduling trains all configurations concurrently—processing the dataset in chunks and letting every run train on each chunk before moving to the next. This gives you comparative metrics early, so you can identify the best configuration without waiting for all training to complete.
The figure below illustrates this concept with 3 configurations (M1, M2, M3). Sequential training completes one configuration entirely before starting the next. RapidFire AI interleaves all configurations, training each on one data chunk before rotating to the next. The bottom row shows how IC Ops let you adapt mid-training—stopping underperformers and cloning promising runs. Our tutorial uses 4 configurations, but the scheduling principle is the same.
Sequential vs. RapidFire AI on a single GPU with chunk-based scheduling and IC Ops.
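To make the interleaving concrete, here is a minimal, purely illustrative Python sketch (not RapidFire AI's actual scheduler; the run names and the train_on_chunk placeholder are ours) contrasting sequential training with chunk-based scheduling:
configs = ["run_1", "run_2", "run_3", "run_4"]  # our 4 SFT configurations
num_chunks = 4                                  # dataset split into 4 chunks
def train_on_chunk(cfg, chunk):
    # Placeholder for: restore cfg's training state, train on this chunk, checkpoint
    print(f"{cfg} trains on chunk {chunk}")
# Sequential baseline: a config must finish every chunk before the next one starts,
# so the last config reports no metrics until three full runs have completed
for cfg in configs:
    for chunk in range(num_chunks):
        train_on_chunk(cfg, chunk)
# Chunk-based scheduling: every config trains on chunk 0 before any config moves
# to chunk 1, so all four runs produce comparable metrics after the first chunk
for chunk in range(num_chunks):
    for cfg in configs:
        train_on_chunk(cfg, chunk)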
Install RapidFire AI Package and Setup
Option 1: Run Locally (or on a VM)
For the full RapidFire AI experience—advanced experiment management, UI, and production features—we recommend installing the package on a machine you control (for example, a VM or your local machine) rather than Google Colab. See our Install and Get Started guide.
Option 2: Run in Google Colab
For simplicity, you can run this notebook on Google Colab; it is configured to run end-to-end with no local installation required.
try:
import rapidfireai
print("✅ rapidfireai already installed")
except ImportError:
%pip install rapidfireai # Takes ~1 min
    !rapidfireai init # Takes ~1 min
Start RapidFire Services
Start the RapidFire AI services:
import subprocess
from time import sleep
import socket
try:
s = [socket.socket(socket.AF_INET, socket.SOCK_STREAM), socket.socket(socket.AF_INET, socket.SOCK_STREAM), socket.socket(socket.AF_INET, socket.SOCK_STREAM)]
s[0].connect(("127.0.0.1", 8851))
s[1].connect(("127.0.0.1", 8852))
s[2].connect(("127.0.0.1", 8853))
s[0].close()
s[1].close()
s[2].close()
print("RapidFire Services are running")
except OSError as error:
print("RapidFire Services are not running, launching now...")
subprocess.Popen(["rapidfireai", "start"])
    sleep(30)
Note: You can also run rapidfireai start from the Colab terminal instead of the cell above.
Configure RapidFire to Use TensorBoard
import os
# Load TensorBoard extension
%load_ext tensorboard
# Configure RapidFire to use TensorBoard
os.environ['RF_TRACKING_BACKEND'] = 'tensorboard'
# TensorBoard log directory will be auto-created in experiment path
Import RapidFire Components
from rapidfireai import Experiment
from rapidfireai.fit.automl import List, RFGridSearch, RFModelConfig, RFLoraConfig, RFSFTConfig
# If you get "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'" from Colab, just rerun this cell
Load and Prepare the Dataset
We’ll use the Bitext Customer Support dataset, which contains instruction-response pairs for training customer support chatbots.
from datasets import load_dataset
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")
# REDUCED dataset for memory constraints in Colab
train_dataset = dataset["train"].select(range(64)) # Reduced from 128
eval_dataset = dataset["train"].select(range(50, 60)) # 10 examples
train_dataset = train_dataset.shuffle(seed=42)
eval_dataset = eval_dataset.shuffle(seed=42)
Define Data Processing Function
We’ll format the data as Q&A pairs for GPT-2:
def sample_formatting_function(example):
"""Format the dataset for GPT-2 while preserving original fields"""
return {
"text": f"Question: {example['instruction']}\nAnswer: {example['response']}",
"instruction": example['instruction'], # Keep original
"response": example['response'] # Keep original
}
# Apply formatting to datasets
eval_dataset = eval_dataset.map(sample_formatting_function)
train_dataset = train_dataset.map(sample_formatting_function)
Define Metrics Function
We’ll use a lightweight metrics computation with just ROUGE-L to save memory:
def sample_compute_metrics(eval_preds):
"""Lightweight metrics computation"""
predictions, labels = eval_preds
try:
import evaluate
# Only compute ROUGE-L (skip BLEU to save memory)
rouge = evaluate.load("rouge")
rouge_output = rouge.compute(
predictions=predictions,
references=labels,
use_stemmer=True,
rouge_types=["rougeL"] # Only compute rougeL
)
return {
"rougeL": round(rouge_output["rougeL"], 4),
}
except Exception as e:
# Fallback if metrics fail
print(f"Metrics computation failed: {e}")
        return {}
Initialize Experiment
# Create experiment with unique name
my_experiment = "tensorboard-demo-1"
experiment = Experiment(experiment_name=my_experiment)
Get TensorBoard Log Directory
The TensorBoard logs are stored in the experiment directory. Let’s get the path:
# Get experiment path
from rapidfireai.fit.db.rf_db import RfDb
db = RfDb()
experiment_path = db.get_experiments_path(my_experiment)
tensorboard_log_dir = f"{experiment_path}/{my_experiment}/tensorboard_logs"
print(f"TensorBoard logs will be saved to: {tensorboard_log_dir}")
Define Model Configurations
We’ll use RFGridSearch to create a grid of all possible combinations from our configurations. This tutorial uses GPT-2 (124M parameters), which fits comfortably within Colab’s memory constraints.
Our config group combines 2 LoRA adapters (small: r=8 targeting c_attn; large: r=32 targeting c_attn + c_proj) with 2 training strategies (Config A: lr=5e-4, linear scheduler; Config B: lr=2e-4, cosine scheduler with warmup). This produces the following 4 concurrent runs:
| Run | Base Model | Learning Rate | Scheduler | LoRA Rank | Target Modules |
|---|---|---|---|---|---|
| 1 | gpt2 | 5e-4 | linear | 8 | c_attn |
| 2 | gpt2 | 5e-4 | linear | 32 | c_attn, c_proj |
| 3 | gpt2 | 2e-4 | cosine | 8 | c_attn |
| 4 | gpt2 | 2e-4 | cosine | 32 | c_attn, c_proj |
RapidFire AI trains all 4 configurations concurrently using chunk-based scheduling, giving you comparative metrics early so you can identify the best hyperparameters faster.
# GPT-2 specific LoRA configs - different module names!
peft_configs_lite = List([
RFLoraConfig(
r=8,
lora_alpha=16,
lora_dropout=0.1,
target_modules=["c_attn"], # GPT-2 combines Q,K,V in c_attn
bias="none"
),
RFLoraConfig(
r=32,
lora_alpha=64,
lora_dropout=0.1,
target_modules=["c_attn", "c_proj"], # c_attn (QKV) + c_proj (output)
bias="none"
)
])
# 2 configs with GPT-2
config_set_lite = List([
RFModelConfig(
model_name="gpt2", # Only 124M params
peft_config=peft_configs_lite,
training_args=RFSFTConfig(
learning_rate=5e-4, # Low lr for more stability
lr_scheduler_type="linear",
per_device_train_batch_size=2,
gradient_accumulation_steps=2, # Effective bs = 4
max_steps=64, # Raise this to see more learning
logging_steps=2,
eval_strategy="steps",
eval_steps=4,
per_device_eval_batch_size=2,
fp16=True,
gradient_checkpointing=True, # Save memory
report_to="none", # Disables wandb
),
model_type="causal_lm",
model_kwargs={
"device_map": "auto",
"torch_dtype": "float16", # Explicit fp16
"use_cache": False
},
formatting_func=sample_formatting_function,
compute_metrics=sample_compute_metrics,
generation_config={
"max_new_tokens": 128, # Reduced from 256
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"repetition_penalty": 1.1,
"pad_token_id": 50256, # GPT-2's EOS token
}
),
RFModelConfig(
model_name="gpt2",
peft_config=peft_configs_lite,
training_args=RFSFTConfig(
learning_rate=2e-4, # Even more conservative
lr_scheduler_type="cosine", # Try cosine schedule
per_device_train_batch_size=2,
gradient_accumulation_steps=2,
max_steps=64, # Increase to observe more learning behavior
logging_steps=2,
eval_strategy="steps",
eval_steps=4,
per_device_eval_batch_size=2,
fp16=True,
gradient_checkpointing=True,
report_to="none", # Disables wandb
warmup_steps=10, # Add warmup for stability
),
model_type="causal_lm",
model_kwargs={
"device_map": "auto",
"torch_dtype": "float16",
"use_cache": False
},
formatting_func=sample_formatting_function,
compute_metrics=sample_compute_metrics,
generation_config={
"max_new_tokens": 128,
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40,
"repetition_penalty": 1.1,
"pad_token_id": 50256,
}
)
])
Define the Model Factory Function
RapidFire AI uses a factory function to create model instances on-demand. Instead of loading all 4 models into memory at once (which would likely cause out-of-memory errors), RapidFire calls this function each time it needs a model during chunk-based scheduling. The function takes a configuration dictionary and returns a (model, tokenizer) tuple.
def sample_create_model(model_config):
"""Function to create model object with GPT-2 adjustments"""
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = model_config["model_name"]
model_type = model_config["model_type"]
model_kwargs = model_config["model_kwargs"]
if model_type == "causal_lm":
model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
else:
# Default to causal LM
model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 specific: Set pad token (GPT-2 doesn't have one by default)
if "gpt2" in model_name.lower():
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left" # GPT-2 works better with left padding
model.config.pad_token_id = model.config.eos_token_id
    return (model, tokenizer)
# Simple grid search across all config combinations: 4 total (2 LoRA configs × 2 trainer configs)
config_group = RFGridSearch(
configs=config_set_lite,
trainer_type="SFT"
)
Monitor Training Loss and Evaluation Metrics
We’ll use TensorBoard to visualize training progress across all 4 configurations. TensorBoard provides interactive plots for loss curves, learning rates, and evaluation metrics—making it easy to compare which hyperparameter combinations perform best.
Run the cell below before starting training to see metrics update in real-time.
%tensorboard --logdir {tensorboard_log_dir}
Run Training and Evaluation
We’ll now train all 4 configurations concurrently and evaluate them on the validation set. RapidFire AI handles the scheduling, rotating between configurations after each data chunk so you get comparative metrics early.
The experiment.run_fit() function orchestrates this process:
- config_group — The grid of configurations to train (our 4 combinations)
- sample_create_model — Factory function that creates model/tokenizer instances
- train_dataset / eval_dataset — Training and evaluation data
- num_chunks — Number of data chunks for interleaved scheduling (higher = more frequent rotation between configs)
- seed — Random seed for reproducibility
# Launch train and validation for all configs in the config_group with swap granularity of 4 chunks for hyperparallel execution
experiment.run_fit(
config_group,
sample_create_model,
train_dataset,
eval_dataset,
num_chunks=4,
seed=42
)
Launch Interactive Run Controller
RapidFire AI provides an Interactive Controller that lets you manage active runs in real time, directly from the notebook:
- ⏹️ Stop: Gracefully stop a running config
- ▶️ Resume: Resume a stopped run
- 🗑️ Delete: Remove a run from this experiment
- 📋 Clone: Create a new run by editing the config dictionary of a parent run to try new knob values; optional warm start of parameters
- 🔄 Refresh: Update run status and metrics
The Controller uses ipywidgets and is compatible with both Colab (ipywidgets 7.x) and Jupyter (ipywidgets 8.x).
# Create Interactive Controller
sleep(15)
from rapidfireai.fit.utils.interactive_controller import InteractiveController
controller = InteractiveController(dispatcher_url="http://127.0.0.1:8851")
controller.display()
End Experiment
from google.colab import output
from IPython.display import display, HTML
display(HTML('''
<button id="continue-btn" style="padding: 10px 20px; font-size: 16px;">Click to End Experiment</button>
'''))
# eval_js blocks until the Promise resolves
output.eval_js('''
new Promise((resolve) => {
document.getElementById("continue-btn").onclick = () => {
document.getElementById("continue-btn").disabled = true;
document.getElementById("continue-btn").innerText = "Continuing...";
resolve("clicked");
};
})
''')
# Actually end the experiment after the button is clicked
experiment.end()
print("Done!")
View TensorBoard Plots and Logs
After the experiment has ended, you can still view the full logs in TensorBoard:
# View final logs
%tensorboard --logdir {tensorboard_log_dir}
View RapidFire AI Log Files
You can track the work being done by the system via the log files RapidFire AI writes to the logs/experiments/ folder.
# Get the experiment-specific log file
from IPython.display import display, Pretty
log_file = experiment.get_log_file_path()
display(Pretty(f"📄 Experiment Log File: {log_file}"))
if log_file.exists():
display(Pretty("=" * 80))
display(Pretty(f"Last 30 lines of {log_file.name}:"))
display(Pretty("=" * 80))
with open(log_file, 'r', encoding='utf-8') as f:
lines = f.readlines()
for line in lines[-30:]:
display(Pretty(line.rstrip()))
else:
    display(Pretty(f"❌ Log file not found: {log_file}"))
# Get the training-specific log file
log_file = experiment.get_log_file_path("training")
display(Pretty(f"📄 Training Log File: {log_file}"))
if log_file.exists():
display(Pretty("=" * 80))
display(Pretty(f"Last 30 lines of {log_file.name}:"))
display(Pretty("=" * 80))
with open(log_file, 'r', encoding='utf-8') as f:
lines = f.readlines()
for line in lines[-30:]:
display(Pretty(line.rstrip()))
else:
    display(Pretty(f"❌ Log file not found: {log_file}"))
Conclusion and Next Steps
In this tutorial, you trained 4 LoRA configurations concurrently on a customer support dataset using RapidFire AI’s chunk-based scheduling. Instead of running experiments sequentially, you got comparative metrics early—allowing you to identify promising hyperparameters faster.
Interpreting Your Results:
- Check TensorBoard for loss curves and evaluation metrics across all 4 runs
- The configuration with the lowest validation loss and highest ROUGE-L score is likely your best performer
- Use the Interactive Controller to stop underperforming runs early and save GPU time
Next Steps:
- Save the best adapter: Export the LoRA weights from your top-performing configuration (see the sketch after this list)
- Scale up training: Increase max_steps and dataset size for production-quality fine-tuning
- Try larger models: Swap GPT-2 for Llama, Mistral, or other models supported by TRL
- Explore more hyperparameters: Add additional learning rates, LoRA ranks, or schedulers to your grid
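As a starting point for exporting the winning adapter, the sketch below shows a standard PEFT workflow: load a saved LoRA checkpoint, merge it into the base model, and save a standalone model. The adapter_dir path is a hypothetical placeholder; point it at the checkpoint directory of your best run.
# A minimal sketch using standard PEFT/Transformers APIs; adapter_dir is a
# hypothetical placeholder for your best run's saved LoRA checkpoint
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # same GPT-2 pad-token fix as in sample_create_model
adapter_dir = "path/to/best_run_checkpoint"  # hypothetical placeholder
model = PeftModel.from_pretrained(base_model, adapter_dir)
# Merge the LoRA weights into the base model and save a standalone, adapter-free model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("customer-support-gpt2")
tokenizer.save_pretrained("customer-support-gpt2")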
Learn More:
- 📖 RapidFire AI Documentation
- 💬 Join our Discord for help and discussions
- ⭐ Star us on GitHub