---
language:
- en
license: apache-2.0
library_name: peft
base_model: google/gemma-4-e4b-it
tags:
- gemma4
- unsloth
- lora
- qlora
- fine-tuning
- hackathon
- gemma-4-good-hackathon
- kaggle
datasets:
- mlabonne/FineTome-100k
pipeline_tag: text-generation
---
# Gemma 4 E4B Fine-Tuned with Unsloth QLoRA
**Competition:** [The Gemma 4 Good Hackathon](https://www.kaggle.com/competitions/gemma-4-good-hackathon) on Kaggle
**Tracks:** Unsloth ($10K prize) + Impact Tracks
**Framework:** [Unsloth](https://unsloth.ai) — 2x faster fine-tuning
**Base Model:** [google/gemma-4-e4b-it](https://huggingface.co/google/gemma-4-e4b-it) (4B params, instruction-tuned)
## Highlights
- **99.6% training loss reduction** — from 2.916 (baseline) to **0.0115** (final)
- **5 epochs** of QLoRA fine-tuning on 10,000 high-quality samples
- **Only 2.29% of parameters trained** (146.8M / 6.4B) via rank-stabilized LoRA
- **12 hours total training** on a single NVIDIA L4 GPU (24GB)
## How to Use
### With Unsloth (Recommended)
```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "bradduy/Any2AnyModels",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastModel.for_inference(model)

messages = [
    {"role": "user", "content": "Explain how renewable energy helps developing communities"}
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
### With Transformers + PEFT
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-e4b-it",
    device_map="auto",
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(base_model, "bradduy/Any2AnyModels")
tokenizer = AutoTokenizer.from_pretrained("bradduy/Any2AnyModels")
```
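Once the adapter is loaded, generation goes through the standard Transformers API, with the chat template applied by the tokenizer (the prompt below is illustrative):

```python
messages = [{"role": "user", "content": "Summarize the benefits of solar microgrids."}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```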
## Training Details
### Method
We used **Unsloth's QLoRA** implementation with **rank-stabilized LoRA (RSLoRA)** for parameter-efficient fine-tuning. The most important takeaway from our experiments is that **multi-epoch training dramatically reduces loss** with each additional pass over the data.
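For context, the only difference between standard LoRA and rank-stabilized LoRA is the scaling applied to the low-rank update: standard LoRA scales by `alpha / r`, while RSLoRA scales by `alpha / sqrt(r)`, which keeps the update magnitude stable at higher ranks such as the r=64 used here. A minimal sketch of the idea (toy dimensions, not Unsloth's internal implementation):

```python
import math
import torch

# This run's settings: rank 64, alpha 64
r, alpha = 64, 64
standard_scale = alpha / r           # 1.0  -- shrinks as the rank grows
rslora_scale = alpha / math.sqrt(r)  # 8.0  -- stays stable across ranks

# LoRA update: W' = W + scale * (B @ A), with B initialized to zero
A = torch.randn(r, 4096) * 0.01   # toy hidden size of 4096
B = torch.zeros(4096, r)
delta_W = rslora_scale * (B @ A)  # adapter contribution added to the frozen weight
```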
### Configuration
| Parameter | Value |
|-----------|-------|
| Base Model | `google/gemma-4-e4b-it` (4B params) |
| Quantization | 4-bit QLoRA via bitsandbytes |
| LoRA Rank | 64 |
| LoRA Alpha | 64 |
| RSLoRA | Enabled (rank-stabilized scaling) |
| Learning Rate | 7e-5 |
| LR Scheduler | Cosine |
| Epochs | 5 |
| Dataset Size | 10,000 samples |
| Effective Batch Size | 8 (1 × 8 grad accumulation) |
| Weight Decay | 0.01 |
| Warmup Steps | 50 |
| Total Steps | 6,250 |
| Max Seq Length | 2048 |
| Optimizer | AdamW 8-bit |
| Seed | 3407 |
| Response Masking | `train_on_responses_only` enabled |
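The batch and step numbers in the table are consistent with each other; a quick sanity check using the values above:

```python
samples, epochs = 10_000, 5
per_device_batch, grad_accum = 1, 8

effective_batch = per_device_batch * grad_accum  # 8
steps_per_epoch = samples // effective_batch     # 1,250
total_steps = steps_per_epoch * epochs           # 6,250, matching the table
print(effective_batch, steps_per_epoch, total_steps)
```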
### Dataset
- **Source:** [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k)
- **Samples Used:** 10,000 (first 10k)
- **Format:** Multi-turn chat conversations
- **Chat Template:** Gemma 4 native (`role: "model"`, not `"assistant"`; see the formatting sketch after this list)
- **Masking:** Only model responses contribute to loss (instruction tokens masked)
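Because Gemma's chat template expects the `model` role rather than `assistant`, FineTome-style conversations need a small remap before `apply_chat_template` is called. A minimal sketch of the idea (the `conversations`/`from`/`value` field names follow the dataset's ShareGPT-style layout; the exact preprocessing in the training script may differ):

```python
def to_gemma_messages(example):
    # Map ShareGPT-style speaker tags onto Gemma's "user"/"model" roles
    role_map = {"human": "user", "user": "user", "gpt": "model", "assistant": "model"}
    return {
        "messages": [
            {"role": role_map[turn["from"]], "content": turn["value"]}
            for turn in example["conversations"]
        ]
    }

# dataset = dataset.map(to_gemma_messages)  # then render with tokenizer.apply_chat_template
```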
### Hardware
- **GPU:** NVIDIA L4 (24GB VRAM)
- **RAM:** 32GB
- **Training Time:** ~12 hours (with checkpoint resume)
- **GPU Memory Used:** ~14.8GB during training (see the snippet after this list for how to measure this)
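If you are reproducing the run, the peak-memory figure above can be checked with standard PyTorch calls (not part of the training script):

```python
import torch

gpu = torch.cuda.get_device_properties(0)
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
total_gb = gpu.total_memory / 1024**3
print(f"{gpu.name}: peak reserved {peak_gb:.1f} GB of {total_gb:.0f} GB")
```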
## Experiment Journey
We ran **8 systematic experiments** to find the optimal configuration:
| Exp | LoRA r | Epochs | Samples | LR | Train Loss | Key Finding |
|-----|--------|--------|---------|-----|-----------|-------------|
| 01 | 16 | 0.13 | 3k | 2e-4 | 2.916 | Baseline |
| 02 | 32 | 0.24 | 5k | 2e-4 | 1.725 | Higher rank helps (+41%) |
| 03 | 64+RSLoRA | 0.20 | 10k | 2e-4 | 1.460 | RSLoRA + more data (+50%) |
| 04 | 64+RSLoRA | 0.40 | 20k | 1e-4 | ~1.05 | Lower LR improves convergence |
| 05 | 128+RSLoRA | 0.40 | 20k | 5e-5 | 1.134 | r=128 slower than r=64 |
| 06 | 64+RSLoRA | 3 | 10k | 1e-4 | ~0.30 | **Multi-epoch is transformative** |
| 07 | 128+RSLoRA | 3 | 10k | 1e-4 | ~0.59 | r=64 > r=128 for multi-epoch |
| **08** | **64+RSLoRA** | **5** | **10k** | **7e-5** | **0.0115** | **5 epochs = 99.6% reduction** |
### The Multi-Epoch Discovery
The single most impactful finding: **each additional epoch delivers a dramatic, consistent loss reduction:**
```
Epoch 1: loss ~0.90 (learning the patterns)
Epoch 2: loss ~0.60 (reinforcing knowledge)
Epoch 3: loss ~0.30 (deep memorization)
Epoch 4: loss ~0.10 (fine polishing)
Epoch 5: loss ~0.01 (near-perfect fitting)
```
This pattern was consistent across experiments 06, 07, and 08. The loss drops happen at each epoch boundary as the model sees the training data again.
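One way to see the epoch-boundary drops on your own run is to read the loss curve out of the trainer's log history after training. A minimal sketch, assuming the `trainer` object from the pipeline below and matplotlib installed:

```python
import matplotlib.pyplot as plt

# Each logging step records the running loss and the fractional epoch
history = [h for h in trainer.state.log_history if "loss" in h]
epochs = [h["epoch"] for h in history]
losses = [h["loss"] for h in history]

plt.plot(epochs, losses)
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```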
### Other Key Insights
1. **r=64 with RSLoRA is the sweet spot** — r=128 converges slower and provides no benefit in multi-epoch settings
2. **Lower LR (7e-5) stabilizes long training** — higher LR (2e-4) causes instability after epoch 2
3. **`train_on_responses_only` is essential** — masks user/system tokens so the model only learns from responses
4. **Checkpoint saving every 250 steps** — long training runs occasionally crashed due to CUDA memory fragmentation; resuming from checkpoints solved this (see the resume snippet after this list)
5. **10k high-quality samples > 20k samples** for multi-epoch — quality over quantity when doing multiple passes
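Resuming after a crash (insight 4) is a single call with the standard Trainer API; a sketch assuming the `SFTTrainer` from the pipeline below (the checkpoint path is illustrative):

```python
# Resume from the most recent checkpoint-* directory in the output dir
trainer.train(resume_from_checkpoint=True)

# Or point at a specific checkpoint explicitly (illustrative path)
# trainer.train(resume_from_checkpoint="outputs/checkpoint-5000")
```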
## Training Pipeline
Built entirely with [Unsloth](https://unsloth.ai):
```python
from unsloth import FastModel
from trl import SFTTrainer, SFTConfig
from unsloth.chat_templates import get_chat_template, train_on_responses_only

# 1. Load 4-bit quantized model
model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-4-E4B-it-unsloth-bnb-4bit",
    max_seq_length=2048, load_in_4bit=True,
)

# 2. Apply LoRA adapters (r=64, RSLoRA)
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers=False, finetune_language_layers=True,
    finetune_attention_modules=True, finetune_mlp_modules=True,
    r=64, lora_alpha=64, lora_dropout=0, bias="none",
    random_state=3407, use_rslora=True,
)

# 3. Setup Gemma 4 chat template
tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")

# 4. Train with response-only masking
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=1, gradient_accumulation_steps=8,
        learning_rate=7e-5, num_train_epochs=5, lr_scheduler_type="cosine",
        warmup_steps=50, weight_decay=0.01, optim="adamw_8bit",
        save_strategy="steps", save_steps=250, save_total_limit=3,
    ),
)
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|turn>user\n", response_part="<|turn>model\n",
)
trainer.train()
```
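The card does not show the export step, but with Unsloth/PEFT the adapter and tokenizer can be saved and pushed to the Hub with the standard calls below (the local directory name is illustrative; pushing requires `huggingface-cli login`):

```python
# Save the LoRA adapter weights and tokenizer locally
model.save_pretrained("gemma4-e4b-qlora-adapter")
tokenizer.save_pretrained("gemma4-e4b-qlora-adapter")

# Push the adapter to this repo on the Hugging Face Hub
model.push_to_hub("bradduy/Any2AnyModels")
tokenizer.push_to_hub("bradduy/Any2AnyModels")
```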
## Reproduce Training
```bash
git clone https://github.com/bradduy/Any2AnyModels
cd Any2AnyModels
pip install unsloth
python scripts/train.py \
  --model unsloth/gemma-4-E4B-it-unsloth-bnb-4bit \
  --load-4bit --lora-rank 64 --use-rslora \
  --dataset mlabonne/FineTome-100k --max-samples 10000 \
  --num-epochs 5 --learning-rate 7e-5 --grad-accum 8 \
  --weight-decay 0.01 --warmup-steps 50 --scheduler cosine \
  --save-steps 250 --save-total-limit 3
```
## Limitations
- Fine-tuned on English-only data (FineTome-100k)
- Optimized for instruction following, not domain-specific tasks
- 4B parameter model — larger models (26B, 31B) would perform better but require more VRAM
- Training loss ≠ downstream task performance; the model should be evaluated on specific benchmarks
## Acknowledgments
- **Google DeepMind** for the [Gemma 4](https://blog.google/technology/developers/gemma-4/) model family
- **[Unsloth](https://unsloth.ai)** for making QLoRA fine-tuning 2x faster and memory efficient
- **[Kaggle](https://www.kaggle.com)** for hosting the Gemma 4 Good Hackathon
- **[mlabonne](https://huggingface.co/mlabonne)** for the FineTome-100k dataset
## License
Apache 2.0 (same as Gemma 4)