---
license: mit
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- peft
- lora
- qlora
- tinyllama
- text-generation
- quotes
- generated_from_trainer
- trl
library_name: peft
---
# LoRA Adapters: TinyLlama-1.1B Quote Generator
This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` fine-tuned to generate motivational quotes.
**These are only the adapter weights, not the full model.** You must load these adapters onto the base TinyLlama model to use them.
The adapters were trained in Google Colab on a single T4 GPU using QLoRA. After fine-tuning, the specialized model showed a **2.4x inference speedup** on the same GPU compared to the base model (a rough way to reproduce such a timing comparison is sketched below).
- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Dataset:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)
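The speedup figure was measured empirically; no benchmark script ships with this repo. A minimal sketch of how a similar comparison could be reproduced, assuming `base_pipe` and `finetuned_pipe` are `text-generation` pipelines built as in the Quick Start below (both names are placeholders, not part of this repo):
```python
import time
import torch

def avg_latency(pipe, prompt, n_runs=10):
    """Average wall-clock seconds per generation, with a warm-up run."""
    pipe(prompt, max_new_tokens=80, do_sample=False)  # warm-up: initialize kernels and caches
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        pipe(prompt, max_new_tokens=80, do_sample=False)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

prompt = "Keyword: life\nQuote:"
base_s = avg_latency(base_pipe, prompt)        # base_pipe: hypothetical pipeline on the base model
tuned_s = avg_latency(finetuned_pipe, prompt)  # finetuned_pipe: pipeline from the Quick Start
print(f"Speedup: {base_s / tuned_s:.1f}x")
```
Greedy decoding (`do_sample=False`) keeps runs deterministic; a model that emits EOS sooner will naturally finish faster, which a wall-clock comparison like this captures.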
## ⚡ Quick Start (How to use)
This example shows how to load the 4-bit quantized base model and attach these adapters for fast inference on a GPU.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"  # this repo

# 1. Load the base model with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Load the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast the adapter weights to float16 so they match the compute dtype
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline
#    (the model is already dispatched via device_map, so none is passed here)
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"
result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```
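If you prefer a standalone checkpoint instead of base model plus adapters, PEFT's `merge_and_unload()` folds the LoRA weights into the base model. Merging into a 4-bit base is awkward, so a common pattern is to reload the base in float16 first. A minimal sketch, reusing `tokenizer` from above (the output directory name is illustrative):
```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in float16 (not 4-bit) so the LoRA weights can be folded in cleanly
base_fp16 = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base_fp16, "bkqz/tinyllama-quotes-adapters")
merged = merged.merge_and_unload()  # returns a plain transformers model with the LoRA weights baked in

merged.save_pretrained("tinyllama-quotes-merged")     # illustrative output directory
tokenizer.save_pretrained("tinyllama-quotes-merged")
```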
## 💬 Prompt Format
This model was trained on a very specific format. For best results, your prompt **must** end with `\nQuote:`.
```
Keyword: [YOUR_KEYWORD]\nQuote:
```
The model will generate a single quote and append ` - Unknown`.
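Since the pipeline output echoes the prompt, you may want to strip everything up to `Quote:` along with the trailing attribution. A small helper along these lines (hypothetical, not part of this repo):
```python
def extract_quote(generated_text: str) -> str:
    """Return just the quote from the model's raw output."""
    # Keep only the text after the last "Quote:" marker
    quote = generated_text.rsplit("Quote:", 1)[-1].strip()
    # Drop the " - Unknown" attribution the model was trained to append
    if quote.endswith("- Unknown"):
        quote = quote[: -len("- Unknown")].rstrip(" -")
    return quote

print(extract_quote("Keyword: life\nQuote: Live well. - Unknown"))
# -> "Live well."
```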
## 🛠️ Training Procedure
This model was fine-tuned using `trl.SFTTrainer` with QLoRA.
* **Dataset:** The `Abirate/english_quotes` dataset was "exploded" so that each `(quote, tag)` pair became a unique training example.
* **Format:** Each training text was formatted as `Keyword: [tag]\nQuote: [quote] - Unknown`, which overrides the base model's habit of attributing quotes to real authors.
* **Evaluation:** The model was trained with a 10% evaluation split and `early_stopping_patience=3` to prevent overfitting (a rough reconstruction of the setup follows this list).
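The original training script is not included in this repo. A sketch of what the setup described above might look like; the LoRA hyperparameters and any trainer arguments beyond those stated are illustrative guesses:
```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

# "Explode" the dataset: one training example per (quote, tag) pair
raw = load_dataset("Abirate/english_quotes", split="train")

def explode(batch):
    texts = []
    for quote, tags in zip(batch["quote"], batch["tags"]):
        for tag in tags or []:
            texts.append(f"Keyword: {tag}\nQuote: {quote} - Unknown")
    return {"text": texts}

exploded = raw.map(explode, batched=True, remove_columns=raw.column_names)
splits = exploded.train_test_split(test_size=0.1, seed=42)  # 10% evaluation split

# QLoRA: 4-bit quantized base model plus trainable LoRA adapters
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)  # values illustrative

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="tinyllama-quotes-adapters",
        eval_strategy="steps",
        load_best_model_at_end=True,       # required for early stopping
        metric_for_best_model="eval_loss",
    ),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```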
### Framework Versions
* TRL: 0.25.0
* Transformers: 4.57.1
* Pytorch: 2.8.0+cu126
* Datasets: 4.0.0
* Tokenizers: 0.22.1