---
license: mit
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- peft
- lora
- qlora
- tinyllama
- text-generation
- quotes
- generated_from_trainer
- trl
library_name: peft
---

# LoRA Adapters: TinyLlama-1.1B Quote Generator

This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` fine-tuned to generate motivational quotes.

**These are only the adapter weights, not the full model.** You must load these adapters onto the base TinyLlama model to use them.

This model was trained in Google Colab on a T4 GPU using QLoRA. The training process specialized the model, resulting in a **2.4x inference speedup** on the same GPU compared to the base model.

- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Dataset:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)

## ⚡ Quick Start (How to use)

The following example shows how to load the 4-bit quantized base model and attach these adapters for fast inference on a GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"  # This repo

# 1. Load the base model in 4-bit (NF4) precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Load the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast the LoRA adapter weights (loaded in float32) to float16 so they
#    match the quantized model's float16 compute dtype
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
    device_map="auto",
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"

result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

print(result[0]["generated_text"])
```
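
If you want a single standalone checkpoint instead of base-plus-adapters, you can fold the LoRA weights into the base model with PEFT's `merge_and_unload()`. Merging into a 4-bit quantized model is not straightforward, so this sketch loads the base model in float16 first (the output directory name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "bkqz/tinyllama-quotes-adapters").merge_and_unload()

# Save a standalone model that no longer needs the peft library at load time
merged.save_pretrained("tinyllama-quotes-merged")
AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0").save_pretrained("tinyllama-quotes-merged")
```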

## 💬 Prompt Format

This model was trained on a very specific format. For best results, your prompt **must** end with `\nQuote:`.

```
Keyword: [YOUR_KEYWORD]\nQuote:
```

The model will generate a single quote and append ` - Unknown`.
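
For example, a small helper (hypothetical, reusing the `pipe` object from the Quick Start) that builds a correctly formatted prompt and strips the prompt from the pipeline output:

```python
def generate_quote(keyword: str) -> str:
    # The prompt must end with "\nQuote:" to match the training format.
    prompt = f"Keyword: {keyword}\nQuote:"
    out = pipe(prompt, max_new_tokens=80, do_sample=True, temperature=0.7, top_p=0.9)
    # The pipeline echoes the prompt, so keep only the newly generated text.
    return out[0]["generated_text"][len(prompt):].strip()

print(generate_quote("courage"))
```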

## 🛠️ Training Procedure

This model was fine-tuned using `trl.SFTTrainer` with QLoRA (a sketch of the setup appears after the version list below).

* **Dataset:** The `Abirate/english_quotes` dataset was "exploded" so that each `(quote, tag)` pair became a unique training example (see the sketch after this list).
* **Format:** The training text was formatted as `Keyword: [tag]\nQuote: [quote] - Unknown`. This was done to overwrite the base model's habit of attributing quotes to real authors.
* **Evaluation:** The model was trained with a 10% evaluation split and `early_stopping_patience=3` to prevent overfitting.
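
A minimal sketch of this preprocessing, assuming the standard `quote` and `tags` columns of `Abirate/english_quotes` and an arbitrary split seed:

```python
from datasets import load_dataset

ds = load_dataset("Abirate/english_quotes", split="train")

# "Explode" the dataset: one training example per (quote, tag) pair.
def explode(batch):
    texts = []
    for quote, tags in zip(batch["quote"], batch["tags"]):
        for tag in tags or []:
            texts.append(f"Keyword: {tag}\nQuote: {quote} - Unknown")
    return {"text": texts}

exploded = ds.map(explode, batched=True, remove_columns=ds.column_names)

# 10% held-out split used for evaluation / early stopping
split = exploded.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```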

### Framework Versions

* TRL: 0.25.0
* Transformers: 4.57.1
* PyTorch: 2.8.0+cu126
* Datasets: 4.0.0
* Tokenizers: 0.22.1
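
For reference, a minimal sketch of a QLoRA + `SFTTrainer` setup consistent with the description above, using `train_ds` / `eval_ds` from the preprocessing sketch and the 4-bit `base_model` from the Quick Start; the LoRA rank, learning rate, and other hyperparameters are illustrative assumptions, not the exact values used in training:

```python
from peft import LoraConfig
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=16,                                # assumed rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="tinyllama-quotes-adapters",
    per_device_train_batch_size=8,       # assumed; sized for a T4
    learning_rate=2e-4,                  # assumed
    num_train_epochs=3,                  # assumed upper bound; early stopping may end sooner
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    load_best_model_at_end=True,         # required for early stopping
    metric_for_best_model="eval_loss",
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model=base_model,                    # 4-bit base model; SFTTrainer applies the LoRA config
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    peft_config=peft_config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```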