---
license: mit
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- peft
- lora
- qlora
- tinyllama
- text-generation
- quotes
- generated_from_trainer
- trl
library_name: peft
---

# LoRA Adapters: TinyLlama-1.1B Quote Generator

This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` fine-tuned to generate motivational quotes.

**These are only the adapter weights, not the full model.** You must load these adapters onto the base TinyLlama model to use them.

This model was trained in Google Colab on a T4 GPU using QLoRA. Specializing the model on this narrow task yielded a **2.4x inference speedup** on the same GPU compared to the base model.

- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Dataset:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)

## ⚡ Quick Start (How to use)

This shows how to load the 4-bit quantized base model, attach these adapters, and run fast inference on a GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"  # This repo

# 1. Load the base model in 4-bit (NF4) precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Load the tokenizer; TinyLlama has no pad token, so reuse EOS
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Attach the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast the (non-quantized) adapter weights to float16 so they match
#    the 4-bit compute dtype and avoid a dtype mismatch at generation time
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline (the model is already on the GPU,
#    so no extra device argument is needed)
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"
result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```

If you prefer a standalone checkpoint that does not need the `peft` library at inference time, see the merging sketch at the end of this card.

## 💬 Prompt Format

This model was trained on a very specific format. For best results, your prompt **must** end with `\nQuote:`.

```
Keyword: [YOUR_KEYWORD]\nQuote:
```

The model will generate a single quote and append ` - Unknown`.

## 🛠️ Training Procedure

This model was fine-tuned using `trl.SFTTrainer` with QLoRA.

* **Dataset:** The `Abirate/english_quotes` dataset was "exploded" so that each `(quote, tag)` pair became a unique training example (see the data-preparation sketch at the end of this card).
* **Format:** The training text was formatted as `Keyword: [tag]\nQuote: [quote] - Unknown`. The ` - Unknown` attribution was deliberate, to override the base model's habit of attributing quotes to real authors.
* **Evaluation:** The model was trained with a 10% evaluation split and `early_stopping_patience=3` to prevent overfitting.

### Framework Versions

* TRL: 0.25.0
* Transformers: 4.57.1
* PyTorch: 2.8.0+cu126
* Datasets: 4.0.0
* Tokenizers: 0.22.1
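
## 📦 Merging the Adapters (optional)

To produce a standalone checkpoint, you can fold the adapters into the base weights with `merge_and_unload()`. This is a minimal sketch, not part of the original training code: it assumes you merge into the half-precision base model, since adapters cannot be merged directly into 4-bit quantized weights, and the output directory `tinyllama-quotes-merged` is just an example name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in float16 (merging into a 4-bit model is not
# supported, so use the half-precision weights instead)
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the adapters, then fold them into the base weights
merged = PeftModel.from_pretrained(base_model, "bkqz/tinyllama-quotes-adapters")
merged = merged.merge_and_unload()

# Save a standalone checkpoint that no longer needs the peft library
merged.save_pretrained("tinyllama-quotes-merged")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-quotes-merged")
```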
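
## 🧪 Appendix: Data Preparation Sketch

The exact training script is not included in this repo. The following is a sketch of the "explode" step described in the Training Procedure, assuming the dataset's standard `quote` and `tags` columns; it reproduces the training text format, but not necessarily the original implementation.

```python
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes", split="train")

def explode(batch):
    # Turn each (quote, tags) row into one example per (quote, tag) pair,
    # formatted exactly as the model expects at inference time
    texts = []
    for quote, tags in zip(batch["quote"], batch["tags"]):
        for tag in tags or []:
            texts.append(f"Keyword: {tag}\nQuote: {quote} - Unknown")
    return {"text": texts}

train_data = dataset.map(explode, batched=True, remove_columns=dataset.column_names)
print(train_data[0]["text"])
```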
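
## 🏋️ Appendix: Training Configuration Sketch

For completeness, here is an illustrative sketch of how the described setup (`SFTTrainer` + QLoRA, 10% evaluation split, early stopping with patience 3) could be wired together, continuing from the data-preparation sketch above. All hyperparameters (LoRA rank, learning rate, batch size, step counts) are assumptions for illustration, not the values actually used.

```python
from peft import LoraConfig
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

# 10% held-out split for evaluation, as described above
splits = train_data.train_test_split(test_size=0.1, seed=42)

# LoRA settings: rank, alpha, and target modules are illustrative assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = SFTConfig(
    output_dir="tinyllama-quotes",
    dataset_text_field="text",
    per_device_train_batch_size=4,   # assumption
    learning_rate=2e-4,              # assumption
    eval_strategy="steps",
    eval_steps=100,                  # assumption
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="eval_loss",
)

trainer = SFTTrainer(
    model=base_model,                # the 4-bit base model from the Quick Start
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```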