---
license: mit
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- peft
- lora
- qlora
- tinyllama
- text-generation
- quotes
- generated_from_trainer
- trl
library_name: peft
---

# LoRA Adapters: TinyLlama-1.1B Quote Generator

This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` fine-tuned to generate motivational quotes.

**These are only the adapter weights, not the full model.** You must load these adapters onto the base TinyLlama model to use them.

This model was trained in Google Colab on a T4 GPU using QLoRA. Specializing the model on this narrow task yielded a **2.4x inference speedup** on the same GPU compared to the base model.

- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Dataset:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)

## ⚡ Quick Start (How to use)

This shows how to load the 4-bit quantized base model, attach these adapters, and run fast inference on a GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"  # This repo

# 1. Load the base model in 4-bit (NF4) precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Load the tokenizer; TinyLlama has no pad token, so reuse EOS
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Attach the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast the (non-quantized) adapter weights to float16 so they match
#    the 4-bit compute dtype and avoid a dtype mismatch at generation time
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline (the model is already on the GPU,
#    so no extra device argument is needed)
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"
result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```

If you prefer a standalone checkpoint that does not need the `peft` library at inference time, see the merging sketch at the end of this card.

## 💬 Prompt Format

This model was trained on a very specific format. For best results, your prompt **must** end with `\nQuote:`.

```
Keyword: [YOUR_KEYWORD]\nQuote:
```

The model will generate a single quote and append ` - Unknown`.

## 🛠️ Training Procedure

This model was fine-tuned using `trl.SFTTrainer` with QLoRA.

* **Dataset:** The `Abirate/english_quotes` dataset was "exploded" so that each `(quote, tag)` pair became a unique training example (see the data-preparation sketch at the end of this card).
* **Format:** The training text was formatted as `Keyword: [tag]\nQuote: [quote] - Unknown`. The ` - Unknown` attribution was deliberate, to override the base model's habit of attributing quotes to real authors.
* **Evaluation:** The model was trained with a 10% evaluation split and `early_stopping_patience=3` to prevent overfitting.

### Framework Versions

* TRL: 0.25.0
* Transformers: 4.57.1
* PyTorch: 2.8.0+cu126
* Datasets: 4.0.0
* Tokenizers: 0.22.1
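
## 📦 Merging the Adapters (optional)

To produce a standalone checkpoint, you can fold the adapters into the base weights with `merge_and_unload()`. This is a minimal sketch, not part of the original training code: it assumes you merge into the half-precision base model, since adapters cannot be merged directly into 4-bit quantized weights, and the output directory `tinyllama-quotes-merged` is just an example name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in float16 (merging into a 4-bit model is not
# supported, so use the half-precision weights instead)
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the adapters, then fold them into the base weights
merged = PeftModel.from_pretrained(base_model, "bkqz/tinyllama-quotes-adapters")
merged = merged.merge_and_unload()

# Save a standalone checkpoint that no longer needs the peft library
merged.save_pretrained("tinyllama-quotes-merged")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-quotes-merged")
```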
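
## 🧪 Appendix: Data Preparation Sketch

The exact training script is not included in this repo. The following is a sketch of the "explode" step described in the Training Procedure, assuming the dataset's standard `quote` and `tags` columns; it reproduces the training text format, but not necessarily the original implementation.

```python
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes", split="train")

def explode(batch):
    # Turn each (quote, tags) row into one example per (quote, tag) pair,
    # formatted exactly as the model expects at inference time
    texts = []
    for quote, tags in zip(batch["quote"], batch["tags"]):
        for tag in tags or []:
            texts.append(f"Keyword: {tag}\nQuote: {quote} - Unknown")
    return {"text": texts}

train_data = dataset.map(explode, batched=True, remove_columns=dataset.column_names)
print(train_data[0]["text"])
```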
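
## 🏋️ Appendix: Training Configuration Sketch

For completeness, here is an illustrative sketch of how the described setup (`SFTTrainer` + QLoRA, 10% evaluation split, early stopping with patience 3) could be wired together, continuing from the data-preparation sketch above. All hyperparameters (LoRA rank, learning rate, batch size, step counts) are assumptions for illustration, not the values actually used.

```python
from peft import LoraConfig
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

# 10% held-out split for evaluation, as described above
splits = train_data.train_test_split(test_size=0.1, seed=42)

# LoRA settings: rank, alpha, and target modules are illustrative assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = SFTConfig(
    output_dir="tinyllama-quotes",
    dataset_text_field="text",
    per_device_train_batch_size=4,   # assumption
    learning_rate=2e-4,              # assumption
    eval_strategy="steps",
    eval_steps=100,                  # assumption
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="eval_loss",
)

trainer = SFTTrainer(
    model=base_model,                # the 4-bit base model from the Quick Start
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```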