---
license: mit
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- peft
- lora
- qlora
- tinyllama
- text-generation
- quotes
- generated_from_trainer
- trl
library_name: peft
---

# LoRA Adapters: TinyLlama-1.1B Quote Generator

This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` fine-tuned to generate motivational quotes.

**These are only the adapter weights, not the full model.** You must load these adapters onto the base TinyLlama model to use them.

This model was trained in Google Colab on a T4 GPU using QLoRA. The training process specialized the model, resulting in a **2.4x inference speedup** on the same GPU compared to the base model.

- **Base Model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Dataset:** [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes)

## ⚡ Quick Start (How to use)

The following example shows how to load the 4-bit quantized base model and attach these adapters for fast inference on a GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"  # This repo

# 1. Load the base model in 4-bit (NF4) precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Load the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast the LoRA adapter weights (loaded in float32) to float16 so they
#    match the quantized model's float16 compute dtype
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
    device_map="auto",
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"

result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

print(result[0]["generated_text"])
```
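
If you want a single standalone checkpoint instead of base-plus-adapters, you can fold the LoRA weights into the base model with PEFT's `merge_and_unload()`. Merging into a 4-bit quantized model is not straightforward, so this sketch loads the base model in float16 first (the output directory name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "bkqz/tinyllama-quotes-adapters").merge_and_unload()

# Save a standalone model that no longer needs the peft library at load time
merged.save_pretrained("tinyllama-quotes-merged")
AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0").save_pretrained("tinyllama-quotes-merged")
```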

## 💬 Prompt Format

This model was trained on a very specific format. For best results, your prompt **must** end with `\nQuote:`.

```
Keyword: [YOUR_KEYWORD]\nQuote:
```

The model will generate a single quote and append ` - Unknown`.
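
For example, a small helper (hypothetical, reusing the `pipe` object from the Quick Start) that builds a correctly formatted prompt and strips the prompt from the pipeline output:

```python
def generate_quote(keyword: str) -> str:
    # The prompt must end with "\nQuote:" to match the training format.
    prompt = f"Keyword: {keyword}\nQuote:"
    out = pipe(prompt, max_new_tokens=80, do_sample=True, temperature=0.7, top_p=0.9)
    # The pipeline echoes the prompt, so keep only the newly generated text.
    return out[0]["generated_text"][len(prompt):].strip()

print(generate_quote("courage"))
```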

## 🛠️ Training Procedure

This model was fine-tuned using `trl.SFTTrainer` with QLoRA (a sketch of the setup appears after the version list below).

* **Dataset:** The `Abirate/english_quotes` dataset was "exploded" so that each `(quote, tag)` pair became a unique training example (see the sketch after this list).
* **Format:** The training text was formatted as `Keyword: [tag]\nQuote: [quote] - Unknown`. This was done to overwrite the base model's habit of attributing quotes to real authors.
* **Evaluation:** The model was trained with a 10% evaluation split and `early_stopping_patience=3` to prevent overfitting.
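
A minimal sketch of this preprocessing, assuming the standard `quote` and `tags` columns of `Abirate/english_quotes` and an arbitrary split seed:

```python
from datasets import load_dataset

ds = load_dataset("Abirate/english_quotes", split="train")

# "Explode" the dataset: one training example per (quote, tag) pair.
def explode(batch):
    texts = []
    for quote, tags in zip(batch["quote"], batch["tags"]):
        for tag in tags or []:
            texts.append(f"Keyword: {tag}\nQuote: {quote} - Unknown")
    return {"text": texts}

exploded = ds.map(explode, batched=True, remove_columns=ds.column_names)

# 10% held-out split used for evaluation / early stopping
split = exploded.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```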

### Framework Versions

* TRL: 0.25.0
* Transformers: 4.57.1
* PyTorch: 2.8.0+cu126
* Datasets: 4.0.0
* Tokenizers: 0.22.1
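
For reference, a minimal sketch of a QLoRA + `SFTTrainer` setup consistent with the description above, using `train_ds` / `eval_ds` from the preprocessing sketch and the 4-bit `base_model` from the Quick Start; the LoRA rank, learning rate, and other hyperparameters are illustrative assumptions, not the exact values used in training:

```python
from peft import LoraConfig
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=16,                                # assumed rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="tinyllama-quotes-adapters",
    per_device_train_batch_size=8,       # assumed; sized for a T4
    learning_rate=2e-4,                  # assumed
    num_train_epochs=3,                  # assumed upper bound; early stopping may end sooner
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    load_best_model_at_end=True,         # required for early stopping
    metric_for_best_model="eval_loss",
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model=base_model,                    # 4-bit base model; SFTTrainer applies the LoRA config
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    peft_config=peft_config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```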