llm-upload / README.md
harshagale's picture
Update README.md
af4a212 verified
---
license: apache-2.0
base_model: NousResearch/Llama-2-7b-chat-hf
tags:
- loRA
- qloRA
- peft
- causal-lm
- text-generation
- fine-tuned
datasets:
- mlabonne/guanaco-llama2-1k
pipeline_tag: text-generation
language:
- en
---
# Llama-2-7b-chat-hf Fine-Tuned with QLoRA
This model is a fine-tuned version of `NousResearch/Llama-2-7b-chat-hf` using Parameter-Efficient Fine-Tuning (PEFT) via **QLoRA** (4-bit quantization). It was trained on the `mlabonne/guanaco-llama2-1k` dataset.
> **Note:** This repository contains **only the adapter weights**. To use this model, you need to load the base model (`NousResearch/Llama-2-7b-chat-hf`) and apply these LoRA adapters on top of it.
## Model Details
- **Developed by:** Harsh Agale
- **Base Model:** `NousResearch/Llama-2-7b-chat-hf`
- **Method:** QLoRA (4-bit Quantization + LoRA)
- **Language(s):** English
- **License:** Apache 2.0
- **Task:** Causal Language Modeling / Text Generation
## Training Hyperparameters
The model was trained using the following configuration:
* **Quantization:** 4-bit NormalFloat (`nf4`) with double quantization
* **Compute Dtype:** `float16`
* **LoRA Rank (r):** 8
* **LoRA Alpha:** 16
* **Target Modules:** `q_proj`, `v_proj`
* **LoRA Dropout:** 0.05
* **Learning Rate:** 2e-4
* **Optimizer:** `paged_adamw_8bit`
* **Batch Size:** 1 (with 4 Gradient Accumulation Steps)
* **Epochs:** 1
## Project Purpose
This project was created to learn and experiment with:
- QLoRA fine-tuning
- PEFT adapters
- 4-bit quantization
- Efficient LLM training
- Hugging Face ecosystem
## Limitations
- Trained on a small dataset
- May produce hallucinated responses
- Intended for educational and research purposes
## How to Load and Use This Model
You can easily load this model and its adapters using the `transformers` and `peft` libraries:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
model_id = "NousResearch/Llama-2-7b-chat-hf"
adapter_id = "harshagale/llm-upload"
# 1. You must use the same 4-bit config to load the base model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True
)
# 2. Load the base tokenizer and configure the padding token
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
# 3. Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto"
)
# 4. Merge the PEFT adapter weights onto the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
# 5. Quick inference test
prompt = "Human: Tell me a joke.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))