Instructions to use harshagale/llm-upload with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use harshagale/llm-upload with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf") model = PeftModel.from_pretrained(base_model, "harshagale/llm-upload") - Notebooks
- Google Colab
- Kaggle
metadata
license: apache-2.0
base_model: NousResearch/Llama-2-7b-chat-hf
tags:
- loRA
- qloRA
- peft
- causal-lm
- text-generation
- fine-tuned
datasets:
- mlabonne/guanaco-llama2-1k
pipeline_tag: text-generation
language:
- en
Llama-2-7b-chat-hf Fine-Tuned with QLoRA
This model is a fine-tuned version of NousResearch/Llama-2-7b-chat-hf using Parameter-Efficient Fine-Tuning (PEFT) via QLoRA (4-bit quantization). It was trained on the mlabonne/guanaco-llama2-1k dataset.
Note: This repository contains only the adapter weights. To use this model, you need to load the base model (
NousResearch/Llama-2-7b-chat-hf) and apply these LoRA adapters on top of it.
Model Details
- Developed by: Harsh Agale
- Base Model:
NousResearch/Llama-2-7b-chat-hf - Method: QLoRA (4-bit Quantization + LoRA)
- Language(s): English
- License: Apache 2.0
- Task: Causal Language Modeling / Text Generation
Training Hyperparameters
The model was trained using the following configuration:
- Quantization: 4-bit NormalFloat (
nf4) with double quantization - Compute Dtype:
float16 - LoRA Rank (r): 8
- LoRA Alpha: 16
- Target Modules:
q_proj,v_proj - LoRA Dropout: 0.05
- Learning Rate: 2e-4
- Optimizer:
paged_adamw_8bit - Batch Size: 1 (with 4 Gradient Accumulation Steps)
- Epochs: 1
Project Purpose
This project was created to learn and experiment with:
- QLoRA fine-tuning
- PEFT adapters
- 4-bit quantization
- Efficient LLM training
- Hugging Face ecosystem
Limitations
- Trained on a small dataset
- May produce hallucinated responses
- Intended for educational and research purposes
How to Load and Use This Model
You can easily load this model and its adapters using the transformers and peft libraries:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
model_id = "NousResearch/Llama-2-7b-chat-hf"
adapter_id = "harshagale/llm-upload"
# 1. You must use the same 4-bit config to load the base model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True
)
# 2. Load the base tokenizer and configure the padding token
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
# 3. Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto"
)
# 4. Merge the PEFT adapter weights onto the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
# 5. Quick inference test
prompt = "Human: Tell me a joke.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))