Llama-2-7b-chat-hf Fine-Tuned with QLoRA

This model is a fine-tuned version of NousResearch/Llama-2-7b-chat-hf using Parameter-Efficient Fine-Tuning (PEFT) via QLoRA (4-bit quantization). It was trained on the mlabonne/guanaco-llama2-1k dataset.

Note: This repository contains only the adapter weights. To use this model, you need to load the base model (NousResearch/Llama-2-7b-chat-hf) and apply these LoRA adapters on top of it.

Model Details

  • Developed by: Harsh Agale
  • Base Model: NousResearch/Llama-2-7b-chat-hf
  • Method: QLoRA (4-bit Quantization + LoRA)
  • Language(s): English
  • License: Apache 2.0
  • Task: Causal Language Modeling / Text Generation

Training Hyperparameters

The model was trained using the following configuration:

  • Quantization: 4-bit NormalFloat (nf4) with double quantization
  • Compute Dtype: float16
  • LoRA Rank (r): 8
  • LoRA Alpha: 16
  • Target Modules: q_proj, v_proj
  • LoRA Dropout: 0.05
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit
  • Batch Size: 1 (with 4 Gradient Accumulation Steps)
  • Epochs: 1

Project Purpose

This project was created to learn and experiment with:

  • QLoRA fine-tuning
  • PEFT adapters
  • 4-bit quantization
  • Efficient LLM training
  • Hugging Face ecosystem

Limitations

  • Trained on a small dataset
  • May produce hallucinated responses
  • Intended for educational and research purposes

How to Load and Use This Model

You can easily load this model and its adapters using the transformers and peft libraries:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_id = "NousResearch/Llama-2-7b-chat-hf"
adapter_id = "harshagale/llm-upload"

# 1. You must use the same 4-bit config to load the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

# 2. Load the base tokenizer and configure the padding token
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# 3. Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

# 4. Merge the PEFT adapter weights onto the base model
model = PeftModel.from_pretrained(base_model, adapter_id)

# 5. Quick inference test
prompt = "Human: Tell me a joke.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downloads last month
68
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for harshagale/llm-upload

Adapter
(464)
this model

Dataset used to train harshagale/llm-upload