---
library_name: transformers
tags: [llama2, peft, character-chatbot, gradio, 4bit]
---

# LLM Character-Based Chatbot (LoRA Fine-Tuned)

This model fine-tunes Meta's `Llama-2-7b-chat-hf` with PEFT and LoRA to create a **character-based chatbot** that mimics the style and personality of a fictional character. It was trained on a question-answering dataset structured in a conversational format.
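The card does not publish the exact training serialization, but Llama-2 chat models expect the `[INST]`/`<<SYS>>` wrapping, so a single Q/A pair would plausibly be formatted as sketched below (the function name is hypothetical; this is an assumption about the format, not the repository's actual preprocessing code):

```python
def format_llama2_pair(system: str, question: str) -> str:
    """Wrap one system prompt and one user question in Llama-2's
    [INST] / <<SYS>> chat markers (standard Llama-2 chat format)."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

prompt = format_llama2_pair(
    "Answer in character.",
    "What's your biggest fear?",
)
```

In practice `tokenizer.apply_chat_template` (shown in the usage section) produces this wrapping for you; the sketch just makes the underlying layout visible.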

---

## Model Details

- **Base Model:** `meta-llama/Llama-2-7b-chat-hf`
- **Fine-Tuned Using:** LoRA via PEFT
- **Quantization:** 4-bit (using bitsandbytes)
- **Language:** English
- **Tokenizer:** Same as base model
- **Intended Use:** Educational and personal projects
- **License:** Fine-tuned from Meta's `Llama-2-7b-chat-hf`, released under the LLaMA 2 Community License. This derivative is intended for non-commercial, educational use only.
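The exact LoRA hyperparameters used for this checkpoint are not listed in the card; a typical PEFT configuration for a Llama-2 chat fine-tune looks like the sketch below. All values here are illustrative assumptions, not the ones actually used:

```python
from peft import LoraConfig

# Illustrative LoRA settings; the actual training hyperparameters
# for this checkpoint are not published in the card.
lora_config = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in Llama-2
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```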

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit. Passing load_in_4bit directly to
# from_pretrained is deprecated; use a BitsAndBytesConfig instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    quantization_config=bnb_config,
)

model = PeftModel.from_pretrained(base_model, "IrfanHamid/ChatBot-lora-7b")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Generate response
messages = [
    {"role": "system", "content": "You are Spider-Man from the Marvel universe. Speak like Peter Parker — witty, responsible, and full of heart. Always respond in character."},
    {"role": "user", "content": "What's your biggest fear?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip())
```
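For an interactive demo (the card's tags mention Gradio), earlier turns have to be folded back into the `messages` list before re-applying the chat template. A minimal sketch of that step, assuming Gradio-style `(user, assistant)` history pairs; the helper name is hypothetical:

```python
def build_messages(system_prompt, history, message):
    """Fold (user, assistant) history pairs plus the new user message
    into the chat-message list expected by apply_chat_template."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": message})
    return messages
```

The returned list drops straight into `tokenizer.apply_chat_template(...)` as in the snippet above, and a wrapper around it can be plugged into `gradio.ChatInterface` for a web demo.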