---
library_name: transformers
tags: [llama2, peft, character-chatbot, gradio, 4bit]
---
|
|
|
|
|
# LLM Character-Based Chatbot (LoRA Fine-Tuned)
|
This model fine-tunes Meta's `Llama-2-7b-chat-hf` with LoRA adapters via PEFT to create a **character-based chatbot** that mimics the style and personality of a fictional character. It was trained on a question-answering dataset structured in a conversational format.
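The card does not publish the dataset schema. As an illustration only, one record of a conversational QA dataset might look like the JSONL-style entry below; the `messages`/`role`/`content` field names are assumptions, not the actual training format.

```python
import json

# Hypothetical training record; the real dataset schema is not stated in
# this card, so these field names are illustrative assumptions.
record = {
    "messages": [
        {"role": "system", "content": "You are Spider-Man. Always respond in character."},
        {"role": "user", "content": "What's your biggest fear?"},
        {"role": "assistant", "content": "Losing the people I care about."},
    ]
}

# Serialize as one JSON object per line (JSONL), a common fine-tuning layout
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["messages"][1]["content"])
```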

---

## Model Details

- **Base Model:** `meta-llama/Llama-2-7b-chat-hf`
- **Fine-Tuned Using:** LoRA via PEFT
- **Quantization:** 4-bit (via bitsandbytes)
- **Language:** English
- **Tokenizer:** Same as the base model
- **Intended Use:** Educational and personal projects
- **License:** Fine-tuned from Meta's `Llama-2-7b-chat-hf`, which is released under the LLaMA 2 Community License. This fine-tuned version is intended for non-commercial, educational use only.
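The exact LoRA hyperparameters used for this checkpoint are not stated in this card. A minimal sketch of a PEFT LoRA configuration, with common default values standing in for the unpublished training recipe, might look like:

```python
from peft import LoraConfig

# All values below are assumptions (common LLaMA-2 LoRA defaults), not the
# actual recipe used to train this adapter.
lora_config = LoraConfig(
    r=16,                                  # LoRA rank (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # typical LLaMA attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

A config like this would be passed to `peft.get_peft_model(base_model, lora_config)` before training.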

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit, then attach the LoRA adapter
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(base_model, "IrfanHamid/ChatBot-lora-7b")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Build a chat prompt and generate a response
messages = [
    {"role": "system", "content": "You are Spider-Man from the Marvel universe. Speak like Peter Parker — witty, responsible, and full of heart. Always respond in character."},
    {"role": "user", "content": "What's your biggest fear?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip())
```