HackWeasel's picture
Update README.md
e031ec9 verified
metadata
base_model:
  - unsloth/Llama-3.2-1B-Instruct
library_name: peft
license: apache-2.0
language:
  - en
pipeline_tag: text-generation

Model Details

  • Developed by: HackWeasel
  • Funded by: GT Edge AI
  • Model type: LLM
  • Language(s) (NLP): English
  • License: Apache license 2.0
  • Finetuned from model: unsloth/llama3.2-1b-instruct

Uses

Ask questions about movies which have been rated on IMDB

How to Get Started with the Model

Use the code below to get started with the model.

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

def load_model(base_model_id, adapter_model_id):
    print("Loading models...")
    
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    
    # Load base model (using model's built-in quantization)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",
        low_cpu_mem_usage=True
    )
    
    # Load the PEFT model
    model = PeftModel.from_pretrained(
        base_model, 
        adapter_model_id,
        device_map="auto"
    )
    
    model.eval()
    print("Models loaded!")
    return model, tokenizer

def generate_response(model, tokenizer, prompt, max_length=4096, temperature=0.7):
    with torch.no_grad():
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=temperature,
            do_sample=True,
            top_p=0.95,
            top_k=40,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id
        )
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

def main():
    model, tokenizer = load_model(
        "unsloth/llama-3.2-1b-instruct-bnb-4bit",
        "HackWeasel/llama-3.2-1b-QLORA-IMDB"
    )
    
    conversation_history = ""
    print("\nWelcome! Start chatting with the model (type 'quit' to exit)")
    print("Note: This model is fine-tuned on IMDB reviews data")
    
    while True:
        try:
            user_input = input("\nYou: ").strip()
            if user_input.lower() == 'quit':
                print("Goodbye!")
                break

            if conversation_history:
                full_prompt = f"{conversation_history}\nHuman: {user_input}\nAssistant:"
            else:
                full_prompt = f"Human: {user_input}\nAssistant:"

            response = generate_response(model, tokenizer, full_prompt)
            new_response = response.split("Assistant:")[-1].strip()
            conversation_history = f"{conversation_history}\nHuman: {user_input}\nAssistant: {new_response}"
            print("\nAssistant:", new_response)

        except Exception as e:
            print(f"An error occurred: {e}")
            print("Continuing conversation...")

if __name__ == "__main__":
    main()

Training Data

datasets/mteb/imdb/tree/main/test.jsonl

Training Procedure

QLoRA via unsloth

  • PEFT 0.14.0