---
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

## Model Details

- **Developed by:** HackWeasel
- **Funded by:** GT Edge AI
- **Model type:** LLM
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct
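
You can confirm which base model the adapter expects by reading its PEFT config from the Hub; a quick check, assuming `peft` is installed:

```python
from peft import PeftConfig

# The adapter config records the base model it was trained against
config = PeftConfig.from_pretrained("HackWeasel/llama-3.2-1b-QLORA-IMDB")
print(config.base_model_name_or_path)
```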

## Uses

Ask questions about movies that have been rated on IMDb, e.g. "What do reviewers think of The Godfather?"

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

def load_model(base_model_id, adapter_model_id):
    print("Loading models...")

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)

    # Load base model (using the model's built-in quantization)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",
        low_cpu_mem_usage=True
    )

    # Load the PEFT adapter on top of the base model
    model = PeftModel.from_pretrained(
        base_model,
        adapter_model_id,
        device_map="auto"
    )

    model.eval()
    print("Models loaded!")
    return model, tokenizer

def generate_response(model, tokenizer, prompt, max_new_tokens=512, temperature=0.7):
    with torch.no_grad():
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        outputs = model.generate(
            **inputs,
            # Budget new tokens only, so a growing conversation history
            # doesn't eat into the reply length (unlike max_length)
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=True,
            top_p=0.95,
            top_k=40,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def main():
    model, tokenizer = load_model(
        "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
        "HackWeasel/llama-3.2-1b-QLORA-IMDB"
    )

    conversation_history = ""
    print("\nWelcome! Start chatting with the model (type 'quit' to exit)")
    print("Note: This model is fine-tuned on IMDb reviews data")

    while True:
        try:
            user_input = input("\nYou: ").strip()
            if user_input.lower() == 'quit':
                print("Goodbye!")
                break

            # Append the new turn to the running transcript
            if conversation_history:
                full_prompt = f"{conversation_history}\nHuman: {user_input}\nAssistant:"
            else:
                full_prompt = f"Human: {user_input}\nAssistant:"

            response = generate_response(model, tokenizer, full_prompt)
            # generate() echoes the prompt, so keep only the newest Assistant turn
            new_response = response.split("Assistant:")[-1].strip()
            conversation_history = f"{conversation_history}\nHuman: {user_input}\nAssistant: {new_response}"
            print("\nAssistant:", new_response)

        except Exception as e:
            print(f"An error occurred: {e}")
            print("Continuing conversation...")

if __name__ == "__main__":
    main()
```
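
If you want to run the fine-tuned model without a `peft` dependency at inference time, the adapter can be folded into the base weights. A minimal sketch, assuming the unquantized base model (merging into a 4-bit checkpoint is lossy and not recommended); the output directory name is illustrative:

```python
# Merge the LoRA adapter into the base weights so the result loads as a
# plain transformers checkpoint (no peft needed at runtime)
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "HackWeasel/llama-3.2-1b-QLORA-IMDB")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("llama-3.2-1b-imdb-merged")
```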

### Training Data

The `test.jsonl` split of the [mteb/imdb](https://huggingface.co/datasets/mteb/imdb) dataset on the Hugging Face Hub.
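
A minimal sketch for inspecting that data with the `datasets` library:

```python
from datasets import load_dataset

# Load the split the adapter was trained on
imdb = load_dataset("mteb/imdb", split="test")
print(imdb)                    # column names and row count
print(imdb[0]["text"][:200])   # first 200 characters of the first review
```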

### Training Procedure

QLoRA via [unsloth](https://github.com/unslothai/unsloth).
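
A minimal sketch of what a QLoRA run with unsloth looks like; the LoRA rank, target modules, and trainer settings below are illustrative assumptions, not the exact values used for this adapter (the `SFTTrainer` arguments shown match older trl releases; newer ones move some of them into `SFTConfig`):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit (the "Q" in QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach trainable LoRA adapters on top of the frozen 4-bit base
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("mteb/imdb", split="test")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

model.save_pretrained("llama-3.2-1b-QLORA-IMDB")  # adapter weights only
```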

### Framework versions

- PEFT 0.14.0