# Uploaded model

- **Developed by:** Sayantan Ghosh
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct

This LLaMA model was trained 2× faster with Unsloth and Hugging Face's TRL library.
## 🚀 Inference Code

You can use the following code to run inference with this model using Unsloth:
```python
from unsloth import FastLanguageModel

max_seq_length = 2048   # context length to use at inference
dtype = None            # None = auto-detect (bfloat16 on Ampere+, float16 otherwise)
load_in_4bit = True     # load the weights in 4-bit to reduce memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "SGHOSH1999/FineLlama3.1-1B-Instruct",  # Your model repo
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

# Create a conversation prompt
messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]

# Tokenize the input using the model's chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must be set for generation
    return_tensors = "pt",
).to("cuda")

# Generate text, streaming tokens to stdout as they are produced
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 128,
    use_cache = True,
    temperature = 1.5,
    min_p = 0.1,
)
```