# LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse

LLaMA 3.1 8B Instruct fine-tuned on the Light-R1 DPO dataset for 100 steps with a custom Triton-accelerated BSR-AdamW optimizer.
## Model Details
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Architecture: Llama 3.1 8B Instruct
- Training: Direct Preference Optimization (DPO) with a Triton-based sparse AdamW optimizer; a minimal sketch of the block-sparse update idea appears after this list. For implementation details, see the project's GitHub repository.
- Task: Text generation, instruction following, conversational AI
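The optimizer itself lives in the linked repository. As a rough illustration of the idea only, here is a pure-PyTorch sketch of an AdamW step restricted to the active blocks of a BSR-style (block sparse row) mask. All names and defaults here (`bsr_masked_adamw_step`, `block_size=64`, etc.) are hypothetical; the actual implementation uses a fused Triton kernel rather than this eager-mode code.

```python
import torch

def bsr_masked_adamw_step(param, grad, exp_avg, exp_avg_sq, block_mask,
                          step, lr=1e-5, betas=(0.9, 0.999), eps=1e-8,
                          weight_decay=0.01, block_size=64):
    """Illustrative sketch: one AdamW step applied only inside the active
    (block_size x block_size) blocks of a BSR-style mask.

    block_mask: bool tensor of shape (rows // block_size, cols // block_size).
    Assumes param, grad, and the moment buffers share one dtype.
    """
    # Expand the per-block mask to element granularity, in param's dtype.
    mask = block_mask.to(param.dtype)
    mask = mask.repeat_interleave(block_size, dim=0)
    mask = mask.repeat_interleave(block_size, dim=1)

    g = grad * mask  # zero out gradients outside active blocks

    # Standard AdamW first/second moment updates on the masked gradient.
    exp_avg.mul_(betas[0]).add_(g, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])

    # Bias correction, as in Adam.
    bias1 = 1 - betas[0] ** step
    bias2 = 1 - betas[1] ** step
    denom = (exp_avg_sq / bias2).sqrt().add_(eps)

    # Decoupled weight decay, also restricted to active blocks
    # (inactive entries are multiplied by exactly 1).
    param.mul_(1 - lr * weight_decay * mask)
    param.addcdiv_(exp_avg / bias1, denom, value=-lr)
```

In a real Triton kernel, only the stored nonzero blocks would be touched, so memory traffic and optimizer state can scale with the number of active blocks rather than the full parameter count.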
## Requirements
- `transformers >= 4.43.0` (required for full Llama 3.1 support)
- `torch` (recommended: `torch >= 2.0.0`)
## Usage

### Installation

```bash
pip install --upgrade transformers torch
```
### Basic Usage with Transformers
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
#### Using Pipeline

```python
import transformers
import torch

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what machine learning is."},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the whole conversation; the last message is the
# assistant's reply.
print(outputs[0]["generated_text"][-1])
```
#### Using AutoModelForCausalLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what machine learning is."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or Llama 3.1's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Strip the prompt tokens and decode only the newly generated reply.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
## Tool Use
Llama 3.1 supports tool use through chat templates in Transformers. See the official documentation for detailed examples.
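As a minimal sketch of what that looks like: Transformers lets you pass Python functions via the `tools=` argument of `apply_chat_template`, which renders their signatures and docstrings into the tool-use prompt that Llama 3.1 expects. The tool here, `get_current_temperature`, is a made-up stub; consult the official Transformers tool-use docs for the full workflow, including handling the model's tool calls.

```python
from transformers import AutoTokenizer

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
    """
    return 22.0  # stub for illustration; a real tool would call an API

messages = [
    {"role": "user", "content": "What's the temperature in Paris right now?"},
]

# The chat template converts the function's signature and docstring into
# the JSON tool schema embedded in the Llama 3.1 prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```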
## Model Information
This model is based on Meta's Llama 3.1 8B Instruct model, fine-tuned using Direct Preference Optimization (DPO). The model maintains compatibility with the original Llama 3.1 architecture and chat template format.
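For reference, the standard DPO objective (Rafailov et al., 2023) that this kind of fine-tuning optimizes fits in a few lines. This is a generic sketch, not the exact training code behind this checkpoint, and `beta=0.1` is just a common default.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over per-sequence log-probabilities.

    Pushes the policy's chosen-vs-rejected log-ratio above the frozen
    reference model's ratio, scaled by beta.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```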
For more information about the base model, see the meta-llama/Llama-3.1-8B-Instruct model card on the Hugging Face Hub.
## Citation

If you use this model, please cite the original Llama 3.1 paper:

```bibtex
@article{meta2024llama,
  title={Llama 3.1},
  author={Meta AI},
  year={2024}
}
```