# LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse

LLaMA 3.1 8B Instruct fine-tuned on the Light-R1 DPO dataset for 100 steps with a custom Triton-accelerated BSR-AdamW optimizer.
## Model Details
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Architecture: Llama 3.1 8B Instruct
- Training: Direct Preference Optimization (DPO) with a Triton-based sparse AdamW optimizer; a minimal sketch of the block-sparse update idea appears after this list. For implementation details, see the project's GitHub repository.
- Task: Text generation, instruction following, conversational AI
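The optimizer itself lives in the linked repository. As a rough illustration of the idea only, here is a pure-PyTorch sketch of an AdamW step restricted to the active blocks of a BSR-style (block sparse row) mask. All names and defaults here (`bsr_masked_adamw_step`, `block_size=64`, etc.) are hypothetical; the actual implementation uses a fused Triton kernel rather than this eager-mode code.

```python
import torch

def bsr_masked_adamw_step(param, grad, exp_avg, exp_avg_sq, block_mask,
                          step, lr=1e-5, betas=(0.9, 0.999), eps=1e-8,
                          weight_decay=0.01, block_size=64):
    """Illustrative sketch: one AdamW step applied only inside the active
    (block_size x block_size) blocks of a BSR-style mask.

    block_mask: bool tensor of shape (rows // block_size, cols // block_size).
    Assumes param, grad, and the moment buffers share one dtype.
    """
    # Expand the per-block mask to element granularity, in param's dtype.
    mask = block_mask.to(param.dtype)
    mask = mask.repeat_interleave(block_size, dim=0)
    mask = mask.repeat_interleave(block_size, dim=1)

    g = grad * mask  # zero out gradients outside active blocks

    # Standard AdamW first/second moment updates on the masked gradient.
    exp_avg.mul_(betas[0]).add_(g, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])

    # Bias correction, as in Adam.
    bias1 = 1 - betas[0] ** step
    bias2 = 1 - betas[1] ** step
    denom = (exp_avg_sq / bias2).sqrt().add_(eps)

    # Decoupled weight decay, also restricted to active blocks
    # (inactive entries are multiplied by exactly 1).
    param.mul_(1 - lr * weight_decay * mask)
    param.addcdiv_(exp_avg / bias1, denom, value=-lr)
```

In a real Triton kernel, only the stored nonzero blocks would be touched, so memory traffic and optimizer state can scale with the number of active blocks rather than the full parameter count.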
## Requirements
- `transformers >= 4.43.0` (required for full Llama 3.1 support)
- `torch` (recommended: `torch >= 2.0.0`)
## Usage

### Installation

```bash
pip install --upgrade transformers torch
```
### Basic Usage with Transformers
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
#### Using Pipeline

```python
import transformers
import torch

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what machine learning is."},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the whole conversation; the last message is the
# assistant's reply.
print(outputs[0]["generated_text"][-1])
```
#### Using AutoModelForCausalLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what machine learning is."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or Llama 3.1's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Strip the prompt tokens and decode only the newly generated reply.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
## Tool Use
Llama 3.1 supports tool use through chat templates in Transformers. See the official documentation for detailed examples.
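As a minimal sketch of what that looks like: Transformers lets you pass Python functions via the `tools=` argument of `apply_chat_template`, which renders their signatures and docstrings into the tool-use prompt that Llama 3.1 expects. The tool here, `get_current_temperature`, is a made-up stub; consult the official Transformers tool-use docs for the full workflow, including handling the model's tool calls.

```python
from transformers import AutoTokenizer

model_id = "ScottBiggs2/LLaMA-3.1-8B-Instruct-DPO-Triton-Sparse"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
    """
    return 22.0  # stub for illustration; a real tool would call an API

messages = [
    {"role": "user", "content": "What's the temperature in Paris right now?"},
]

# The chat template converts the function's signature and docstring into
# the JSON tool schema embedded in the Llama 3.1 prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```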
## Model Information
This model is based on Meta's Llama 3.1 8B Instruct model, fine-tuned using Direct Preference Optimization (DPO). The model maintains compatibility with the original Llama 3.1 architecture and chat template format.
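For reference, the standard DPO objective (Rafailov et al., 2023) that this kind of fine-tuning optimizes fits in a few lines. This is a generic sketch, not the exact training code behind this checkpoint, and `beta=0.1` is just a common default.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over per-sequence log-probabilities.

    Pushes the policy's chosen-vs-rejected log-ratio above the frozen
    reference model's ratio, scaled by beta.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```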
For more information about the base model, see the meta-llama/Llama-3.1-8B-Instruct model card on the Hugging Face Hub.
## Citation

If you use this model, please cite the original Llama 3.1 paper:

```bibtex
@article{meta2024llama,
  title={Llama 3.1},
  author={Meta AI},
  year={2024}
}
```