Model Card for Llama_DPO_lora

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct, trained with DPO (Direct Preference Optimization). The training data combines the MathDial dataset with model responses generated using MathDial dialogues as input.
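DPO trains on preference pairs rather than plain demonstrations. As a minimal sketch of what one MathDial-derived record might look like (the `prompt`/`chosen`/`rejected` field names follow the common trl `DPOTrainer` convention; the exact schema and example text here are illustrative assumptions, not the actual dataset):

```python
# Hypothetical DPO preference record built from MathDial: a tutoring-style
# reply is marked as preferred over a direct answer-giving reply.
preference_record = {
    "prompt": "Student: I'm stuck on 3x + 5 = 20. What should I do first?",
    "chosen": (
        "Good question! What operation could you apply to both sides "
        "to isolate the term with x?"
    ),
    "rejected": "Subtract 5 from both sides and divide by 3; x = 5.",
}

def is_valid_pair(record):
    """Check that the record has the three non-empty string fields DPO expects."""
    return all(
        isinstance(record.get(key), str) and record[key]
        for key in ("prompt", "chosen", "rejected")
    )

print(is_valid_pair(preference_record))  # True
```

A dataset of such records can be passed directly to a preference-optimization trainer; the point is that each example ranks two candidate responses to the same prompt.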

The model is optimized for:

  • Conversational math problem solving
  • Step-by-step reasoning in dialogue form
  • Scaffolding (guiding learners toward the solution rather than giving the answer outright)

Repository: GitHub code for DPO training and datasets


Intended Use

This model is intended for use in:

  • Interactive math tutoring
  • Research in dialogue-based problem solving
  • Educational tools

Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model_name = "meta-llama/Llama-3.1-8B-Instruct"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the DPO LoRA adapter on top of the base model
peft_model_path = "abbatea/Tutorbot-variation-DPO-Llama"
model = PeftModel.from_pretrained(base_model, peft_model_path)

# Load the tokenizer and make sure a pad token is set
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Build a chat prompt; add_generation_prompt appends the assistant header
# so the model produces a reply instead of continuing the user turn
messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
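Note that decoding `outputs[0]` prints the prompt together with the completion, because `generate` returns the input IDs followed by the new tokens. To show only the tutor's reply, slice off the prompt tokens first. A minimal sketch of the slicing convention, using dummy token IDs so no model download is needed (with real tensors the equivalent slice is `outputs[0][inputs["input_ids"].shape[-1]:]`):

```python
# generate() returns [prompt tokens | new tokens] for each sequence, so
# slicing at the prompt length keeps only the reply. Dummy IDs stand in
# for real tokenizer output.
prompt_ids = [101, 102, 103]          # pretend prompt token IDs
output_ids = prompt_ids + [7, 8, 9]   # pretend generate() output (one sequence)

new_token_ids = output_ids[len(prompt_ids):]
print(new_token_ids)  # [7, 8, 9]
```

Decoding `new_token_ids` instead of the full sequence yields just the assistant turn.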


Citations

DPO:

@inproceedings{rafailov2023direct,
    title     = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author    = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
    booktitle = {Advances in Neural Information Processing Systems 36 (NeurIPS 2023)},
    editor    = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
    year      = {2023},
}