---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.1-8B-Instruct
- dpo
- lora
- transformers
- trl
---

# Model Card for Llama_DPO_lora

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). It was trained with [DPO](https://huggingface.co/docs/trl/en/dpo_trainer) on a preference dataset that combines the **[MathDial dataset](https://huggingface.co/datasets/eth-nlped/mathdial-chat/viewer/default/train?views%5B%5D=train&row=0)** with model responses generated using MathDial as input. An illustrative training sketch appears at the end of this card.

The model is optimized for:
- Conversational math problem solving
- Step-by-step reasoning in dialogue form
- Scaffolding student problem solving

Repository: **[GitHub code for DPO training and datasets](https://github.com/abbatea/MathDial-SFT-and-DPO/tree/main/DPO_Finetuning)**

---

## Intended Use

This model is intended for:
- Interactive math tutoring
- Research in dialogue-based problem solving
- Educational tools

---

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model_name = "meta-llama/Llama-3.1-8B-Instruct"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the DPO LoRA adapter on top of the base model
peft_model_path = "abbatea/Tutorbot-variation-DPO-Llama"
model = PeftModel.from_pretrained(base_model, peft_model_path)

# Load the tokenizer and make sure a pad token is set
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# The chat template already inserts the BOS token, so skip adding it again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Citations

DPO:

```bibtex
@inproceedings{rafailov2023direct,
    title     = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author    = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    booktitle = {Advances in Neural Information Processing Systems 36 (NeurIPS 2023)},
    editor    = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
    year      = {2023},
}
```
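---

## Merging the Adapter (Optional)

For deployment without the PEFT wrapper at inference time, the LoRA weights can be folded into the base model. Below is a minimal sketch using PEFT's `merge_and_unload()`; the save path is a placeholder, and `model` / `tokenizer` are the objects loaded in the usage example above.

```python
# Fold the LoRA adapter weights into the base model so that generation
# no longer requires the PeftModel wrapper.
merged_model = model.merge_and_unload()

# Save the merged weights and tokenizer (path is a placeholder).
merged_model.save_pretrained("llama-dpo-merged")
tokenizer.save_pretrained("llama-dpo-merged")
```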
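---

## Training Sketch

The exact training configuration lives in the linked repository. The sketch below only illustrates how a LoRA adapter is typically trained with TRL's `DPOTrainer` (assuming a recent TRL version) on a preference dataset with `prompt`/`chosen`/`rejected` columns; the dataset path and all hyperparameters are placeholders, not the values used for this model.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with "prompt", "chosen", "rejected" columns.
# The path here is a placeholder, not the dataset used for this model.
dataset = load_dataset("path/to/preference-dataset", split="train")

# LoRA configuration; rank and target modules are illustrative defaults.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="llama-dpo-lora",
    beta=0.1,  # DPO temperature; placeholder value
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

# With a peft_config, DPOTrainer handles the reference model internally
# by disabling the adapter, so no explicit ref_model is needed.
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```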