SFT Models
We train a series of SFT models on the high-quality SFT dataset of RLHFlow for research purposes.
```python
# Load the model directly with transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RLHFlow/LLaMA3-SFT-v2")
model = AutoModelForCausalLM.from_pretrained("RLHFlow/LLaMA3-SFT-v2")

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the model's chat template and append the assistant generation prompt
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

This is the SFT checkpoint used for the project RLHFlow/Online-RLHF.
The model is trained from meta-llama/Meta-Llama-3-8B on RLHFlow/RLHFlow-SFT-Dataset-ver2 for 2 epochs. We use a global batch size of 128 and a learning rate of 2e-5; the training samples are packed and split into chunks of 8192 tokens. See https://github.com/RLHFlow/Online-RLHF/blob/main/sft/llama3-8b-it.yaml for more training details.
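The packing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RLHFlow training code: it assumes each tokenized sample is terminated with an EOS token, that the concatenated stream is cut into non-overlapping 8192-token chunks, and that a leftover tail shorter than one chunk is dropped. It also assumes each of the 128 elements in a global batch is one packed chunk, which gives the tokens-per-step figure. The names `pack_sequences`, `CHUNK_LEN`, `GLOBAL_BATCH`, and `TOKENS_PER_STEP` are hypothetical.

```python
from typing import Iterable, List

CHUNK_LEN = 8192      # chunk length stated in the model card
GLOBAL_BATCH = 128    # global batch size stated in the model card

# Assumption: one packed chunk per batch element, so each optimizer step sees this many tokens
TOKENS_PER_STEP = GLOBAL_BATCH * CHUNK_LEN


def pack_sequences(samples: Iterable[List[int]], eos_id: int,
                   chunk_len: int = CHUNK_LEN) -> List[List[int]]:
    """Concatenate token-id lists (each followed by EOS) and cut the stream into fixed-size chunks.

    Hypothetical helper: a trailing remainder shorter than chunk_len is dropped.
    """
    buffer: List[int] = []
    chunks: List[List[int]] = []
    for ids in samples:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the sample boundary
        # Emit full chunks as soon as enough tokens have accumulated
        while len(buffer) >= chunk_len:
            chunks.append(buffer[:chunk_len])
            buffer = buffer[chunk_len:]
    return chunks


if __name__ == "__main__":
    # Toy example with a tiny chunk length so the output is readable
    print(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, chunk_len=4))
    # → [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
    print(TOKENS_PER_STEP)  # → 1048576
```

Packing means a single chunk can contain the end of one sample and the start of the next (the second chunk in the toy output), which keeps every sequence at full length and avoids padding.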
We use the ToRA evaluation scripts for GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in the zero-shot setting.
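As a rough illustration of how a GSM8K-style accuracy number is computed, the sketch below takes the last number in each zero-shot completion as the predicted answer and compares it with the reference. This is a simplified, hypothetical heuristic; it is not the ToRA script, whose answer extraction and normalization may differ. The functions `extract_last_number` and `accuracy` are hypothetical names.

```python
import re
from typing import List, Optional


def extract_last_number(text: str) -> Optional[str]:
    """Hypothetical heuristic: return the last number in the text, with thousands separators removed."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def accuracy(completions: List[str], references: List[str]) -> float:
    """Fraction of completions whose extracted number equals the reference answer."""
    if not references:
        return 0.0
    correct = 0
    for completion, ref in zip(completions, references):
        pred = extract_last_number(completion)
        # Compare numerically so that "18" and "18.0" count as the same answer
        if pred is not None and float(pred) == float(ref):
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    preds = ["I think it is 18.", "The answer is 1,234.", "no idea"]
    refs = ["18", "1234", "5"]
    print(accuracy(preds, refs))  # → 0.6666666666666666
```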
| Model | Size | Method | LC AlpacaEval | MT-Bench | GSM-8K | MATH | MMLU | HumanEval | TruthfulQA | ARC |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaMA-3-8B-it | 8B | RS+DPO+PPO | 22.9 | 8.16 | 79.6 | 26.3 | 66.0 | 61.6 | 43.9 | 59.5 |
| RLHFlow/LLaMA3-SFT | 8B | SFT | 10.2 | 7.69 | 74.2 | 30.0 | 64.6 | 63.4 | 53.5 | 58.6 |
| RLHFlow/LLaMA3-SFT-v2 | 8B | SFT | 12.66 | - | 83.4 | 41.1 | 64.8 | 66.5 | 53.9 | 60.0 |
Please cite our technical report if you find our model useful for your research or product.
```bibtex
@misc{dong2024rlhf,
  title={RLHF Workflow: From Reward Modeling to Online RLHF},
  author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
  year={2024},
  eprint={2405.07863},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RLHFlow/LLaMA3-SFT-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```