Use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RLHFlow/LLaMA3-SFT-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RLHFlow/LLaMA3-SFT-v2")
model = AutoModelForCausalLM.from_pretrained("RLHFlow/LLaMA3-SFT-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

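outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt), without special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))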
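The slice in the last line works because `model.generate` on a decoder-only model returns the prompt tokens followed by the newly generated ones. The plain-Python sketch below mimics that slicing on lists instead of tensors; the helper name `strip_prompt` and the token ids are made up for illustration. For batched prompts, the same slice only lines up across rows if the tokenizer pads on the left (`tokenizer.padding_side = "left"`); that is an assumption about how you would batch, not something this card specifies.

```python
from typing import List, Sequence


def strip_prompt(output_ids: Sequence[Sequence[int]], prompt_len: int) -> List[List[int]]:
    """Hypothetical helper: drop the first prompt_len tokens of every row.

    Assumes each row starts with a (possibly left-padded) prompt of exactly
    prompt_len tokens, as generate() output does for decoder-only models.
    """
    return [list(row[prompt_len:]) for row in output_ids]


if __name__ == "__main__":
    prompt = [101, 102, 103]          # made-up prompt token ids
    generated = [prompt + [7, 8, 9]]  # prompt followed by the continuation
    print(strip_prompt(generated, len(prompt)))
```

With left padding, every row in a batch shares the same prompt length, so a single `prompt_len` is enough to cut all rows at once.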
Quick Links

This is the SFT checkpoint used for the RLHFlow/Online-RLHF project.

The model is trained from meta-llama/Meta-Llama-3-8B on RLHFlow/RLHFlow-SFT-Dataset-ver2 for 2 epochs. We use a global batch size of 128 and a learning rate of 2e-5, and we pack the samples and split them into chunks of 8192 tokens. See the training config at https://github.com/RLHFlow/Online-RLHF/blob/main/sft/llama3-8b-it.yaml for more details.
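The packing step can be sketched in plain Python. This is a minimal illustration, not the project's training code: the function name `pack_into_chunks`, appending an EOS token after each sample, and dropping the trailing partial chunk are all assumptions made here. The constant at the bottom additionally assumes that the global batch size of 128 counts packed 8192-token chunks, which would give at most 128 × 8192 = 1,048,576 tokens per optimizer step.

```python
from typing import Iterable, List


def pack_into_chunks(samples: Iterable[List[int]], eos_id: int,
                     chunk_len: int = 8192) -> List[List[int]]:
    """Hypothetical packing helper.

    Concatenates tokenized samples into one stream (assumption: an EOS token
    follows each sample), then cuts the stream into fixed-length chunks.
    Assumption: a trailing remainder shorter than chunk_len is dropped.
    """
    stream: List[int] = []
    for ids in samples:
        stream.extend(ids)
        stream.append(eos_id)
    n_full = len(stream) // chunk_len
    return [stream[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


GLOBAL_BATCH = 128   # global batch size from the card
CHUNK_LEN = 8192     # packed chunk length from the card
# Assumption: the batch counts packed chunks, so this is an upper bound per step.
TOKENS_PER_STEP = GLOBAL_BATCH * CHUNK_LEN

if __name__ == "__main__":
    toy = pack_into_chunks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, chunk_len=4)
    print(toy)
    print(TOKENS_PER_STEP)
```

Packing avoids wasting compute on padding: samples of very different lengths fill each 8192-token chunk end to end, at the cost of a sample occasionally being split across two chunks.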

Academic Benchmarks

We use the ToRA scripts to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in the zero-shot setting.

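| Model | Size | Method | LC AlpacaEval | MT-Bench | GSM-8K | MATH | MMLU | HumanEval | TruthfulQA | ARC |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaMA-3-8B-it | 8B | RS+DPO+PPO | 22.9 | 8.16 | 79.6 | 26.3 | 66.0 | 61.6 | 43.9 | 59.5 |
| RLHFlow/LLaMA3-SFT | 8B | SFT | 10.2 | 7.69 | 74.2 | 30.0 | 64.6 | 63.4 | 53.5 | 58.6 |
| RLHFlow/LLaMA3-SFT-v2 | 8B | SFT | 12.66 | - | 83.4 | 41.1 | 64.8 | 66.5 | 53.9 | 60.0 |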
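To compare the two SFT checkpoints, the per-benchmark change from RLHFlow/LLaMA3-SFT to RLHFlow/LLaMA3-SFT-v2 can be computed directly from the table. The short script below is an illustrative helper (the name `score_deltas` is hypothetical); it copies the table values and skips MT-Bench, where v2 has no reported score.

```python
from typing import Dict, Optional

# Scores copied from the Academic Benchmarks table.
# None marks the MT-Bench entry that the table leaves blank ("-") for v2.
SFT_V1: Dict[str, Optional[float]] = {
    "LC AlpacaEval": 10.2, "MT-Bench": 7.69, "GSM-8K": 74.2, "MATH": 30.0,
    "MMLU": 64.6, "HumanEval": 63.4, "TruthfulQA": 53.5, "ARC": 58.6,
}
SFT_V2: Dict[str, Optional[float]] = {
    "LC AlpacaEval": 12.66, "MT-Bench": None, "GSM-8K": 83.4, "MATH": 41.1,
    "MMLU": 64.8, "HumanEval": 66.5, "TruthfulQA": 53.9, "ARC": 60.0,
}


def score_deltas(new: Dict[str, Optional[float]],
                 old: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Hypothetical helper: new minus old per benchmark, rounded to 2 decimals.

    Benchmarks with a missing score in either model are skipped.
    """
    deltas: Dict[str, float] = {}
    for name, old_score in old.items():
        new_score = new.get(name)
        if old_score is None or new_score is None:
            continue
        deltas[name] = round(new_score - old_score, 2)
    return deltas


if __name__ == "__main__":
    for name, delta in score_deltas(SFT_V2, SFT_V1).items():
        print(f"{name}: {delta:+.2f}")
```

On these numbers the largest gains are on MATH (+11.1) and GSM-8K (+9.2), while MMLU and TruthfulQA move by less than half a point.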

Citation

Please cite our technical report if you find our model useful for your research or product.

@misc{dong2024rlhf,
      title={RLHF Workflow: From Reward Modeling to Online RLHF}, 
      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
      year={2024},
      eprint={2405.07863},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Model size: 8B params · Tensor type: BF16 (Safetensors)
