ALaRM: Align Language Models via Hierarchical Rewards Modeling
Paper: arXiv:2403.06754
This is the trained SFT policy for the machine translation (MT) task from the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling".
Check out our project page for more information.
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("halfrot/sft-mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("halfrot/sft-mt5-base")
```
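Once loaded, translation can be sketched as below. This is a minimal example assuming a plain text input; the exact source/target language pair and any prompt prefix follow the paper's MT setup and may need adjusting.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("halfrot/sft-mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("halfrot/sft-mt5-base")

# Example input sentence (hypothetical; the paper's MT setup may
# expect a specific language direction or prompt format).
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")

# Beam search decoding; tune max_new_tokens / num_beams as needed.
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```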