Tags: Reinforcement Learning, Transformers, Safetensors, qwen3, text-generation, grpo, instruction-following, ifbench, slime, text-generation-inference
How to use arlarena/if_step200 with Transformers:

```python
# Load the model directly. Qwen3 is a decoder-only text model,
# so the causal-LM auto class is the appropriate loader.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arlarena/if_step200")
model = AutoModelForCausalLM.from_pretrained("arlarena/if_step200")
```
if_step200
Qwen3-8B-Base fine-tuned with GRPO (via slime) on multi-constraint instruction-following data. The training reward is the IFEval-G rule-check score plus a bonus from a Skywork-Reward-V2-Llama-3.1-8B judge.
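The sketch below illustrates, under stated assumptions, how a rule-check score and a judge bonus might be combined into one scalar reward, and how GRPO turns a group of sampled rewards into advantages by normalizing within the group. The checker functions, the fraction-of-constraints aggregation, the `bonus_weight` value, and the assumption that the judge score is already mapped to [0, 1] are all hypothetical; the card does not specify how slime or IFEval-G compute these terms.

```python
from statistics import mean, pstdev

# Hypothetical constraint checkers standing in for IFEval-G rule checks.
def check_lowercase(response: str) -> bool:
    return response == response.lower()

def check_max_words(response: str, limit: int = 50) -> bool:
    return len(response.split()) <= limit

def rule_score(response: str, checkers) -> float:
    """Fraction of constraints satisfied (assumed aggregation)."""
    return sum(c(response) for c in checkers) / len(checkers)

def combined_reward(response: str, checkers, judge_score: float,
                    bonus_weight: float = 0.1) -> float:
    """Rule score plus a weighted judge bonus.

    judge_score is assumed to be pre-scaled to [0, 1]; bonus_weight is
    a hypothetical value, not the one used for this checkpoint.
    """
    return rule_score(response, checkers) + bonus_weight * judge_score

def group_relative_advantages(rewards, eps: float = 1e-6):
    """GRPO-style advantages: center and scale each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

if __name__ == "__main__":
    checkers = [check_lowercase, check_max_words]
    group = ["hello world", "Hello world", "ok", "A" * 10]
    judge = [0.9, 0.4, 0.2, 0.1]  # hypothetical judge scores
    rewards = [combined_reward(r, checkers, j) for r, j in zip(group, judge)]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because advantages are relative to the group mean, a response is only reinforced if it beats its siblings sampled from the same prompt; if the judge bonus disappears, the ranking within each group is driven by the rule checks alone.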
This is the iteration-200 checkpoint, the last clean checkpoint saved while the reward model was healthy. The RM judge went offline around step 287, and later checkpoints show reward hacking against the rule checkers. Converted from Megatron torch_dist format to Hugging Face format.
Base model: Qwen/Qwen3-8B-Base