MaxRL - a ftajwar Collection

Updated Feb 26

Qwen3-Base checkpoints post-trained for our paper, "Maximum Likelihood Reinforcement Learning" (https://zanette-labs.github.io/MaxRL/).

ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps
Text Generation • 4B • Updated Feb 26 • 4

ftajwar/qwen3_4B_Base_GRPO_Polaris_1000_steps
Text Generation • 4B • Updated Feb 26 • 4

ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps
Text Generation • 2B • Updated Feb 26 • 4

ftajwar/qwen3_1.7B_Base_GRPO_Polaris_1000_steps
Text Generation • 2B • Updated Feb 26 • 42