Prathyusha101/Qwen2.5-1.5B-PPO_global_step_300_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_200_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_200_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1700_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1700_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1600_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1600_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1500_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1500_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1400_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1400_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1300_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1300_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1200_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1200_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1100_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1100_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1000_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_1000_ACTOR • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_100_CRITIC • 2B
Prathyusha101/Qwen2.5-1.5B-PPO_global_step_100_ACTOR • 2B
Prathyusha101/sept_5_ppo_kl_on_base_model_qwen_0.5b_435_steps_CRITIC • 0.5B
Prathyusha101/sept_5_ppo_kl_on_base_model_qwen_0.5b_435_steps_ACTOR • 0.6B
Prathyusha101/sept_5_ppo_no_kl_qwen_0.5b_400_steps_CRITIC • 0.5B
Prathyusha101/sept_5_ppo_no_kl_qwen_0.5b_400_steps_ACTOR • 0.6B
Prathyusha101/sept_5_ppo_kl_qwen_0.5b_400_steps_ACTOR • 0.6B
Prathyusha101/sept_5_ppo_kl_qwen_0.5b_400_steps_CRITIC • 0.5B
Prathyusha101/qwen-math-ppo-1.0.0.50-critic • 0.6B
Prathyusha101/sept_3_qwen2-0.5b-RLOO-with-kl
Prathyusha101/qwen2-0.5b-REINFORCE-no-baseline-kl-disabled • Text Generation • 0.5B