Hugging Face
Models
Datasets
Spaces
Buckets
new
Docs
Enterprise
Pricing
Log In
Sign Up
jaygala24
's Collections
RL post-training
RL post-training
updated
7 days ago
Upvote
-
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning
Text Generation
•
4B
•
Updated
16 days ago
•
1.13k
jaygala24/Qwen3-4B-GRPO-math-reasoning
Text Generation
•
4B
•
Updated
16 days ago
•
944
jaygala24/Qwen3-4B-ReMax-math-reasoning
Text Generation
•
4B
•
Updated
16 days ago
•
886
jaygala24/Qwen3-4B-RLOO-math-reasoning
Text Generation
•
4B
•
Updated
10 days ago
•
371
jaygala24/Qwen3-4B-DAPO-math-reasoning
Text Generation
•
4B
•
Updated
7 days ago
•
615
jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
916
jaygala24/Qwen3-1.7B-GRPO-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
932
jaygala24/Qwen3-1.7B-ReMax-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
978
jaygala24/Qwen3-1.7B-RLOO-math-reasoning
Text Generation
•
2B
•
Updated
10 days ago
•
887
jaygala24/Qwen3-1.7B-DAPO-math-reasoning
Text Generation
•
2B
•
Updated
10 days ago
•
767
jaygala24/Qwen2.5-3B-GRPO-KL-math-reasoning
Text Generation
•
3B
•
Updated
16 days ago
•
852
jaygala24/Qwen2.5-3B-GRPO-math-reasoning
Text Generation
•
3B
•
Updated
16 days ago
•
867
jaygala24/Qwen2.5-3B-ReMax-math-reasoning
Text Generation
•
3B
•
Updated
16 days ago
•
521
jaygala24/Qwen2.5-3B-RLOO-math-reasoning
Text Generation
•
3B
•
Updated
10 days ago
•
815
jaygala24/Qwen2.5-3B-DAPO-math-reasoning
Text Generation
•
3B
•
Updated
10 days ago
•
732
jaygala24/Qwen2.5-1.5B-GRPO-KL-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
580
jaygala24/Qwen2.5-1.5B-GRPO-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
637
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning
Text Generation
•
2B
•
Updated
16 days ago
•
502
jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning
Text Generation
•
2B
•
Updated
10 days ago
•
767
jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning
Text Generation
•
2B
•
Updated
10 days ago
•
906
jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning
Text Generation
•
0.5B
•
Updated
16 days ago
•
590
jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning
Text Generation
•
0.5B
•
Updated
16 days ago
•
623
jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning
Text Generation
•
0.5B
•
Updated
16 days ago
•
487
jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning
Text Generation
•
0.5B
•
Updated
10 days ago
•
715
jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning
Text Generation
•
0.5B
•
Updated
10 days ago
•
700
Upvote
-
Share collection
View history
Collection guide
Browse collections