khazarai's Collections
Distilled Models
Benchmarks & Datasets
CoT
GRPO
Text-to-Speech Models
RLHF
SFT
GRPO (updated 4 days ago)
Group Relative Policy Optimization
khazarai/Math-RL • Text Generation • 0.5B • Updated 1 day ago • 181 • 1
khazarai/HeisenbergQ-0.5B-RL • Text Generation • 0.5B • Updated 1 day ago • 422 • 1