GRPO - a Moenupa Collection


GRPO

updated about 8 hours ago

GRPO checkpoints (saved every 10th step) and datasets

Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

Paper • 2607.01763 • Published 1 day ago • 3
Moenupa/Dolci-Think-RL-7B

Viewer • Updated Jun 2 • 72.2k • 32

Note Dataset
Moenupa/verl

Viewer • Updated May 29 • 216k • 29

Note Dataset
Moenupa/Qwen3-4B-Thinking-2507-GRPO-Tool

Updated 4 days ago
Moenupa/Qwen3-4B-Thinking-2507-GRPO-Math

Updated 4 days ago
Moenupa/Qwen3-4B-Thinking-2507-GRPO-Chem

Updated 4 days ago
Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChem

Updated about 8 hours ago
Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChemTool

Updated about 8 hours ago
Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChemToolCode

Updated about 8 hours ago
