Moenupa's Collections
GRPO

Updated about 8 hours ago

GRPO checkpoints (saved every 10th step) and datasets.


  • Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

    Paper • 2607.01763 • Published 1 day ago

  • Moenupa/Dolci-Think-RL-7B

    Viewer • Updated Jun 2 • 72.2k • 32

    Note: Dataset


  • Moenupa/verl

    Viewer • Updated May 29 • 216k • 29

    Note: Dataset


  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-Tool

    Updated 4 days ago

  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-Math

    Updated 4 days ago

  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-Chem

    Updated 4 days ago

  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChem

    Updated about 8 hours ago

  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChemTool

    Updated about 8 hours ago

  • Moenupa/Qwen3-4B-Thinking-2507-GRPO-MathChemToolCode

    Updated about 8 hours ago