vivekvar/GSPO-DeepSeek-R1-Distill-Qwen-1.5B

Text Generation

reinforcement-learning

mathematical-reasoning

policy-optimization

sequence-level-training

Welcome to the community

The community tab is the place to discuss and collaborate with the HF community!