Hugging Face
vivekvar/GSPO-DeepSeek-R1-Distill-Qwen-1.5B
Tags: Text Generation, Safetensors, custom, English, qwen2, reinforcement-learning, reasoning, mathematical-reasoning, gspo, policy-optimization, sequence-level-training, conversational
arXiv: 2507.18071
License: apache-2.0
Branch: main
File: GSPO-DeepSeek-R1-Distill-Qwen-1.5B/tokenizer.json
Commit History
Upload folder using huggingface_hub (commit 201b329, verified)
Committed by vivekvar on Jul 31, 2025