HuangXinBa/GRPO
Text Generation
Safetensors
gsm8k
English
llama
causal-lm
reinforcement-learning
GRPO
instruction-tuning
chain-of-thought
conversational
License: apache-2.0
Commit History (branch: main)
Add full model card (README.md) | 3bbfa9a | verified | HuangXinBa committed on May 28, 2025
Upload LlamaForCausalLM | bc15fbb | verified | HuangXinBa committed on May 28, 2025
Add full model card (README.md) | 8cb0da9 | verified | HuangXinBa committed on May 28, 2025
Upload LlamaForCausalLM | f8e9135 | verified | HuangXinBa committed on May 28, 2025
Add full model card (README.md) | 5d4e08f | verified | HuangXinBa committed on May 28, 2025
Upload LlamaForCausalLM | 5d93fe5 | verified | HuangXinBa committed on May 28, 2025
Add full model card (README.md) | 77c452c | verified | HuangXinBa committed on May 28, 2025
Upload LlamaForCausalLM | 6bd03f6 | verified | HuangXinBa committed on May 27, 2025
Upload tokenizer | bf15d76 | verified | HuangXinBa committed on May 27, 2025
Upload LlamaForCausalLM | 33bf48c | verified | HuangXinBa committed on May 27, 2025
initial commit | 4d8631f | verified | HuangXinBa committed on May 27, 2025