KRadim/custom-SmolGRPO-135M
Tags: Safetensors, llama, trl, grpo, GRPO, Reasoning-Course
License: apache-2.0
Branch: main
Total size: 269 MB
Contributors: 1
History: 3 commits
Latest commit: c120293 (verified), "Upload LlamaForCausalLM", by KRadim, 3 months ago
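As a rough sanity check on the 269 MB figure above, the size can be compared with the parameter count suggested by the repository name. This is only a sketch: it assumes that "135M" means about 135 million parameters and that "MB" means decimal megabytes, neither of which the listing states. The names `PARAMS`, `REPORTED_BYTES`, and `bytes_per_param` are illustrative.

```python
# Assumption: "135M" in the repository name is an approximate parameter count.
PARAMS = 135_000_000

# Assumption: "269 MB" in the listing means 269 * 10**6 bytes.
REPORTED_BYTES = 269 * 10**6


def bytes_per_param(total_bytes: int, n_params: int) -> float:
    """Average number of stored bytes per parameter."""
    return total_bytes / n_params


ratio = bytes_per_param(REPORTED_BYTES, PARAMS)
print(f"{ratio:.2f} bytes per parameter")  # prints "1.99 bytes per parameter"
```

A ratio close to 2 would be consistent with weights stored in a 16-bit format, but the listing does not state the dtype; config.json or the safetensors header would be the place to confirm it.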
File                     Size             Last commit                Updated
.gitattributes           1.52 kB          initial commit             3 months ago
README.md                81 Bytes         Upload LlamaForCausalLM    3 months ago
config.json              687 Bytes        Upload LlamaForCausalLM    3 months ago
generation_config.json   166 Bytes        Upload LlamaForCausalLM    3 months ago
model.safetensors        269 MB (xet)     Upload LlamaForCausalLM    3 months ago
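The commit message "Upload LlamaForCausalLM" and the presence of config.json and model.safetensors suggest the weights can be loaded with the `transformers` library. A minimal sketch, assuming `transformers` and a backend such as PyTorch are installed and the repository is publicly accessible; `REPO_ID` and `load_model` are illustrative names, not part of the repository:

```python
REPO_ID = "KRadim/custom-SmolGRPO-135M"


def load_model(repo_id: str = REPO_ID):
    """Download (or reuse from cache) and instantiate the causal LM.

    Assumption: the repository's config.json resolves to a Llama-style
    causal language model, as the commit message suggests.
    """
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id)
```

Calling `load_model()` needs network access on first use. Note that the listing shows no tokenizer files (such as tokenizer.json or tokenizer_config.json), so a tokenizer would probably have to come from a compatible base model, which this page does not name.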