sohyunan/gemma-2-2b-it_controller_sft_random_grpo
Text Generation
Transformers
Safetensors
maze_5x5
gemma2
Generated from Trainer
controller-grpo
trl
grpo
conversational
text-generation-inference
arxiv:2402.03300
main
Commit History
Training in progress, step 16
f409bf8
verified
sohyunan
committed on
Feb 7, 2025
Training in progress, step 12
52c2a38
verified
sohyunan
committed on
Feb 7, 2025
Training in progress, step 8
cbb4196
verified
sohyunan
committed on
Feb 7, 2025
Training in progress, step 4
1cda6ee
verified
sohyunan
committed on
Feb 7, 2025
Training in progress, step 8
83dd077
verified
sohyunan
committed on
Feb 6, 2025
Training in progress, step 4
91f3028
verified
sohyunan
committed on
Feb 6, 2025
Training in progress, step 8
87f8c6d
verified
sohyunan
committed on
Feb 6, 2025
Training in progress, step 4
77afd08
verified
sohyunan
committed on
Feb 6, 2025
End of training
6d5e9b1
verified
sohyunan
committed on
Feb 6, 2025
Model save
a1ba8fd
verified
sohyunan
committed on
Feb 6, 2025
initial commit
c68906f
verified
sohyunan
committed on
Feb 6, 2025