Spaces: nks321/moa-rl-env
Branch: main
Path: moa-rl-env/training (22.9 kB)
2 contributors
History: 10 commits
Latest commit: 7ba71bb by natnael kahssay, 3 months ago
    fix: correct model name to unsloth/gpt-oss-20b (no -instruct suffix)
File                Size       Last commit                                                           Updated
.gitignore          28 Bytes   add training/ as real directory (Dockerfile + train.py)               3 months ago
Dockerfile          672 Bytes  fix: install trl>=0.16 last with --upgrade to beat unsloth dep pins   3 months ago
rollout_wrapper.py  6.82 kB    feat: RFC 005 interactive rollout wrapper + multi-turn GRPO training  3 months ago
train.py            8.48 kB    fix: correct model name to unsloth/gpt-oss-20b (no -instruct suffix)  3 months ago
train_rfc005.py     6.87 kB    feat: add W&B reward logging to both training scripts                 3 months ago