Openenv / cloud_arena

Commit History

Fix torch bfloat16 errors on T4 GPUs by enabling Unsloth dtype auto-detection and explicitly wrapping forward passes in autocast
4b22b06

saravanatanjiro committed on

Add mandatory Unsloth inference state toggles around generation for RL pipeline
81ed883

saravanatanjiro committed on

Capture and display exact Unsloth import exception
fbf8187

saravanatanjiro committed on

Update with existing environment
d81b76a

saravanatanjiro committed on

Fix GRPO group-loss training and align UI defaults
5a0c6af

saravanatanjiro committed on

Migrate LLM pipeline to custom GRPO with robust rewards
dfc5996

saravanatanjiro committed on

Multi-model benchmark pipeline: VRAM cleanup + EMA graph + detailed output
af6bbef

kavin57447 committed on

Fix truncation: 80 tokens, regex safety net, strict prompt
deef82c

kavin57447 committed on

Hackathon speedrun: max_new_tokens=32, seq_len=512 for 4-8x faster iterations
ee5ddee

kavin57447 committed on

Replace flash-attn with PyTorch built-in SDPA (no CUDA compile needed)
e9dea07

kavin57447 committed on

Max GPU utilization: flash-attn2 + grad accumulation + 15 steps/ep + 1024 seq len
93d0171

kavin57447 committed on

Switch to Llama 3.1 8B + fix low-timestep crash (min 5000)
8d95050

kavin57447 committed on

Add LLM RL training with Gemma 7B + LoRA
ee3dfa7

kavin57447 committed on

Add Cloud Arena Mathematical Model RL environment
12263fa

kavin57447 committed on