Fix torch bfloat16 errors on T4 GPUs by enabling Unsloth dtype auto-detection and explicitly wrapping forward passes in autocast 4b22b06 saravanatanjiro committed 22 days ago
Add mandatory Unsloth inference state toggles around generation for RL pipeline 81ed883 saravanatanjiro committed 22 days ago
Migrate LLM pipeline to custom GRPO with robust rewards dfc5996 saravanatanjiro committed 23 days ago
Multi-model benchmark pipeline: VRAM cleanup + EMA graph + detailed output af6bbef kavin57447 committed 23 days ago
Fix truncation: 80 tokens, regex safety net, strict prompt deef82c kavin57447 committed 23 days ago
Hackathon speedrun: max_new_tokens=32, seq_len=512 for 4-8x faster iterations ee5ddee kavin57447 committed 23 days ago
Replace flash-attn with PyTorch built-in SDPA (no CUDA compile needed) e9dea07 kavin57447 committed 23 days ago
Max GPU utilization: flash-attn2 + grad accumulation + 15 steps/ep + 1024 seq len 93d0171 kavin57447 committed 23 days ago
Switch to Llama 3.1 8B + fix low-timestep crash (min 5000) 8d95050 kavin57447 committed 23 days ago