Fix torch bfloat16 errors on T4 GPUs by enabling Unsloth dtype auto-detection and explicitly wrapping forward passes in autocast 4b22b06 saravanatanjiro committed on Apr 26
Set TRITON_CACHE_DIR to /tmp/triton_cache to avoid root permission denied error 5f168d6 saravanatanjiro committed on Apr 26
Add mandatory Unsloth inference state toggles around generation for RL pipeline 81ed883 saravanatanjiro committed on Apr 26
Pin pydantic, fastapi, and starlette to fix Gradio 4.x JSON schema and TemplateResponse bugs 56934e2 saravanatanjiro committed on Apr 26
Pin Gradio to 4.36.1 to fix TypeError during JSON schema parsing on startup 477c526 saravanatanjiro committed on Apr 26
Pin huggingface-hub to 0.24.7 to fix Unsloth _token import error 4dfbc48 saravanatanjiro committed on Apr 26
Switch SDK to docker to use custom Dockerfile and fix pip build b4a2158 saravanatanjiro committed on Apr 26
Fix Gradio sdk_version to a valid fully-specified version (4.44.0) 07dcf6a saravanatanjiro committed on Apr 26
Multi-model benchmark pipeline: VRAM cleanup + EMA graph + detailed output af6bbef kavin57447 committed on Apr 25
Hackathon speedrun: max_new_tokens=32, seq_len=512 for 4-8x faster iterations ee5ddee kavin57447 committed on Apr 25
Replace flash-attn with PyTorch built-in SDPA (no CUDA compile needed) e9dea07 kavin57447 committed on Apr 25
Fix: install torch before flash-attn (needs torch at build time) 332efeb kavin57447 committed on Apr 25
Max GPU utilization: flash-attn2 + grad accumulation + 15 steps/ep + 1024 seq len 93d0171 kavin57447 committed on Apr 25
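
Note on 5f168d6 (TRITON_CACHE_DIR): the change can be sketched roughly as below. This is a minimal sketch, not the code from the commit. The helper name configure_triton_cache is hypothetical, and the sketch assumes the variable has to be set before Triton (or a library that imports it, such as Unsloth) is first imported, so that the cache never falls back to a directory the container user cannot write to.

```python
import os


def configure_triton_cache(path="/tmp/triton_cache"):
    """Point Triton's kernel cache at a writable directory.

    Hypothetical helper for illustration. TRITON_CACHE_DIR is the
    environment variable named in the commit message; the default
    path is the one the commit uses.
    """
    # Create the directory up front so a missing or unwritable
    # location fails here rather than during kernel compilation.
    os.makedirs(path, exist_ok=True)
    os.environ["TRITON_CACHE_DIR"] = path
    return path


if __name__ == "__main__":
    # Call this before importing triton / unsloth (assumption: the
    # variable is read when the cache is first needed).
    print(configure_triton_cache())
```

In a Docker-based Space the same effect could presumably be had with an ENV line in the Dockerfile; doing it in Python keeps the setting next to the imports that depend on it.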