fix: lora_dropout=0.05 + HF gradient checkpoint to bypass unsloth GRPO dtype bug; chdir to /tmp before merged push 4e5191a verified Laksh718 committed on Apr 26
resilient training: SFT push insurance, dtype workaround, try/except GRPO, SKIP_GRPO env 07c17a2 verified Laksh718 committed on Apr 26
fix: writable OUT_DIR (/tmp), chat-template text column, push_history_to_hub 31f6ba9 verified Laksh718 committed on Apr 26
fix: render chat template into text column for new unsloth/TRL SFTTrainer 8e3d067 verified Laksh718 committed on Apr 26