Polish for hackathon submission: training evidence, two pipelines, UI, docs e81353d K446 committed on Apr 26
Fix health check timeout: start UI server in background before training 89992e4 K446 committed on Apr 26
Fix GRPO training: reward variance, batch/gen alignment, generation config e1ab78c K446 committed on Apr 25
Add pre-train gen sanity check, explicit GenerationConfig, dynamic GRPOConfig params, torch_compile/vllm off a6ecb81 K446 committed on Apr 25
QLoRA best practices: prepare_model_for_kbit_training, paged_adamw_8bit, cosine LR, faster iteration 8dab919 K446 committed on Apr 25
Drop unsloth: use standard bitsandbytes 4-bit + peft LoRA + TRL GRPOTrainer 6072ace K446 committed on Apr 25