Fix TRL 0.18 compatibility: remove unsupported generation_kwargs; set safety flags on model.generation_config. 6083a40 md896 committed on Apr 25
Harden GRPO generation stability on CUDA: bf16 + eager attention + invalid-logit guards. 948530a md896 committed on Apr 25
Fix GRPO batch/generation mismatch: auto-adjust num_generations; set launcher default to 2. af54ccd md896 committed on Apr 25
Simplify HF training stack: remove unsloth/vllm path, use plain transformers AutoModel + single OpenEnv reward. e5262a1 md896 committed on Apr 25
Fix Unsloth startup: avoid pre-importing trl/transformers; mock vllm as real package modules. d21de11 md896 committed on Apr 25
Fix HF job startup: import unsloth first and shim vllm package metadata check. 1fdba13 md896 committed on Apr 25
Fix HF Job bootstrap: transformers>=4.51 for trl 0.18, datasets<4; simplify to colab-style OpenEnv SQL reward. ee30276 md896 committed on Apr 25
Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job. ceee0e3 md896 committed on Apr 25
Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment bc20ef9 md896 committed on Apr 25
Downgrade TRL to 0.22.2 to natively bypass experimental vllm dependencies 2eb9add md896 committed on Apr 25
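The log does not show how commit af54ccd auto-adjusts num_generations. The following is a minimal sketch of one way to do it, not the project's actual code.

It assumes the constraint enforced by GRPOTrainer in the TRL releases around 0.18: the global batch (per-device batch size times number of processes) must be evenly divisible by num_generations. It also assumes at least 2 completions per prompt are needed for a group-relative baseline. The helper name `adjust_num_generations` and its parameters are hypothetical.

```python
def adjust_num_generations(requested: int, per_device_batch_size: int, num_processes: int = 1) -> int:
    """Hypothetical helper: pick the largest num_generations <= requested
    that evenly divides the global batch, never going below 2.

    Assumes GRPO-style grouping, where every prompt's completions must fit
    exactly into the global batch, so global_batch % num_generations == 0.
    """
    global_batch = per_device_batch_size * num_processes
    # Walk down from the requested value (capped at the batch size) to 2.
    for n in range(min(requested, global_batch), 1, -1):
        if global_batch % n == 0:
            return n
    # No divisor >= 2 exists (e.g. a global batch of 1 or a prime below `requested`).
    raise ValueError(f"global batch size {global_batch} admits no num_generations >= 2")


if __name__ == "__main__":
    # A requested group size of 4 does not divide a global batch of 6, so it drops to 3.
    print(adjust_num_generations(4, per_device_batch_size=6))
```

Choosing the largest valid divisor keeps the group as close as possible to what was requested. A launcher default of 2 divides any even global batch, which may be why that default was chosen.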