Fix TRL 0.18 compatibility: remove unsupported generation_kwargs; set safety flags on model.generation_config. 6083a40 md896 committed 13 days ago
Harden GRPO generation stability on CUDA: bf16 + eager attention + invalid-logit guards. 948530a md896 committed 13 days ago
Fix GRPO batch/generation mismatch: auto-adjust num_generations; set launcher default to 2. af54ccd md896 committed 13 days ago
Simplify HF training stack: remove unsloth/vllm path, use plain transformers AutoModel + single OpenEnv reward. e5262a1 md896 committed 13 days ago
Fix Unsloth startup: avoid pre-importing trl/transformers; mock vllm as real package modules. d21de11 md896 committed 13 days ago
Fix HF job startup: import unsloth first and shim vllm package metadata check. 1fdba13 md896 committed 13 days ago
Fix HF Job bootstrap: transformers>=4.51 for trl 0.18, datasets<4; simplify to colab-style OpenEnv SQL reward. ee30276 md896 committed 13 days ago
Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job. ceee0e3 md896 committed 13 days ago
Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment bc20ef9 md896 committed 13 days ago
Downgrade TRL to 0.22.2 to natively bypass experimental vllm dependencies 2eb9add md896 committed 13 days ago
Fix vllm error cleanly by creating fake python module structure b2ce6c6 md896 committed 13 days ago
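
Several commits above mention mocking vllm with a fake Python module structure. The repository's actual shim is not shown here, so the following is only a minimal sketch of the general technique: registering an empty placeholder package in `sys.modules` so that later `import` statements and `importlib.util.find_spec` checks succeed. The helper name `install_stub_package` and the package name `fake_vllm_demo` are hypothetical and chosen for illustration, and only single-level submodules are handled.

```python
import importlib.machinery
import importlib.util
import sys
import types


def install_stub_package(name, submodules=()):
    """Register an empty placeholder package (hypothetical helper) in sys.modules."""
    pkg = types.ModuleType(name)
    pkg.__path__ = []  # an (empty) __path__ makes Python treat this as a package
    # A spec is needed so importlib.util.find_spec(name) does not raise ValueError.
    pkg.__spec__ = importlib.machinery.ModuleSpec(name, loader=None, is_package=True)
    sys.modules[name] = pkg
    for sub in submodules:  # single-level submodule names only
        full_name = f"{name}.{sub}"
        mod = types.ModuleType(full_name)
        mod.__spec__ = importlib.machinery.ModuleSpec(full_name, loader=None)
        sys.modules[full_name] = mod
        setattr(pkg, sub, mod)  # allow attribute access such as pkg.sampling_params
    return pkg


# Illustrative package name, used here to avoid shadowing any real installation.
install_stub_package("fake_vllm_demo", submodules=("sampling_params",))

# Both imports now resolve from sys.modules instead of the filesystem.
import fake_vllm_demo
import fake_vllm_demo.sampling_params
```

A stub like this has to be installed before the library that probes for the optional dependency is imported. It only satisfies import-time and `find_spec` checks; any code path that actually calls into the stubbed package would still fail, and checks based on installed package metadata would need a separate shim.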