Fix HF Job bootstrap: transformers>=4.51 for trl 0.18, datasets<4; simplify to colab-style OpenEnv SQL reward. ee30276 md896 committed on Apr 25
Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job. ceee0e3 md896 committed on Apr 25
Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment. bc20ef9 md896 committed on Apr 25
Downgrade TRL to 0.22.2 to natively bypass experimental vllm dependencies. 2eb9add md896 committed on Apr 25