Fix GRPO batch/generation mismatch: auto-adjust num_generations; set launcher default to 2. af54ccd md896 committed 12 days ago
Fix HF Jobs bootstrap (pin transformers/trl, drop torchao stack); add reward and trainer JSONL logging; stabilize launch_job. ceee0e3 md896 committed 12 days ago
Fix: Mock vllm and llm_blender to stabilize GRPOTrainer in HF Jobs environment bc20ef9 md896 committed 12 days ago
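
The mocking approach named in commit bc20ef9 can be sketched roughly as follows. This is a minimal illustration of stubbing optional modules through `sys.modules`, not the code from the commit: the helper name `install_stub_modules` is hypothetical, and the attribute names given for `vllm` (`LLM`, `SamplingParams`) are an assumption about what the importing library looks up. Which attributes are actually needed depends on the pinned TRL version.

```python
import importlib.machinery
import sys
import types
from unittest.mock import MagicMock


def install_stub_modules(attrs_by_module):
    """Register placeholder modules for names that are not imported yet.

    Hypothetical helper. `attrs_by_module` maps a module name to the
    attribute names the stub should expose (each filled with a MagicMock).
    Returns the list of module names that were stubbed.
    """
    stubbed = []
    for name, attrs in attrs_by_module.items():
        if name in sys.modules:
            continue  # already imported (or already stubbed); leave it alone
        stub = types.ModuleType(name)
        # Availability checks often call importlib.util.find_spec(name),
        # which raises ValueError if the cached module's __spec__ is None.
        stub.__spec__ = importlib.machinery.ModuleSpec(name, loader=None)
        for attr in attrs:
            setattr(stub, attr, MagicMock(name=f"{name}.{attr}"))
        sys.modules[name] = stub
        stubbed.append(name)
    return stubbed


# Assumption: run this before the first `import trl`, and only when the
# real packages are not needed at runtime (for example, vLLM generation
# is disabled in the trainer config).
install_stub_modules({"vllm": ["LLM", "SamplingParams"], "llm_blender": []})
```

Note that a stub registered this way shadows a real package of the same name if that package is installed but not yet imported, so the call is best guarded by whatever condition identifies the HF Jobs environment.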