md896
Simplify HF training stack: remove unsloth/vllm path, use plain transformers AutoModel + single OpenEnv reward.
e5262a1