ESPO-GSM8k / README.md
metadata
license: mit
datasets:
  - openai/gsm8k
base_model:
  - GSAI-ML/LLaDA-8B-Instruct

Post-trained LoRA models for the GSM8K task, based on LLaDA-8B-Instruct, accompanying the paper "Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective".
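Because these are LoRA adapters and not full checkpoints, they need to be attached to the base model before use. The following is a minimal sketch of one way to do that with the `transformers` and `peft` libraries. It assumes the adapter files are in standard PEFT format (an `adapter_config.json` alongside the weights). `ADAPTER_PATH` and `load_espo_model` are hypothetical names introduced for illustration; this card does not specify the adapter layout or a loading procedure. Generation is left out, since LLaDA uses its own diffusion-style sampling code.

```python
# Base model ID taken from this card's metadata.
BASE_MODEL_ID = "GSAI-ML/LLaDA-8B-Instruct"
# Hypothetical placeholder: point this at a local directory holding one adapter.
ADAPTER_PATH = "path/to/ESPO-GSM8k-adapter"


def load_espo_model(adapter_path=ADAPTER_PATH, base_model_id=BASE_MODEL_ID):
    """Load the base model and attach a LoRA adapter (assumes PEFT format)."""
    # Heavy imports live inside the function so importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    # LLaDA ships custom modeling code, so trust_remote_code is required.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
    base = AutoModel.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )

    # Wrap the base model with the LoRA weights found at adapter_path.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_espo_model("path/to/adapter")` downloads the 8B base model on first use, so it needs enough disk space and memory for that checkpoint.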