FS-DFM-1.3B-FlowGRPO

FS-DFM 1.3B trained with Flow-GRPO v13 DCE (Denoising Cross Entropy) on FormFactory. This is the best token-level RL configuration for FS-DFM: 73.4% nonzero rate and 0.159 average reward on the 124 test tasks (+4.9pp and +8.9%, respectively, over the SFT baseline). Training uses an advantage-weighted GKL loss aligned with the diffusion denoising trajectory. Part of the STAD80 project, Generative Action Planning via Discrete Flow Matching.
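To make the phrase "advantage-weighted" concrete, the sketch below shows the general GRPO-style pattern: rewards from a group of rollouts for the same task are normalized into advantages, and each rollout's token loss is scaled by its advantage. This is not the training code for this model. It substitutes a plain negative log-likelihood for the GKL loss, and the function names, group size, rewards, and log-probabilities are all hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Assumption: the standard GRPO recipe (subtract the group mean, divide by
    the group standard deviation). The exact normalization used by
    Flow-GRPO v13 DCE is not stated in this card.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantage_weighted_nll(token_logprobs, advantages):
    """Mean over rollouts of advantage * mean token negative log-likelihood.

    Stand-in for the card's advantage-weighted GKL loss: plain NLL is used
    here only to show where the advantage enters the objective.
    """
    per_rollout = []
    for logps, adv in zip(token_logprobs, advantages):
        nll = -sum(logps) / len(logps)
        per_rollout.append(adv * nll)
    return sum(per_rollout) / len(per_rollout)


# Hypothetical group of 4 rollouts for one form-filling task.
rewards = [0.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical log-probabilities of the denoised tokens in each rollout.
token_logprobs = [[-1.2, -0.9], [-1.0, -1.1], [-0.4, -0.6], [-0.2, -0.3]]
loss = advantage_weighted_nll(token_logprobs, advantages)
```

Minimizing this quantity raises the likelihood of rollouts with above-average reward and lowers it for below-average ones. When every rollout in a group gets the same reward, all advantages are zero and the group contributes no gradient.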

Paper

Concentrate or Collapse: When Reinforcement Learning Meets Diffusion Language Models for Web Planning

Author: Muhammad Enrizky Brillian
Institution: University of Toronto Scarborough

Training Details

Dataset: FormFactory (992 train / 124 val / 124 test tasks, 25 form types, 8 domains)
Infrastructure: Single NVIDIA A10G GPU (24GB VRAM) on Anyscale
Framework: PyTorch + PEFT (LoRA/QLoRA)
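The headline numbers are computed over the 124 test tasks, so it helps to see what they imply in task counts. The sketch below is a back-of-the-envelope check, not the evaluation script. It assumes "nonzero rate" means the share of test tasks with reward above zero, and that the +4.9pp delta applies to nonzero rate while +8.9% is a relative gain in average reward. The implied SFT figures are back-calculations from the reported deltas, not separately reported results, and the four-element reward list is hypothetical.

```python
# Split sizes as listed under Training Details.
splits = {"train": 992, "val": 124, "test": 124}
total = sum(splits.values())
fractions = {name: n / total for name, n in splits.items()}  # an 80/10/10 split

TEST_TASKS = splits["test"]


def nonzero_rate(rewards):
    """Share of tasks whose reward is strictly positive (assumed definition)."""
    return sum(1 for r in rewards if r > 0) / len(rewards)


def average_reward(rewards):
    """Mean reward over all tasks, including zero-reward tasks."""
    return sum(rewards) / len(rewards)


# One test task moves the nonzero rate by about 0.81 percentage points.
pp_per_task = 100 / TEST_TASKS

# Task counts implied by the reported rates (back-calculated).
rl_nonzero_tasks = round(0.734 * TEST_TASKS)             # 91 of 124
sft_nonzero_tasks = round((0.734 - 0.049) * TEST_TASKS)  # 85 of 124

# SFT average reward implied by a +8.9% relative gain (back-calculated).
sft_average_reward = 0.159 / 1.089

# Hypothetical per-task rewards, to show the two metric functions in use.
demo_rewards = [0.0, 0.5, 0.0, 1.0]
demo_rate = nonzero_rate(demo_rewards)
demo_avg = average_reward(demo_rewards)
```

Under these assumptions the +4.9pp gain corresponds to six additional test tasks with nonzero reward, which is worth keeping in mind when comparing configurations on a test set of this size.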

Citation

If you use this model, please cite:

@article{brillian2026flowgrpo,
  title={Concentrate or Collapse: When Reinforcement Learning Meets Diffusion Language Models for Web Planning},
  author={Brillian, Muhammad Enrizky},
  year={2026}
}

This model was trained and evaluated on the FormFactory benchmark:

@misc{li2025formfactory,
  title={FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents},
  author={Bobo Li and Yuheng Wang and Hao Fei and Juncheng Li and Wei Ji and Mong-Li Lee and Wynne Hsu},
  year={2025},
  eprint={2506.01520},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.01520}
}

