FS-DFM 1.3B fine-tuned on FormFactory with Flow-GRPO v13 using the DCE (Denoising Cross Entropy) objective. This is the best RL configuration found for FS-DFM: relative to the SFT baseline on the test split, it raises the nonzero rate by 4.9 percentage points and the reward by 8.9%. Training uses an advantage-weighted GKL loss aligned with the diffusion denoising trajectory. The model is part of the STAD80 project, Generative Action Planning via Discrete Flow Matching.
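To make the idea of an advantage-weighted loss concrete, here is a minimal sketch, not the training code behind this model. It assumes GRPO-style advantages (rewards standardized within one group of sampled completions) and substitutes a plain token-level negative log-likelihood for the GKL term. The function names, toy rewards, and log-probabilities are all hypothetical.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style assumption)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def advantage_weighted_ce(token_log_probs, advantages):
    """Average of advantage * mean token negative log-likelihood per sample.

    Plain cross-entropy stands in for the GKL loss named in the text.
    """
    losses = []
    for log_probs, adv in zip(token_log_probs, advantages):
        nll = -sum(log_probs) / len(log_probs)
        losses.append(adv * nll)
    return sum(losses) / len(losses)

# Toy group of four sampled action sequences with hypothetical rewards.
rewards = [0.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical per-token log-probabilities of each sampled sequence.
token_log_probs = [[-1.2, -0.7], [-0.9, -1.1], [-0.4, -0.6], [-0.2, -0.3]]
loss = advantage_weighted_ce(token_log_probs, advantages)
```

Minimizing this quantity increases the likelihood of sequences with above-average reward and decreases it for below-average ones, which is the general effect an advantage weight is meant to have.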
Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning
If you use this model, please cite:
@article{brillian2026flowgrpo,
title={Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning},
author={Brillian, Muhammad Enrizky},
year={2026}
}
This model was trained and evaluated on the FormFactory benchmark:
@misc{li2025formfactory,
title={FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents},
author={Bobo Li and Yuheng Wang and Hao Fei and Juncheng Li and Wei Ji and Mong-Li Lee and Wynne Hsu},
year={2025},
eprint={2506.01520},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.01520}
}