metadata
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - lora
  - image-generation
  - reinforcement-learning
  - reward-modeling
  - firm

FIRM-SD3.5

This repository contains the LoRA weights for FIRM-SD3.5, an enhanced text-to-image generation model developed using the FIRM (Faithful Image Reward Modeling) framework.

The model was introduced in the paper Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation.

Model Description

Reinforcement learning (RL) for visual generation relies heavily on the faithfulness of the reward model used as a critic. FIRM addresses common reward-model issues such as hallucinations and noisy scoring through three components:

  1. Tailored Data Pipelines: Using specialized curation for editing (execution and consistency) and generation (instruction following).
  2. Robust Reward Models: Training specialized reward models (like FIRM-Gen-8B) on high-quality scoring datasets.
  3. "Base-and-Bonus" Reward Strategy: A novel strategy for balancing competing objectives, instantiated for generation as Quality-Modulated Alignment (QMA).
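
To make the "Base-and-Bonus" idea concrete, here is a minimal Python sketch of one way an alignment score could be modulated by a quality score. This card does not give the QMA formula, so the functional form, the function name `base_and_bonus_reward`, and the `bonus_weight` parameter are all assumptions made for illustration, not the method from the paper.

```python
def base_and_bonus_reward(alignment: float, quality: float,
                          bonus_weight: float = 0.5) -> float:
    """Hypothetical base-and-bonus reward (NOT the formula from the FIRM paper).

    alignment: instruction-following score in [0, 1], used as the base term.
    quality:   image-quality score in [0, 1], used to scale the bonus term.
    """
    if not (0.0 <= alignment <= 1.0 and 0.0 <= quality <= 1.0):
        raise ValueError("alignment and quality must lie in [0, 1]")
    # Quality multiplies the base instead of being added independently,
    # so a high quality score cannot compensate for poor alignment.
    return alignment * (1.0 + bonus_weight * quality)


# A faithful but plain image versus a polished but unfaithful one.
faithful_plain = base_and_bonus_reward(alignment=0.9, quality=0.2)
unfaithful_pretty = base_and_bonus_reward(alignment=0.2, quality=1.0)
print(faithful_plain > unfaithful_pretty)
```

Under this assumed form, the quality term only pays off when the base alignment score is already high, which is one plausible reading of "balancing competing objectives".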

By mitigating hallucinations in the reward signal, the resulting FIRM-SD3.5 model is reported to improve fidelity and instruction adherence over existing general-purpose models.
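
Since this repository ships LoRA weights and declares `diffusers` as its library, loading them might look like the sketch below. It assumes the weights are in a format that `load_lora_weights` accepts and that the base checkpoint is `stabilityai/stable-diffusion-3.5-medium`; this card does not state which SD3.5 variant was fine-tuned, so treat that default as a guess. The helper name `load_firm_sd35` and the `<this-repo-id>` placeholder are hypothetical.

```python
def load_firm_sd35(lora_repo_id: str,
                   base_model_id: str = "stabilityai/stable-diffusion-3.5-medium"):
    """Load an SD3.5 pipeline and attach LoRA weights from `lora_repo_id`.

    The default base_model_id is an assumption; use the variant that the
    LoRA weights were actually trained on.
    """
    # Imported lazily so that defining this helper does not require torch.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        base_model_id, torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights(lora_repo_id)
    return pipe


# Example usage (downloads several GB of weights and needs a GPU):
# pipe = load_firm_sd35("<this-repo-id>").to("cuda")
# image = pipe("a red cube on top of a blue sphere").images[0]
# image.save("firm_sd35_sample.png")
```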

Citation

@article{zhao2026trust,
  title={Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation},
  author={Zhao, Xiangyu and Zhang, Peiyuan and Lin, Junming and Liang, Tianhao and Duan, Yuchen and Ding, Shengyuan and Tian, Changyao and Zang, Yuhang and Yan, Junchi and Yang, Xue},
  journal={arXiv preprint arXiv:2603.12247},
  year={2026}
}

Acknowledgements

This project was developed by VisionXLab and builds upon several open-source projects, including flow-grpo, DiffusionNFT, and Edit-R1.