Add model card and metadata for FIRM-SD3.5

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +45 -3
README.md CHANGED
@@ -1,3 +1,45 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: diffusers
+ pipeline_tag: text-to-image
+ tags:
+ - lora
+ - image-generation
+ - reinforcement-learning
+ - reward-modeling
+ - firm
+ ---
+
+ # FIRM-SD3.5
+
+ This repository contains the LoRA weights for **FIRM-SD3.5**, an enhanced text-to-image generation model developed using the FIRM (Faithful Image Reward Modeling) framework.
+
+ The model was introduced in the paper [Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation](https://huggingface.co/papers/2603.12247).
+
+ - **Project Page:** [https://firm-reward.github.io/](https://firm-reward.github.io/)
+ - **GitHub Repository:** [https://github.com/VisionXLab/FIRM-Reward](https://github.com/VisionXLab/FIRM-Reward)
+ - **Paper:** [arXiv:2603.12247](https://huggingface.co/papers/2603.12247)
+
+ ## Model Description
+
+ Reinforcement learning (RL) for visual generation relies heavily on the faithfulness of the reward model used as a critic. FIRM addresses common issues such as hallucinations and noisy scoring through:
+ 1. **Tailored Data Pipelines:** Specialized data curation for editing (execution and consistency) and for generation (instruction following).
+ 2. **Robust Reward Models:** Specialized reward models (such as FIRM-Gen-8B) trained on high-quality scoring datasets.
+ 3. **"Base-and-Bonus" Reward Strategy:** A novel strategy for balancing competing objectives, such as Quality-Modulated Alignment (QMA) for generation.
+
+ The resulting **FIRM-SD3.5** model shows significant improvements in fidelity and instruction adherence over existing general-purpose models by mitigating hallucinations.
+
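+ ## Usage
+
+ A minimal, untested sketch of applying these LoRA weights with `diffusers` (the base-model id, LoRA repository id, prompt, and sampling parameters below are assumptions; see the GitHub repository for the exact instructions):
+
+ ```python
+ import torch
+ from diffusers import StableDiffusion3Pipeline
+
+ # Load the Stable Diffusion 3.5 base model (model id assumed; adjust as needed).
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
+ ).to("cuda")
+
+ # Apply the FIRM-SD3.5 LoRA weights (repository id assumed).
+ pipe.load_lora_weights("VisionXLab/FIRM-SD3.5")
+
+ image = pipe(
+     "A red cube balanced on top of a blue sphere",
+     num_inference_steps=28,
+     guidance_scale=4.5,
+ ).images[0]
+ image.save("firm_sd35_sample.png")
+ ```
+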
+ ## Citation
+
+ ```bibtex
+ @article{zhao2026trust,
+   title={Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation},
+   author={Zhao, Xiangyu and Zhang, Peiyuan and Lin, Junming and Liang, Tianhao and Duan, Yuchen and Ding, Shengyuan and Tian, Changyao and Zang, Yuhang and Yan, Junchi and Yang, Xue},
+   journal={arXiv preprint arXiv:2603.12247},
+   year={2026}
+ }
+ ```
+
+ ## Acknowledgements
+
+ This project was developed by VisionXLab and builds upon several open-source projects, including [flow-grpo](https://github.com/yifan123/flow_grpo), [DiffusionNFT](https://github.com/NVlabs/DiffusionNFT), and [Edit-R1](https://github.com/PKU-YuanGroup/Edit-R1).