nielsr HF Staff committed on
Commit 0de124d · verified · 1 Parent(s): 7e157e9

Improve model card: add paper link, metadata, and description

Hi! I'm Niels from the Hugging Face community team.

I've improved the model card for this checkpoint. Based on the associated research paper, this model is **FIRM-Gen-8B**, a robust reward model designed to act as a critic for faithful text-to-image generation within the FIRM (Faithful Image Reward Modeling) framework.

Changes include:
- Added the `image-text-to-text` pipeline tag to improve discoverability.
- Linked the model to the research paper, project page, and official GitHub repository.
- Provided a descriptive summary of the model's purpose and its role in reducing hallucinations during reinforcement learning.
- Maintained the existing training hyperparameters and results table.
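
For reviewers who want to sanity-check the retained hyperparameters, here is a minimal Python sketch (not part of the commit). It recomputes the two reported totals from the per-device values in the README and illustrates the learning-rate curve implied by `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1`. The function names are hypothetical, and the schedule shape (linear warmup, then cosine decay to zero) is an assumption about the trainer's behavior rather than something the README states. The README also does not give the total number of optimizer steps, so the usage note below uses a placeholder.

```python
import math

# Values copied from the README's hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 5
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1


def effective_train_batch(per_device: int, devices: int, accum: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * devices * accum


def effective_eval_batch(per_device: int, devices: int) -> int:
    """Number of samples evaluated per step across all devices."""
    return per_device * devices


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_train_batch(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    print(effective_eval_batch(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
```

With the README values, the two helpers reproduce `total_train_batch_size: 80` (5 × 8 × 2) and `total_eval_batch_size: 16` (2 × 8). Calling `lr_at(step, total_steps=1000)` with a placeholder step count shows the rate rising linearly over the first 10% of steps and then decaying along a half cosine.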

Files changed (1)
  1. README.md +43 -38
README.md CHANGED
@@ -1,57 +1,57 @@
  ---
+ base_model: Qwen/Qwen3-VL-8B-Instruct
  library_name: transformers
  license: other
- base_model: Qwen/Qwen3-VL-8B-Instruct
+ pipeline_tag: image-text-to-text
  tags:
  - llama-factory
- - full
+ - reward-model
+ - image-generation
+ - reinforcement-learning
  - generated_from_trainer
  model-index:
- - name: gen_reward_sft
+ - name: FIRM-Gen-8B (gen_reward_sft)
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # gen_reward_sft
+ # FIRM-Gen-8B (gen_reward_sft)

- This model is a fine-tuned version of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) on the gen_reward_sft dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5180
+ This model is a fine-tuned version of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) and serves as a robust reward model (critic) for text-to-image generation. It was introduced as part of the **FIRM (Faithful Image Reward Modeling)** framework in the paper "[Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation](https://huggingface.co/papers/2603.12247)".

- ## Model description
+ - **Paper:** [Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation](https://huggingface.co/papers/2603.12247)
+ - **Project Page:** [firm-reward.github.io](https://firm-reward.github.io/)
+ - **Repository:** [VisionXLab/FIRM-Reward](https://github.com/VisionXLab/FIRM-Reward)

- More information needed
+ ## Model Description

- ## Intended uses & limitations
+ FIRM-Gen-8B is specifically trained on the **FIRM-Gen-293K** dataset to provide accurate and reliable guidance for faithful image generation. It addresses the common issue of reward hacking and hallucinations in Multimodal Large Language Models (MLLMs) by using a "plan-then-score" pipeline to evaluate how well a generated image follows complex instructions.

- More information needed
+ Within a Reinforcement Learning (RL) pipeline, this model acts as the critic, assigning scores that guide the optimization of generative models (like Stable Diffusion 3.5 or FLUX) toward better instruction adherence and visual fidelity.

- ## Training and evaluation data
+ ## Intended Uses & Limitations

- More information needed
+ This model is intended to be used as a reward signal in RL pipelines or as an evaluation metric for text-to-image alignment. It is compatible with the `transformers` library and can be deployed using the reward server scripts found in the official repository.

- ## Training procedure
+ ## Training Procedure

- ### Training hyperparameters
+ ### Training Hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 5
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 80
- - total_eval_batch_size: 16
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1.0
-
- ### Training results
+ - **learning_rate:** 1e-05
+ - **train_batch_size:** 5
+ - **eval_batch_size:** 2
+ - **seed:** 42
+ - **distributed_type:** multi-GPU
+ - **num_devices:** 8
+ - **gradient_accumulation_steps:** 2
+ - **total_train_batch_size:** 80
+ - **total_eval_batch_size:** 16
+ - **optimizer:** AdamW
+ - **lr_scheduler_type:** cosine
+ - **lr_scheduler_warmup_ratio:** 0.1
+ - **num_epochs:** 1.0
+
+ ### Training Results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
@@ -63,10 +63,15 @@ The following hyperparameters were used during training:
  | 0.5155 | 0.8279 | 3000 | 0.5207 |
  | 0.5106 | 0.9659 | 3500 | 0.5181 |

+ ## Citation

- ### Framework versions
+ If you find this model useful, please cite:

- - Transformers 4.57.3
- - Pytorch 2.7.1+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2
+ ```bibtex
+ @article{zhao2025trust,
+ title={Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation},
+ author={Zhao, Xiangyu and Zhang, Peiyuan and Lin, Junming and Liang, Tianhao and Duan, Yuchen and Ding, Shengyuan and Tian, Changyao and Zang, Yuhang and Yan, Junchi and Yang, Xue},
+ journal={arXiv preprint arXiv:2603.12247},
+ year={2025}
+ }
+ ```