Improve model card for FIRM-Edit-8B

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +38 -26
README.md CHANGED
@@ -1,39 +1,42 @@
  ---
  library_name: transformers
  license: other
- base_model: Qwen/Qwen3-VL-8B-Instruct
  tags:
  - llama-factory
- - full
  - generated_from_trainer
  model-index:
- - name: edit_evaluation_sft_202602030104
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # edit_evaluation_sft_202602030104

- This model is a fine-tuned version of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) on the instruction_following_train_v3 and the consistency_train_v3 datasets.
- It achieves the following results on the evaluation set:
- - Loss: 0.5041

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data
-
- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -41,12 +44,7 @@ The following hyperparameters were used during training:
  - train_batch_size: 10
  - eval_batch_size: 2
  - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
  - gradient_accumulation_steps: 2
- - total_train_batch_size: 160
- - total_eval_batch_size: 16
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1.0
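In the original card, `total_train_batch_size: 160` and `total_eval_batch_size: 16` are derived values rather than independent settings: per-device batch size × number of devices × gradient accumulation steps for training, and per-device batch size × number of devices for evaluation (no accumulation). A minimal check using the values listed above (the function names are illustrative, not part of any library):

```python
def total_train_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    # Samples that contribute to one optimizer step across all GPUs.
    return per_device * num_devices * grad_accum


def total_eval_batch_size(per_device: int, num_devices: int) -> int:
    # Evaluation does not accumulate gradients, so only the device count multiplies.
    return per_device * num_devices


# Values from the hyperparameter list: batch sizes 10 / 2, 8 GPUs, accumulation 2.
print(total_train_batch_size(10, 8, 2))  # 160
print(total_eval_batch_size(2, 8))  # 16
```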
@@ -60,10 +58,24 @@ The following hyperparameters were used during training:
  | 0.5252 | 0.6546 | 1500 | 0.5199 |
  | 0.5075 | 0.8728 | 2000 | 0.5055 |


- ### Framework versions

- - Transformers 4.57.3
- - Pytorch 2.7.1+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2

  ---
+ base_model: Qwen/Qwen3-VL-8B-Instruct
  library_name: transformers
  license: other
+ pipeline_tag: image-text-to-text
  tags:
+ - reward-model
+ - image-editing
+ - FIRM
  - llama-factory
  - generated_from_trainer
  model-index:
+ - name: FIRM-Edit-8B
  results: []
  ---

+ # FIRM-Edit-8B

+ [**Project Page**](https://firm-reward.github.io/) | [**Paper**](https://arxiv.org/abs/2603.12247) | [**GitHub**](https://github.com/VisionXLab/FIRM-Reward)

+ **FIRM-Edit-8B** is a robust reward model (critic) for faithful image editing. It is fine-tuned from [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) on the **FIRM-Edit-370K** dataset. The model is part of the **FIRM (Faithful Image Reward Modeling)** framework, which provides accurate and reliable reward signals for visual reinforcement learning pipelines.

+ ## Model Description

+ Conventional reward models for image editing often hallucinate and assign noisy scores, which misguides the optimization process. FIRM-Edit-8B addresses these issues by evaluating each edit against two competing objectives:
+ 1. **Execution**: Adherence to the editing instruction.
+ 2. **Consistency**: Preservation of original content in unedited regions.

+ Using a "Consistency-Modulated Execution" (CME) reward strategy, the model acts as a stable critic that mitigates hallucinations and aims to set a new standard for fidelity in image editing.
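The card does not spell out how CME combines the two scores. As an illustration only, the sketch below assumes both scores are normalized to [0, 1] and that consistency scales execution multiplicatively; the function names and the multiplicative form are hypothetical and are not taken from the FIRM paper or repository.

```python
def cme_reward(execution: float, consistency: float) -> float:
    """Hypothetical consistency-modulated execution reward.

    Assumption: both scores are normalized to [0, 1]. The execution score is
    scaled by the consistency score, so an edit earns a high reward only when
    it follows the instruction AND preserves the unedited regions.
    """
    for name, value in (("execution", execution), ("consistency", consistency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return execution * consistency


def additive_reward(execution: float, consistency: float) -> float:
    """Naive baseline for comparison: the plain average of both scores."""
    return 0.5 * (execution + consistency)


if __name__ == "__main__":
    cases = {
        "ignores instruction, copies input": (0.0, 1.0),
        "follows instruction, wrecks image": (1.0, 0.0),
        "faithful edit": (0.9, 0.9),
    }
    for label, (e, c) in cases.items():
        print(f"{label}: additive={additive_reward(e, c):.2f}, cme={cme_reward(e, c):.2f}")
```

Under this assumed form, an edit that copies the input unchanged, or one that follows the instruction but destroys the rest of the image, earns zero, whereas a plain average would still award 0.5 in both cases. That is the kind of failure the two competing objectives are meant to expose.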

+ ## Intended Uses & Limitations

+ - **Reward Modeling**: To be used as a reward signal in Reinforcement Learning (RL) pipelines for image editing.
+ - **Evaluation**: To serve as a metric for benchmarking the performance of image editing models.

  ## Training procedure

+ The model was fine-tuned using the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) framework.
+
  ### Training hyperparameters

  The following hyperparameters were used during training:

  - train_batch_size: 10
  - eval_batch_size: 2
  - seed: 42
  - gradient_accumulation_steps: 2
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1.0
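`lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` usually denotes a linear warmup over the first 10% of optimizer steps followed by a cosine decay to zero, the same shape as the Transformers helper `get_cosine_schedule_with_warmup`. The peak learning rate is not visible in this diff, so the sketch returns only the multiplier applied to it. The total of 2,292 steps in the example is a hypothetical estimate from the step and epoch values in the training-results rows, and rounding the warmup step count up is an assumption of this sketch.

```python
import math


def cosine_with_warmup(step: int, total_steps: int, warmup_ratio: float = 0.1) -> float:
    """Learning-rate multiplier: linear warmup, then cosine decay to zero.

    Same shape as transformers' get_cosine_schedule_with_warmup with the
    default half cycle. Multiply the result by the peak learning rate.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: round up
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = 2292  # hypothetical: estimated from the results table, not stated in the card
    for step in (0, 115, 230, 1261, 2000, 2292):
        print(step, round(cosine_with_warmup(step, total), 4))
```

With these assumptions the multiplier peaks at step 230 (10% of 2,292, rounded up), is back to one half at the midpoint of the decay phase, and reaches zero at the final step.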

  | 0.5252 | 0.6546 | 1500 | 0.5199 |
  | 0.5075 | 0.8728 | 2000 | 0.5055 |
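The surviving results rows also indicate roughly how long the single training epoch was. Assuming the usual Trainer column order (training loss, epoch, step, validation loss), which the table header hidden in this diff does not confirm, steps per epoch is step ÷ epoch; multiplying by the total train batch size of 160 from the original card gives a rough sample count. The epoch values are rounded to four decimals, so these are estimates only.

```python
# Rows copied from the results table; column order is ASSUMED to be
# (training loss, epoch, step, validation loss), the usual Trainer layout.
rows = [
    (0.5252, 0.6546, 1500, 0.5199),
    (0.5075, 0.8728, 2000, 0.5055),
]

TOTAL_TRAIN_BATCH_SIZE = 160  # from the original card's hyperparameter list


def estimate_steps_per_epoch(epoch: float, step: int) -> float:
    # If `step` optimizer steps cover `epoch` epochs, one epoch is step / epoch.
    return step / epoch


for _, epoch, step, _ in rows:
    steps = estimate_steps_per_epoch(epoch, step)
    samples = steps * TOTAL_TRAIN_BATCH_SIZE
    print(f"step {step}: ~{steps:.0f} steps/epoch, ~{samples:,.0f} training samples")
```

Both rows give the same estimate (the epoch values are in a 4:3 ratio, like the step values), which suggests the table reports a fixed number of steps per epoch.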

+ ## Usage
+
+ To run the model as a reward server for RL training, launch the script provided in the official repository:
+
+ ```bash
+ # Launch the reward server
+ python editing/reward_server/reward_server_qwen3_vl_8b_sft.py
+ ```
+
+ ## Citation

+ If you find this work useful, please cite:

+ ```bibtex
+ @article{zhao2026trust,
+ title={Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation},
+ author={Zhao, Xiangyu and Zhang, Peiyuan and Lin, Junming and Liang, Tianhao and Duan, Yuchen and Ding, Shengyuan and Tian, Changyao and Zang, Yuhang and Yan, Junchi and Yang, Xue},
+ journal={arXiv preprint arXiv:2603.12247},
+ year={2026}
+ }
+ ```