Commit 9c62817 (verified) · committed by prithivMLmods · 1 Parent(s): 105ff12

Update README.md

Files changed (1): README.md +109 -1
@@ -9,4 +9,112 @@ library_name: diffusers
9
  tags:
10
  - text-generation-inference
11
  - image-edit
12
- ---
12
+ ---
13
+
14
+ # **FireRed-Image-Edit-1.0-fp8**
15
+
16
+ > **FireRed-Image-Edit-1.0-fp8** is an FP8-compressed transformer variant built on top of **FireRedTeam/FireRed-Image-Edit-1.0**.
17
+ > This release provides **transformer-only compressed weights** and **Diffusers-compatible transformer weights**, reducing memory usage and improving throughput while preserving the high-fidelity instruction-based image editing capabilities of the original model.
18
+
19
+ > [!important]
20
+ > This release compresses **only the diffusion transformer module** using **BF16 · FP8 (F8_E4M3)** precision. The VAE and other components remain unchanged from the base model. FP8 (8-bit floating point) weight and activation quantization relies on hardware acceleration on supported GPUs. References:
21
+ > * FP8 W8A8: [https://docs.vllm.ai/en/stable/features/quantization/fp8/](https://docs.vllm.ai/en/stable/features/quantization/fp8/)
22
+ > * Quantization recipe: [https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8)
23
+
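To give a feel for what E4M3 precision means for a weight value, here is a minimal pure-Python sketch of rounding to the E4M3 grid (3 mantissa bits, largest finite value 448). The per-tensor scaling in `quantize_dequantize` is an assumption made for illustration; the helper names are hypothetical and this is not the recipe used to produce this release.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the E4M3 "fn" format
E4M3_MIN_EXP = -6     # exponent of the smallest normal value (2**-6)
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbours in this binade
    return sign * round(mag / step) * step


def quantize_dequantize(weights: list[float]) -> tuple[list[float], float]:
    """Assumed per-tensor scaling: map the largest |w| onto E4M3_MAX, then round."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    restored = [round_to_e4m3(w / scale) * scale for w in weights]
    return restored, scale
```

For example, `round_to_e4m3(0.3)` returns `0.3125`, because values between 0.25 and 0.5 are spaced 1/32 apart in this format.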
24
+ ## Diffusers Usage
25
+
26
+ ```python
27
+ import torch
28
+ from diffusers.models import QwenImageTransformer2DModel
29
+ from diffusers import QwenImageEditPlusPipeline
30
+ from diffusers.utils import load_image
31
+
32
+ transformer = QwenImageTransformer2DModel.from_pretrained(
33
+     "prithivMLmods/FireRed-Image-Edit-1.0-fp8",
34
+     subfolder="transformer",  # load only the compressed transformer
35
+     torch_dtype=torch.bfloat16
36
+ )
37
+
38
+ pipeline = QwenImageEditPlusPipeline.from_pretrained(
39
+     "FireRedTeam/FireRed-Image-Edit-1.0",
40
+     transformer=transformer,  # VAE and other components come from the base model
41
+     torch_dtype=torch.bfloat16
42
+ )
43
+
44
+ pipeline.to("cuda")
45
+
46
+ image1 = load_image("grumpycat.png")
47
+ prompt = "turn the cat into an orange cat"
48
+
49
+ inputs = {
50
+     "image": [image1],
51
+     "prompt": prompt,
52
+     "generator": torch.manual_seed(42),
53
+     "true_cfg_scale": 1.0,
54
+     "negative_prompt": " ",
55
+     "num_inference_steps": 40,
56
+     "guidance_scale": 1.0,
57
+     "num_images_per_prompt": 1,
58
+ }
59
+
60
+ output = pipeline(**inputs)
61
+ output_image = output.images[0]
62
+ output_image.save("output_image_edit_plus.png")
63
+ ```
64
+
65
+ ## About the Base Model
66
+
67
+ **FireRed-Image-Edit-1.0** from FireRedTeam is a state-of-the-art open-source diffusion transformer designed for instruction-based image editing.
68
+
69
+ It achieves top-tier performance through:
70
+
71
+ * A **1.6B-sample dataset**, refined to **100M+ high-quality text-to-image and editing pairs**
72
+ * Cleaning, stratification, and auto-labeling
73
+ * Dual-stage filtering for optimal semantic coverage and instruction alignment
74
+
75
+ ### Multi-Stage Training Pipeline
76
+
77
+ 1. Pre-training
78
+ 2. Supervised fine-tuning
79
+ 3. Reinforcement learning
80
+
81
+ ### Key Innovations
82
+
83
+ * **Multi-Condition Aware Bucket Sampler** for efficient variable-resolution batching
84
+ * **Stochastic Instruction Alignment** with dynamic prompt re-indexing
85
+ * **Asymmetric Gradient Optimization** for stable DPO
86
+ * **DiffusionNFT** with layout-aware OCR rewards for precise text editing
87
+ * **Differentiable Consistency Loss** for identity preservation
88
+
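The internals of the bucket sampler are not described here. As a rough illustration of the general idea behind variable-resolution batching, the sketch below groups samples by nearest aspect-ratio bucket so that every batch shares one target resolution. The bucket list and function names are hypothetical, and the "multi-condition aware" part of the base model's sampler is not modelled.

```python
import random
from collections import defaultdict

# Hypothetical bucket resolutions (width, height); not taken from the FireRed release.
BUCKETS = [(1024, 1024), (1280, 768), (768, 1280)]


def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the sample's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))


def bucketed_batches(sizes, batch_size, seed=0):
    """Group sample indices by bucket, then cut each group into batches."""
    groups = defaultdict(list)
    for index, (w, h) in enumerate(sizes):
        groups[nearest_bucket(w, h)].append(index)

    rng = random.Random(seed)
    batches = []
    for bucket, indices in groups.items():
        rng.shuffle(indices)  # shuffle within a bucket
        for start in range(0, len(indices), batch_size):
            batches.append((bucket, indices[start:start + batch_size]))
    rng.shuffle(batches)  # shuffle batch order across buckets
    return batches
```

Each returned item is a `(bucket, indices)` pair, so a data loader could resize every sample in the batch to `bucket` before stacking.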
89
+ ## Native Capabilities
90
+
91
+ * Photo restoration
92
+ * Multi-image editing such as virtual try-on
93
+ * Style transfer with text fidelity
94
+ * Complex instruction adherence
95
+ * Layout-aware text editing
96
+ * Identity-preserving edits
97
+ * Professional photorealistic refinements
98
+
99
+   * Skin texture realism
100
+   * Multi-outfit changes in single passes
101
+
102
+ The base model achieves strong results across the following benchmarks:
103
+
104
+ * REDEdit-Bench with 15 editing categories
105
+ * ImgEdit
106
+ * GEdit
107
+
108
+ The model supports native editing from text-to-image foundations rather than patch-based methods, enabling coherent, high-quality outputs suitable for professional workflows and ComfyUI integration.
109
+
110
+ ## What FP8 Adds
111
+
112
+ The **FireRed-Image-Edit-1.0-fp8** variant introduces:
113
+
114
+ * **BF16 · FP8 (F8_E4M3) Transformer Compression**
115
+ * Reduced VRAM usage
116
+ * Improved throughput
117
+ * Faster inference on NVIDIA Hopper and other GPUs with native FP8 support
118
+ * Production-friendly deployment without modifying the original pipeline structure
119
+
120
+ > Only the transformer weights are compressed, ensuring seamless compatibility with existing Diffusers pipelines.
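As a back-of-the-envelope illustration of the VRAM reduction, the sketch below estimates weight storage when a fraction of the transformer parameters is held in FP8 (1 byte each) and the rest in BF16 (2 bytes each). The function name and the `fp8_fraction` parameter are hypothetical; the estimate ignores quantization scales, activations, and the uncompressed components.

```python
def weight_memory_gib(num_params: int, fp8_fraction: float = 0.0) -> float:
    """Estimate weight storage in GiB for a BF16/FP8 mix.

    fp8_fraction is the assumed share of parameters stored in FP8;
    the remaining parameters are assumed to stay in BF16.
    """
    if not 0.0 <= fp8_fraction <= 1.0:
        raise ValueError("fp8_fraction must be between 0 and 1")
    fp8_params = num_params * fp8_fraction
    bf16_params = num_params - fp8_params
    total_bytes = fp8_params * 1 + bf16_params * 2
    return total_bytes / 2**30
```

Under these assumptions, a fully FP8 transformer needs half the weight memory of its BF16 counterpart, and a mixed checkpoint falls somewhere in between.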