Gaoyang Zhang committed on
Commit 3568b5a · unverified · 1 Parent(s): 0862dd7

Add model card

Signed-off-by: Gaoyang Zhang <gy@blurgy.xyz>

README.md CHANGED
@@ -2,43 +2,41 @@
2
  tags:
3
  - text-to-image
4
  - diffusers
5
- - template:diffusion-lora
6
  widget:
7
- - text: A laptop above a dog
8
  output:
9
- url: images/laptop-above-dog_flux1_compass_004.jpg
10
- - text: A bird below a skateboard
11
  output:
12
- url: images/flux_compass_bird1.jpg
13
- - text: A horse to the left of a bottle
14
  output:
15
- url: images/horse-left-bottle_flux1_compass_003.jpg
16
- base_model: black-forest-labs/FLUX.1-dev
17
- instance_prompt: null
18
- license: other
19
- license_name: compass-lora-weights-nc-license
20
- license_link: LICENSE
21
  ---
22
- # CoMPaSS-FLUX.1
23
 
24
  <Gallery />
25
 
26
  ## Model description
27
 
28
- # CoMPaSS-FLUX.1
29
 
30
- A LoRA adapter that enhances spatial understanding capabilities of the FLUX.1 text-to-image
31
- diffusion model. This model demonstrates significant improvements in generating images with specific
32
  spatial relationships between objects.
33
 
34
  ## Model Details
35
 
36
- - **Base Model**: FLUX.1-dev
37
- - **LoRA Rank**: 16
38
  - **Training Data**: SCOP dataset (curated from COCO)
39
- - **File Size**: ~50MiB
40
  - **Framework**: Diffusers
41
- - **License**: Non-Commercial (see [./LICENSE])
42
 
43
  ## Intended Use
44
 
@@ -50,21 +48,21 @@ spatial relationships between objects.
50
 
51
  ### Key Improvements
52
 
53
- - VISOR benchmark: +98% relative improvement
54
- - T2I-CompBench Spatial: +67% relative improvement
55
- - GenEval Position: +131% relative improvement
56
- - Maintains or improves base model's image fidelity (FID and CMMD scores)
57
 
58
  ## Using the Model
59
 
60
- See our [GitHub repository] to get started.
61
 
62
  ### Effective Prompting
63
 
64
  The model works well with:
65
  - Clear spatial relationship descriptors (left, right, above, below)
66
  - Pairs of distinct objects
67
- - Explicit spatial relationships (e.g., "A to the right of B")
68
 
69
  ## Training Details
70
 
@@ -82,29 +80,20 @@ The model works well with:
82
  ### Training Process
83
 
84
  - Trained for 24,000 steps
85
- - Batch size of 4
86
- - Learning rate: 1e-4
87
  - Optimizer: AdamW with β₁=0.9, β₂=0.999
88
  - Weight decay: 1e-2
89
 
90
  ## Evaluation Results
91
 
92
- | Metric | Base FLUX.1 | +CoMPaSS | Relative Improvement |
93
- |--------|-------------|-----------|-------------------|
94
- | VISOR uncond | 37.96% | 75.17% | +98% |
95
- | T2I-CompBench Spatial | 0.18 | 0.30 | +67% |
96
- | GenEval Position | 0.26 | 0.60 | +131% |
97
- | FID | 27.96 | 26.40 | +5.6% |
98
- | CMMD | 0.8737 | 0.6859 | +21.5% |
99
-
100
- ## Technical Specifications
101
-
102
- - **Architecture**: MMDiT-based FLUX.1 with LoRA adaptation
103
- - **LoRA Target**: DoubleStreamBlocks
104
- - **Parameter Count**: Base model parameters + ~50MiB LoRA weights
105
- - **Input**: Text prompts (like base FLUX.1)
106
- - **Output**: 1024×1024 images
107
- - **Compute Requirements**: Similar to base FLUX.1
108
 
109
  ## Citation
110
 
@@ -118,11 +107,6 @@ If you use this model in your research, please cite:
118
  }
119
  ```
120
 
121
- ## Acknowledgments
122
-
123
- This work builds upon the [FLUX.1-dev] model by Black Forest Labs and utilizes the COCO dataset for
124
- training data curation.
125
-
126
  ## Contact
127
 
128
  For questions about the model, please contact <blurgy@zju.edu.cn>
@@ -131,8 +115,7 @@ For questions about the model, please contact <blurgy@zju.edu.cn>
131
 
132
  Weights for this model are available in Safetensors format.
133
 
134
- [Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.
135
-
136
  [./LICENSE]: <./LICENSE>
137
- [GitHub repository]: <https://github.com/blurgyy/CoMPaSS>
138
- [FLUX.1-dev]: <https://huggingface.co/black-forest-labs/FLUX.1-dev>
 
 
2
  tags:
3
  - text-to-image
4
  - diffusers
 
5
  widget:
6
+ - text: a photo of a laptop above a dog
7
  output:
8
+ url: images/laptop-above-dog.jpg
9
+ - text: a photo of a potted plant to the right of a motorcycle
10
  output:
11
+ url: images/potted_plant-right-motorcycle.jpg
12
+ - text: a photo of a sheep below a sink
13
  output:
14
+ url: images/sheep-below-sink.jpg
15
+ base_model: runwayml/stable-diffusion-v1-5
16
+ license: apache-2.0
 
 
 
17
  ---
18
+ # CoMPaSS-SD1.5
19
 
20
  <Gallery />
21
 
22
  ## Model description
23
 
24
+ # CoMPaSS-SD1.5
25
 
26
+ \[[Project Page]\]
27
+ \[[code]\]
28
+ \[[arXiv]\]
29
+
30
+ A UNet that enhances spatial understanding capabilities of the StableDiffusion 1.5 text-to-image
31
+ diffusion model. This model demonstrates significant improvements in generating images with specific
32
  spatial relationships between objects.
33
 
34
  ## Model Details
35
 
36
+ - **Base Model**: StableDiffusion 1.5
 
37
  - **Training Data**: SCOP dataset (curated from COCO)
 
38
  - **Framework**: Diffusers
39
+ - **License**: Apache-2.0 (see [./LICENSE])
40
 
41
  ## Intended Use
42
 
 
48
 
49
  ### Key Improvements
50
 
51
+ - VISOR benchmark: +249.6% relative improvement
52
+ - T2I-CompBench Spatial: +337.5% relative improvement
53
+ - GenEval Position: +1250.0% relative improvement
54
+ - Maintains or improves base model's image fidelity (lower FID and CMMD scores than base model)
55
 
56
  ## Using the Model
57
 
58
+ See our [GitHub repository][code] to get started.
59
 
60
  ### Effective Prompting
61
 
62
  The model works well with:
63
  - Clear spatial relationship descriptors (left, right, above, below)
64
  - Pairs of distinct objects
65
+ - Explicit spatial relationships (e.g., "a photo of A to the right of B")
66
 
67
  ## Training Details
68
 
 
80
  ### Training Process
81
 
82
  - Trained for 24,000 steps
83
+ - Effective batch size of 4
84
+ - Learning rate: 5e-6
85
  - Optimizer: AdamW with β₁=0.9, β₂=0.999
86
  - Weight decay: 1e-2
87
 
88
  ## Evaluation Results
89
 
90
+ | Metric | StableDiffusion 1.4 | +CoMPaSS |
91
+ |--------|-------------|-----------|
92
+ | VISOR uncond (⬆️) | 17.58% | **61.46%** |
93
+ | T2I-CompBench Spatial (⬆️) | 0.08 | **0.35** |
94
+ | GenEval Position (⬆️) | 0.04 | **0.54** |
95
+ | FID (⬇️) | 12.82 | **10.89** |
96
+ | CMMD (⬇️) | 0.5548 | **0.3235** |
97
 
98
  ## Citation
99
 
 
107
  }
108
  ```
109
 
 
 
 
 
 
110
  ## Contact
111
 
112
  For questions about the model, please contact <blurgy@zju.edu.cn>
 
115
 
116
  Weights for this model are available in Safetensors format.
117
 
 
 
118
  [./LICENSE]: <./LICENSE>
119
+ [Project page]: <https://compass.blurgy.xyz>
120
+ [code]: <https://github.com/blurgyy/CoMPaSS>
121
+ [arXiv]: <https://arxiv.org/abs/2412.13195>
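
The relative-improvement figures in the new Key Improvements list can be cross-checked against the Evaluation Results table added in this commit. The following is a minimal sketch, not taken from the CoMPaSS repository: the helper name `relative_improvement` and the `lower_is_better` flag are illustrative assumptions. For FID and CMMD it assumes improvement is measured as the relative reduction from the base score, which is consistent with the +5.6% FID entry in the removed FLUX.1 table.

```python
def relative_improvement(base: float, new: float, lower_is_better: bool = False) -> float:
    """Return the improvement of `new` over `base`, as a percentage of `base`."""
    # For lower-is-better metrics (FID, CMMD) a decrease counts as a gain.
    gain = (base - new) if lower_is_better else (new - base)
    return 100.0 * gain / base


# (metric, base score, +CoMPaSS score, lower_is_better), copied from the table.
ROWS = [
    ("VISOR uncond", 17.58, 61.46, False),
    ("T2I-CompBench Spatial", 0.08, 0.35, False),
    ("GenEval Position", 0.04, 0.54, False),
    ("FID", 12.82, 10.89, True),
    ("CMMD", 0.5548, 0.3235, True),
]

for name, base, new, lower in ROWS:
    print(f"{name}: {relative_improvement(base, new, lower):+.1f}%")
```

The first three rows reproduce the +249.6%, +337.5%, and +1250.0% values stated in the list.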
images/laptop-above-dog.jpg ADDED

Git LFS Details

  • SHA256: 87976de4359c69bda117456984a870f378e229fb3a2d80f5de28583cf5d32956
  • Pointer size: 130 Bytes
  • Size of remote file: 46.2 kB
images/potted_plant-right-motorcycle.jpg ADDED

Git LFS Details

  • SHA256: c7143303d023cd96609be820d3cc374fbcde17b036cfe37f6ff53194599d9aae
  • Pointer size: 130 Bytes
  • Size of remote file: 57.3 kB
images/sheep-below-sink.jpg ADDED

Git LFS Details

  • SHA256: 8d8f8abe564da702fe5518f142b3054aea57bac728d7c716c0d9b4870568c4c1
  • Pointer size: 130 Bytes
  • Size of remote file: 36.1 kB
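
The widget prompts and the three image files added in this commit appear to share one pattern: a prompt of the form "a photo of a A <relation> a B" and a file named `images/A-<relation>-B.jpg`. The sketch below illustrates that pattern only. The helper names `build_prompt` and `image_path` are hypothetical, and the naming rule (spaces replaced by underscores, a fixed article "a") is inferred from the three examples rather than documented anywhere.

```python
# Spatial relations named in the card, mapped to the phrase used in prompts.
RELATION_PHRASES = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def build_prompt(subject: str, relation: str, obj: str) -> str:
    """Build a prompt in the 'a photo of A <relation> B' style."""
    phrase = RELATION_PHRASES[relation]  # raises KeyError for unknown relations
    return f"a photo of a {subject} {phrase} a {obj}"


def image_path(subject: str, relation: str, obj: str) -> str:
    """Build the matching example-image path (assumed naming convention)."""
    def slug(name: str) -> str:
        return name.replace(" ", "_")

    return f"images/{slug(subject)}-{relation}-{slug(obj)}.jpg"


for triple in [
    ("laptop", "above", "dog"),
    ("potted plant", "right", "motorcycle"),
    ("sheep", "below", "sink"),
]:
    print(build_prompt(*triple), "->", image_path(*triple))
```

Keeping the relation as a short key (`right`) and expanding it to the full phrase only in the prompt lets the same triple drive both the prompt text and the file name.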