Add model card

Signed-off-by: Gaoyang Zhang <gy@blurgy.xyz>

Files changed (4)

README.md +36 -53
images/laptop-above-dog.jpg +3 -0
images/potted_plant-right-motorcycle.jpg +3 -0
images/sheep-below-sink.jpg +3 -0

README.md CHANGED

@@ -2,43 +2,41 @@
 tags:
 - text-to-image
 - diffusers
-- template:diffusion-lora
 widget:
-- text: A laptop above a dog
   output:
-    url: images/laptop-above-dog_flux1_compass_004.jpg
-- text: A bird below a skateboard
   output:
-    url: images/flux_compass_bird1.jpg
-- text: A horse to the left of a bottle
   output:
-    url: images/horse-left-bottle_flux1_compass_003.jpg
-base_model: black-forest-labs/FLUX.1-dev
-instance_prompt: null
-license: other
-license_name: compass-lora-weights-nc-license
-license_link: LICENSE
 ---
-# CoMPaSS-FLUX.1
 <Gallery />
 ## Model description
-# CoMPaSS-FLUX.1
-A LoRA adapter that enhances spatial understanding capabilities of the FLUX.1 text-to-image
-diffusion model. This model demonstrates significant improvements in generating images with specific
 spatial relationships between objects.
 ## Model Details
-- **Base Model**: FLUX.1-dev
-- **LoRA Rank**: 16
 - **Training Data**: SCOP dataset (curated from COCO)
-- **File Size**: ~50MiB
 - **Framework**: Diffusers
-- **License**: Non-Commercial (see [./LICENSE])
 ## Intended Use
@@ -50,21 +48,21 @@ spatial relationships between objects.
 ### Key Improvements
-- VISOR benchmark: +98% relative improvement
-- T2I-CompBench Spatial: +67% relative improvement
-- GenEval Position: +131% relative improvement
-- Maintains or improves base model's image fidelity (FID and CMMD scores)
 ## Using the Model
-See our [GitHub repository] to get started.
 ### Effective Prompting
 The model works well with:
 - Clear spatial relationship descriptors (left, right, above, below)
 - Pairs of distinct objects
-- Explicit spatial relationships (e.g., "A to the right of B")
 ## Training Details
@@ -82,29 +80,20 @@ The model works well with:
 ### Training Process
 - Trained for 24,000 steps
-- Batch size of 4
-- Learning rate: 1e-4
 - Optimizer: AdamW with β₁=0.9, β₂=0.999
 - Weight decay: 1e-2
 ## Evaluation Results
-| Metric | Base FLUX.1 | +CoMPaSS | Relative Improvement |
-|--------|-------------|-----------|-------------------|
-| VISOR uncond | 37.96% | 75.17% | +98% |
-| T2I-CompBench Spatial | 0.18 | 0.30 | +67% |
-| GenEval Position | 0.26 | 0.60 | +131% |
-| FID | 27.96 | 26.40 | +5.6% |
-| CMMD | 0.8737 | 0.6859 | +21.5% |
-## Technical Specifications
-- **Architecture**: MMDiT-based FLUX.1 with LoRA adaptation
-- **LoRA Target**: DoubleStreamBlocks
-- **Parameter Count**: Base model parameters + ~50MiB LoRA weights
-- **Input**: Text prompts (like base FLUX.1)
-- **Output**: 1024×1024 images
-- **Compute Requirements**: Similar to base FLUX.1
 ## Citation
@@ -118,11 +107,6 @@ If you use this model in your research, please cite:
 }
 ```
-## Acknowledgments
-This work builds upon the [FLUX.1-dev] model by Black Forest Labs and utilizes the COCO dataset for
-training data curation.
 ## Contact
 For questions about the model, please contact <blurgy@zju.edu.cn>
@@ -131,8 +115,7 @@ For questions about the model, please contact <blurgy@zju.edu.cn>
 Weights for this model are available in Safetensors format.
-[Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.
 [./LICENSE]: <./LICENSE>
-[GitHub repository]: <https://github.com/blurgyy/CoMPaSS>
-[FLUX.1-dev]: <https://huggingface.co/black-forest-labs/FLUX.1-dev>

 tags:
 - text-to-image
 - diffusers
 widget:
+- text: a photo of a laptop above a dog
   output:
+    url: images/laptop-above-dog.jpg
+- text: a photo of a potted plant to the right of a motorcycle
   output:
+    url: images/potted_plant-right-motorcycle.jpg
+- text: a photo of a sheep below a sink
   output:
+    url: images/sheep-below-sink.jpg
+base_model: runwayml/stable-diffusion-v1-5
+license: apache-2.0
 ---
+# CoMPaSS-SD1.5
 <Gallery />
 ## Model description
+# CoMPaSS-SD1.5
+\[[Project Page]\]
+\[[code]\]
+\[[arXiv]\]
+A UNet that enhances spatial understanding capabilities of the StableDiffusion 1.5 text-to-image
+diffusion model.  This model demonstrates significant improvements in generating images with specific
 spatial relationships between objects.
 ## Model Details
+- **Base Model**: StableDiffusion 1.5
 - **Training Data**: SCOP dataset (curated from COCO)
 - **Framework**: Diffusers
+- **License**: Apache-2.0 (see [./LICENSE])
 ## Intended Use
 ### Key Improvements
+- VISOR benchmark: +249.6% relative improvement
+- T2I-CompBench Spatial: +337.5% relative improvement
+- GenEval Position: +1250.0% relative improvement
+- Maintains or improves base model's image fidelity (lower FID and CMMD scores than base model)
 ## Using the Model
+See our [GitHub repository][code] to get started.
 ### Effective Prompting
 The model works well with:
 - Clear spatial relationship descriptors (left, right, above, below)
 - Pairs of distinct objects
+- Explicit spatial relationships (e.g., "a photo of A to the right of B")
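
The prompting guidance above can be sketched as a small prompt builder. This is an illustration only, not part of the CoMPaSS codebase: `spatial_prompt`, `RELATIONS`, and the article rule are hypothetical names and assumptions, and the template is taken from the "a photo of A to the right of B" example in the card.

```python
# Hypothetical helper (not from the CoMPaSS repository): builds prompts in the
# "a photo of A <relation> B" form recommended by the model card.
RELATIONS = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' with a simple first-letter rule (an assumption)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def spatial_prompt(subject: str, relation: str, reference: str) -> str:
    """Return a prompt placing `subject` in `relation` to `reference`."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    if subject == reference:
        # The card recommends pairs of distinct objects.
        raise ValueError("subject and reference should be distinct objects")
    return (
        f"a photo of {article(subject)} {subject} "
        f"{RELATIONS[relation]} {article(reference)} {reference}"
    )


print(spatial_prompt("laptop", "above", "dog"))
print(spatial_prompt("potted plant", "right", "motorcycle"))
print(spatial_prompt("sheep", "below", "sink"))
```

The three calls reproduce the widget prompts listed in the card's front matter.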
 ## Training Details
 ### Training Process
 - Trained for 24,000 steps
+- Effective batch size of 4
+- Learning rate: 5e-6
 - Optimizer: AdamW with β₁=0.9, β₂=0.999
 - Weight decay: 1e-2
 ## Evaluation Results
+| Metric | StableDiffusion 1.4 | +CoMPaSS |
+|--------|-------------|-----------|
+| VISOR uncond (⬆️) | 17.58% | **61.46%** |
+| T2I-CompBench Spatial (⬆️) | 0.08 | **0.35** |
+| GenEval Position (⬆️) | 0.04 | **0.54** |
+| FID (⬇️) | 12.82 | **10.89** |
+| CMMD (⬇️) | 0.5548 | **0.3235** |
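
As a sanity check, the relative improvements listed under Key Improvements can be recomputed from the table values. The sketch below assumes relative improvement means the change divided by the base score, with the sign flipped for lower-is-better metrics; the function name `relative_improvement` is hypothetical. The card states no percentages for FID and CMMD, so those two are derived here only.

```python
def relative_improvement(base: float, new: float, higher_is_better: bool = True) -> float:
    """Relative improvement over the base score, in percent."""
    delta = (new - base) if higher_is_better else (base - new)
    return 100.0 * delta / base


# (base, +CoMPaSS, higher_is_better), copied from the evaluation table.
results = {
    "VISOR uncond": (17.58, 61.46, True),
    "T2I-CompBench Spatial": (0.08, 0.35, True),
    "GenEval Position": (0.04, 0.54, True),
    "FID": (12.82, 10.89, False),
    "CMMD": (0.5548, 0.3235, False),
}

for metric, (base, new, higher) in results.items():
    print(f"{metric}: {relative_improvement(base, new, higher):+.1f}%")
```

The first three values agree with the +249.6%, +337.5%, and +1250.0% figures quoted in the card.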
 ## Citation
 }
 ```
 ## Contact
 For questions about the model, please contact <blurgy@zju.edu.cn>
 Weights for this model are available in Safetensors format.
 [./LICENSE]: <./LICENSE>
+[Project page]: <https://compass.blurgy.xyz>
+[code]: <https://github.com/blurgyy/CoMPaSS>
+[arXiv]: <https://arxiv.org/abs/2412.13195>

images/laptop-above-dog.jpg ADDED

Git LFS Details

SHA256: 87976de4359c69bda117456984a870f378e229fb3a2d80f5de28583cf5d32956
Pointer size: 130 Bytes
Size of remote file: 46.2 kB

images/potted_plant-right-motorcycle.jpg ADDED

Git LFS Details

SHA256: c7143303d023cd96609be820d3cc374fbcde17b036cfe37f6ff53194599d9aae
Pointer size: 130 Bytes
Size of remote file: 57.3 kB

images/sheep-below-sink.jpg ADDED

Git LFS Details

SHA256: 8d8f8abe564da702fe5518f142b3054aea57bac728d7c716c0d9b4870568c4c1
Pointer size: 130 Bytes
Size of remote file: 36.1 kB