mindfulmandy committed on
Commit 50a96d6 · verified · 1 Parent(s): 5a2a1a3

Update README.md

Files changed (1)
  1. README.md +9 -11
README.md CHANGED
@@ -12,9 +12,7 @@ tags:
  - image-generation
  ---

- <p align="center">
- <img src="assets/logo/nucleus_header.png" width="400"/>
- </p>
  <p align="center">
  🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href=""><b>Tech Report</b></a>
  </p>
@@ -34,7 +32,7 @@ tags:
  ## Architecture

- ![Architecture](assets/Architecture_Diagram.png)

  Nucleus-Image is a 32-layer diffusion transformer where 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
@@ -60,7 +58,7 @@ Routing uses **Expert-Choice** with a **decoupled design**: the router receives
  ## Benchmark Results

- ![Overall Performance](assets/Overall-Performance.png)

  Nucleus-Image achieves state-of-the-art or near state-of-the-art results on all three benchmarks despite activating only ~2B of its 17B parameters per forward pass. All results are from the base model at 1024x1024, 50 inference steps, CFG scale 8.0.
@@ -125,22 +123,22 @@ image.save("nucleus_output.png")
  Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.

- ![](assets/collage/Collage-1-Top.jpeg)
- ![](assets/collage/Collage-1-Bottom.jpeg)

  ### Fantasy, Surrealism & Nature

  Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.

- ![](assets/collage/Collage-2-Top.jpeg)
- ![](assets/collage/Collage-2-Bottom.jpeg)

  ### Commercial & Everyday Imagery

  Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.

- ![](assets/collage/Collage-3-Top.jpeg)
- ![](assets/collage/Collage-3-Bottom.jpeg)

  ## License
 
  - image-generation
  ---

+ <p align="center"> <img src="https://storage.googleapis.com/nucleus_image_v1/nucleus_header.png" width="400"/></p>
  <p align="center">
  🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href=""><b>Tech Report</b></a>
  </p>
 
  ## Architecture

+ ![Architecture](https://storage.googleapis.com/nucleus_image_v1/Architecture_Diagram.png)

  Nucleus-Image is a 32-layer diffusion transformer where 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
 
  ## Benchmark Results

+ ![Overall Performance](https://storage.googleapis.com/nucleus_image_v1/Overall-Performance.png)

  Nucleus-Image achieves state-of-the-art or near state-of-the-art results on all three benchmarks despite activating only ~2B of its 17B parameters per forward pass. All results are from the base model at 1024x1024, 50 inference steps, CFG scale 8.0.
 
  Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.

+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-1-Top.jpeg)
+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-1-Bottom.jpeg)

  ### Fantasy, Surrealism & Nature

  Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.

+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-2-Top.jpeg)
+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-2-Bottom.jpeg)

  ### Commercial & Everyday Imagery

  Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.

+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-3-Top.jpeg)
+ ![](https://storage.googleapis.com/nucleus_image_v1/Collage-3-Bottom.jpeg)

  ## License