Devio committed on
Commit 5a2a1a3 · verified · 1 Parent(s): 752feae

READ ME updates

Files changed (1)
  1. README.md +16 -19
README.md CHANGED
@@ -13,15 +13,10 @@ tags:
13
  ---
14
 
15
  <p align="center">
16
- <img src="assets/logo/OpsAI_Logo.png" width="200"/>
17
  </p>
18
  <p align="center">
19
- <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;<a href="https://huggingface.co/NucleusAI/NucleusMoE-Image">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;<a href="">Tech Report</a>
20
- </p>
21
-
22
- <p align="center">
23
- <img src="assets/collage/Collage-1-Top.jpeg" width="1600"/>
24
- <img src="assets/collage/Collage-1-Bottom.jpeg" width="1600"/>
25
  </p>
26
 
27
  ## Introduction
@@ -37,6 +32,14 @@ tags:
37
  - **Text KV caching via diffusers**: Text tokens are excluded from the transformer backbone entirely and their KV projections are cached across all denoising steps. This caching is natively integrated into the `diffusers` pipeline: simply enable it with `TextKVCacheConfig` for automatic speedup with no code changes to the inference loop
38
  - **Progressive resolution training**: Three-stage curriculum (256 → 512 → 1024) with progressive sparsification of expert capacity
39
 
 
 
 
 
 
 
 
 
40
  ## Model Specifications
41
 
42
  | Specification | Value |
@@ -122,28 +125,22 @@ image.save("nucleus_output.png")
122
 
123
  Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.
124
 
125
- <p align="center">
126
- <img src="assets/collage/Collage-1-Top.jpeg" width="1600"/>
127
- <img src="assets/collage/Collage-1-Bottom.jpeg" width="1600"/>
128
- </p>
129
 
130
  ### Fantasy, Surrealism & Nature
131
 
132
  Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.
133
 
134
- <p align="center">
135
- <img src="assets/collage/Collage-2-Top.jpeg" width="1600"/>
136
- <img src="assets/collage/Collage-2-Bottom.jpeg" width="1600"/>
137
- </p>
138
 
139
  ### Commercial & Everyday Imagery
140
 
141
  Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.
142
 
143
- <p align="center">
144
- <img src="assets/collage/Collage-3-Top.jpeg" width="1600"/>
145
- <img src="assets/collage/Collage-3-Bottom.jpeg" width="1600"/>
146
- </p>
147
 
148
  ## License
149
 
 
13
  ---
14
 
15
  <p align="center">
16
+ <img src="assets/logo/nucleus_header.png" width="400"/>
17
  </p>
18
  <p align="center">
19
+ 🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href=""><b>Tech Report</b></a>
 
 
 
 
 
20
  </p>
21
 
22
  ## Introduction
 
32
  - **Text KV caching via diffusers**: Text tokens are excluded from the transformer backbone entirely and their KV projections are cached across all denoising steps. This caching is natively integrated into the `diffusers` pipeline: simply enable it with `TextKVCacheConfig` for automatic speedup with no code changes to the inference loop
33
  - **Progressive resolution training**: Three-stage curriculum (256 → 512 → 1024) with progressive sparsification of expert capacity
34
 
35
+ ## Architecture
36
+
37
+ ![Architecture](assets/Architecture_Diagram.png)
38
+
39
+ Nucleus-Image is a 32-layer diffusion transformer where 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
40
+
41
+ Routing uses **Expert-Choice** with a **decoupled design**: the router receives the unmodulated token representation concatenated with the timestep embedding, while expert MLPs receive the fully modulated representation. This prevents the adaptive modulation scale (which varies by an order of magnitude across timesteps) from collapsing expert selection into timestep-dependent routing, preserving spatial and semantic expert specialization.
42
+
43
  ## Model Specifications
44
 
45
  | Specification | Value |
 
125
 
126
  Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.
127
 
128
+ ![](assets/collage/Collage-1-Top.jpeg)
129
+ ![](assets/collage/Collage-1-Bottom.jpeg)
 
 
130
 
131
  ### Fantasy, Surrealism & Nature
132
 
133
  Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.
134
 
135
+ ![](assets/collage/Collage-2-Top.jpeg)
136
+ ![](assets/collage/Collage-2-Bottom.jpeg)
 
 
137
 
138
  ### Commercial & Everyday Imagery
139
 
140
  Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.
141
 
142
+ ![](assets/collage/Collage-3-Top.jpeg)
143
+ ![](assets/collage/Collage-3-Bottom.jpeg)
 
 
144
 
145
  ## License
146
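
Note on the new Architecture section: the decoupled Expert-Choice routing it describes can be illustrated with a small pure-Python sketch. This is not the Nucleus-Image implementation. Every name here (`expert_choice_route`, `moe_layer`, `modulate`) is hypothetical, the experts are toy linear maps instead of MLPs, the shared expert is omitted, and the modulation form `x * (1 + scale) + shift` is an assumed adaLN-style stand-in. The only point it tries to show is that the router scores depend on the unmodulated token plus the timestep embedding, so changing the modulation scale changes expert outputs but not which tokens each expert selects.

```python
import math
import random


def matvec(w, x):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def expert_choice_route(tokens, t_emb, router_w, capacity):
    """Expert-Choice: each expert picks its `capacity` highest-affinity tokens.

    The router input is the unmodulated token concatenated with the timestep
    embedding, so the modulation scale never enters the routing scores.
    """
    # affinity[i][e] is token i's softmax score for expert e.
    affinity = [softmax(matvec(router_w, tok + t_emb)) for tok in tokens]
    assignment = {}
    for e in range(len(router_w)):
        ranked = sorted(range(len(tokens)), key=lambda i: affinity[i][e], reverse=True)
        assignment[e] = ranked[:capacity]
    return assignment, affinity


def modulate(tok, scale, shift):
    """Assumed adaLN-style modulation; the real form may differ."""
    return [x * (1.0 + scale) + shift for x in tok]


def moe_layer(tokens, t_emb, router_w, expert_ws, scale, shift, capacity):
    assignment, affinity = expert_choice_route(tokens, t_emb, router_w, capacity)
    # Experts (toy linear maps here) receive the fully modulated tokens.
    modulated = [modulate(tok, scale, shift) for tok in tokens]
    out = [[0.0] * len(tokens[0]) for _ in tokens]
    for e, chosen in assignment.items():
        for i in chosen:
            y = matvec(expert_ws[e], modulated[i])
            out[i] = [o + affinity[i][e] * v for o, v in zip(out[i], y)]
    return out, assignment


random.seed(0)
dim, t_dim, num_experts, num_tokens, capacity = 4, 2, 3, 6, 2
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
t_emb = [random.gauss(0, 1) for _ in range(t_dim)]
router_w = [[random.gauss(0, 1) for _ in range(dim + t_dim)] for _ in range(num_experts)]
expert_ws = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]

# Same tokens and timestep embedding, modulation scale differing by 100x.
out_small, assign_small = moe_layer(tokens, t_emb, router_w, expert_ws, 0.1, 0.0, capacity)
out_large, assign_large = moe_layer(tokens, t_emb, router_w, expert_ws, 10.0, 0.0, capacity)
print(assign_small == assign_large)
```

In this sketch a token may be chosen by several experts or by none, which is a general property of Expert-Choice routing; tokens that no expert selects get a zero output here, whereas the README states the real model also has a shared expert.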