---

<p align="center">
  <img src="assets/logo/nucleus_header.png" width="400"/>
</p>
<p align="center">
  🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a> | 🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a> | 📄 <a href=""><b>Tech Report</b></a>
</p>

## Introduction

- **Text KV caching via diffusers**: Text tokens are excluded from the transformer backbone entirely, and their KV projections are cached across all denoising steps. The caching is natively integrated into the `diffusers` pipeline: simply enable it with `TextKVCacheConfig` for an automatic speedup with no code changes to the inference loop.
- **Progressive resolution training**: A three-stage curriculum (256 → 512 → 1024) with progressive sparsification of expert capacity.
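The text-KV-caching idea can be sketched as a toy NumPy joint-attention loop. This is purely illustrative: the shapes, weights, and names (`joint_attention`, `cached_text_kv`) are ours, not the `diffusers` API. Text keys and values are projected once, then every denoising step attends over the concatenation of freshly computed image KV and the cached text KV.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                # head dimension (toy size)
n_img, n_txt = 8, 4   # image / text token counts

# Key/value projection weights, shared by image and text tokens in this sketch.
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

def joint_attention(img_x, text_kv):
    """Image queries attend over concatenated [image, text] keys/values."""
    txt_k, txt_v = text_kv
    q = img_x                                   # identity Q projection for brevity
    k = np.concatenate([img_x @ Wk, txt_k])     # image KV is recomputed each step,
    v = np.concatenate([img_x @ Wv, txt_v])     # text KV comes from the cache
    logits = q @ k.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v

txt_x = rng.standard_normal((n_txt, d))
# Project text tokens ONCE; they never change across denoising steps.
cached_text_kv = (txt_x @ Wk, txt_x @ Wv)

img_x = rng.standard_normal((n_img, d))
for step in range(4):                           # stand-in for the denoising loop
    img_x = joint_attention(img_x, cached_text_kv)
```

Because the text tokens never pass through the backbone, their projections are computed exactly once, regardless of the number of denoising steps.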
## Architecture



Nucleus-Image is a 32-layer diffusion transformer in which 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use a dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
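The shared-plus-routed expert layout above can be illustrated with a tiny NumPy sketch (expert count and dimensions shrunk; all names are ours): every token passes through the shared expert, while each routed expert contributes only for the tokens assigned to it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tok, n_experts = 8, 16, 4   # toy sizes (the model uses 64 routed experts)

def make_ffn():
    """A tiny two-layer MLP standing in for one expert."""
    w1 = rng.standard_normal((d, 2 * d))
    w2 = rng.standard_normal((2 * d, d))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

shared_expert = make_ffn()
routed_experts = [make_ffn() for _ in range(n_experts)]

def moe_ffn(x, assignment):
    """Shared expert for every token, plus one routed expert per token."""
    out = shared_expert(x)
    for e, expert in enumerate(routed_experts):
        mask = assignment == e
        out[mask] += expert(x[mask])
    return out

x = rng.standard_normal((n_tok, d))
assignment = rng.integers(0, n_experts, size=n_tok)  # router output (stubbed)
y = moe_ffn(x, assignment)
```

The shared expert gives every token a dense path, so sparsifying the routed experts' capacity (as in the progressive training curriculum) never starves a token entirely.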
Routing uses **Expert-Choice** with a **decoupled design**: the router receives the unmodulated token representation concatenated with the timestep embedding, while the expert MLPs receive the fully modulated representation. This prevents the adaptive modulation scale (which varies by an order of magnitude across timesteps) from collapsing expert selection into timestep-dependent routing, preserving spatial and semantic expert specialization.
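A minimal sketch of this decoupled Expert-Choice router, with toy sizes and names of our own choosing: the router scores tokens from the unmodulated representation plus the timestep embedding, each expert then selects its top-k tokens, and only the expert MLP input uses the modulated representation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_t = 8, 4                     # token dim and timestep-embedding dim (toy)
n_tok, n_experts, k = 16, 4, 4    # Expert-Choice: each expert picks k tokens

W_route = rng.standard_normal((d + d_t, n_experts))  # router projection

def expert_choice_route(x_unmod, t_emb):
    """Each expert selects its top-k tokens by router score."""
    router_in = np.concatenate(
        [x_unmod, np.broadcast_to(t_emb, (len(x_unmod), d_t))], axis=-1)
    scores = router_in @ W_route                     # (n_tok, n_experts)
    # For each expert (column), take the k highest-scoring tokens.
    return {e: np.argsort(scores[:, e])[-k:] for e in range(n_experts)}

x_unmod = rng.standard_normal((n_tok, d))   # pre-modulation token representation
scale = 10.0                                # AdaLN-style modulation scale (toy)
x_mod = scale * x_unmod                     # what the expert MLPs consume
t_emb = rng.standard_normal(d_t)

routing = expert_choice_route(x_unmod, t_emb)   # router never sees x_mod
# Experts consume the MODULATED representation for their selected tokens,
# so the timestep-dependent scale affects computation but not selection.
expert_inputs = {e: x_mod[idx] for e, idx in routing.items()}
```

Feeding the router the modulated input instead would let the timestep-dependent scale dominate the scores; keeping it on the unmodulated side is what preserves spatial and semantic specialization.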

## Model Specifications

| Specification | Value |

Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.




### Fantasy, Surrealism & Nature

Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.




### Commercial & Everyday Imagery

Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.




## License