---

<p align="center">
  <img src="assets/logo/nucleus_header.png" width="400"/>
</p>
<p align="center">
  🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a> | 🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a> | 📄 <a href=""><b>Tech Report</b></a>
</p>

## Introduction

- **Text KV caching via diffusers**: Text tokens are excluded from the transformer backbone entirely, and their KV projections are cached across all denoising steps. The caching is natively integrated into the `diffusers` pipeline: simply enable it with `TextKVCacheConfig` for an automatic speedup with no code changes to the inference loop.
- **Progressive resolution training**: A three-stage curriculum (256 → 512 → 1024) with progressive sparsification of expert capacity.
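The text-KV-caching idea can be sketched as a toy NumPy joint-attention loop. This is purely illustrative: the shapes, weights, and names (`joint_attention`, `cached_text_kv`) are ours, not the `diffusers` API. Text keys and values are projected once, then every denoising step attends over the concatenation of freshly computed image KV and the cached text KV.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                # head dimension (toy size)
n_img, n_txt = 8, 4   # image / text token counts

# Key/value projection weights, shared by image and text tokens in this sketch.
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

def joint_attention(img_x, text_kv):
    """Image queries attend over concatenated [image, text] keys/values."""
    txt_k, txt_v = text_kv
    q = img_x                                   # identity Q projection for brevity
    k = np.concatenate([img_x @ Wk, txt_k])     # image KV is recomputed each step,
    v = np.concatenate([img_x @ Wv, txt_v])     # text KV comes from the cache
    logits = q @ k.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v

txt_x = rng.standard_normal((n_txt, d))
# Project text tokens ONCE; they never change across denoising steps.
cached_text_kv = (txt_x @ Wk, txt_x @ Wv)

img_x = rng.standard_normal((n_img, d))
for step in range(4):                           # stand-in for the denoising loop
    img_x = joint_attention(img_x, cached_text_kv)
```

Because the text tokens never pass through the backbone, their projections are computed exactly once, regardless of the number of denoising steps.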
## Architecture



Nucleus-Image is a 32-layer diffusion transformer in which 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use a dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
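The shared-plus-routed expert layout above can be illustrated with a tiny NumPy sketch (expert count and dimensions shrunk; all names are ours): every token passes through the shared expert, while each routed expert contributes only for the tokens assigned to it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tok, n_experts = 8, 16, 4   # toy sizes (the model uses 64 routed experts)

def make_ffn():
    """A tiny two-layer MLP standing in for one expert."""
    w1 = rng.standard_normal((d, 2 * d))
    w2 = rng.standard_normal((2 * d, d))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

shared_expert = make_ffn()
routed_experts = [make_ffn() for _ in range(n_experts)]

def moe_ffn(x, assignment):
    """Shared expert for every token, plus one routed expert per token."""
    out = shared_expert(x)
    for e, expert in enumerate(routed_experts):
        mask = assignment == e
        out[mask] += expert(x[mask])
    return out

x = rng.standard_normal((n_tok, d))
assignment = rng.integers(0, n_experts, size=n_tok)  # router output (stubbed)
y = moe_ffn(x, assignment)
```

The shared expert gives every token a dense path, so sparsifying the routed experts' capacity (as in the progressive training curriculum) never starves a token entirely.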
Routing uses **Expert-Choice** with a **decoupled design**: the router receives the unmodulated token representation concatenated with the timestep embedding, while the expert MLPs receive the fully modulated representation. This prevents the adaptive modulation scale (which varies by an order of magnitude across timesteps) from collapsing expert selection into timestep-dependent routing, preserving spatial and semantic expert specialization.
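A minimal sketch of this decoupled Expert-Choice router, with toy sizes and names of our own choosing: the router scores tokens from the unmodulated representation plus the timestep embedding, each expert then selects its top-k tokens, and only the expert MLP input uses the modulated representation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_t = 8, 4                     # token dim and timestep-embedding dim (toy)
n_tok, n_experts, k = 16, 4, 4    # Expert-Choice: each expert picks k tokens

W_route = rng.standard_normal((d + d_t, n_experts))  # router projection

def expert_choice_route(x_unmod, t_emb):
    """Each expert selects its top-k tokens by router score."""
    router_in = np.concatenate(
        [x_unmod, np.broadcast_to(t_emb, (len(x_unmod), d_t))], axis=-1)
    scores = router_in @ W_route                     # (n_tok, n_experts)
    # For each expert (column), take the k highest-scoring tokens.
    return {e: np.argsort(scores[:, e])[-k:] for e in range(n_experts)}

x_unmod = rng.standard_normal((n_tok, d))   # pre-modulation token representation
scale = 10.0                                # AdaLN-style modulation scale (toy)
x_mod = scale * x_unmod                     # what the expert MLPs consume
t_emb = rng.standard_normal(d_t)

routing = expert_choice_route(x_unmod, t_emb)   # router never sees x_mod
# Experts consume the MODULATED representation for their selected tokens,
# so the timestep-dependent scale affects computation but not selection.
expert_inputs = {e: x_mod[idx] for e, idx in routing.items()}
```

Feeding the router the modulated input instead would let the timestep-dependent scale dominate the scores; keeping it on the unmodulated side is what preserves spatial and semantic specialization.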

## Model Specifications

| Specification | Value |

Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.




### Fantasy, Surrealism & Nature

Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.




### Commercial & Everyday Imagery

Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.




## License