Update README.md
README.md
CHANGED
|
@@ -12,9 +12,7 @@ tags:
|
|
| 12 |
- image-generation
|
| 13 |
---
|
| 14 |
|
| 15 |
-
<p align="center">
|
| 16 |
-
<img src="assets/logo/nucleus_header.png" width="400"/>
|
| 17 |
-
</p>
|
| 18 |
<p align="center">
|
| 19 |
🖥️ <a href="https://github.com/WithNucleusAI/Nucleus-Image"><b>GitHub</b></a> | 🤗 <a href="https://huggingface.co/NucleusAI/NucleusMoE-Image"><b>Hugging Face</b></a> | 📄 <a href=""><b>Tech Report</b></a>
|
| 20 |
</p>
|
|
@@ -34,7 +32,7 @@ tags:
|
|
| 34 |
|
| 35 |
## Architecture
|
| 36 |
|
| 37 |
-
. Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
|
| 40 |
|
|
@@ -60,7 +58,7 @@ Routing uses **Expert-Choice** with a **decoupled design**: the router receives
|
|
| 60 |
|
| 61 |
## Benchmark Results
|
| 62 |
|
| 63 |
-

|
|
| 125 |
|
| 126 |
Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.
|
| 127 |
|
| 128 |
-

|
| 36 |
|
| 37 |
Nucleus-Image is a 32-layer diffusion transformer where 29 of the 32 blocks replace the dense FFN with a sparse MoE layer containing 64 routed experts and one shared expert (the first 3 layers use dense FFN for training stability). Image queries attend to concatenated image and text key-value pairs via joint attention; text tokens are excluded from the transformer backbone entirely, participating only as KV contributors. This eliminates MoE routing overhead for text and enables full text KV caching across denoising steps.
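
The attention pattern described above can be illustrated with a minimal pure-Python sketch. It is not the model's implementation: the function name `joint_attention`, the toy 2-dimensional tokens, and the omission of learned Q/K/V projections and multiple heads are all assumptions made for brevity. It only shows that queries come from image tokens alone, while keys and values are the image K/V concatenated with text K/V that are computed once and reused at every denoising step.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(img_q, img_k, img_v, txt_k, txt_v):
    """Image queries attend over image K/V concatenated with text K/V.

    Text tokens contribute keys and values only, so the output has
    one row per image token and no rows for text tokens.
    """
    keys = img_k + txt_k        # concatenate along the token axis
    values = img_v + txt_v
    scale = 1.0 / math.sqrt(len(img_q[0]))
    dim = len(values[0])
    out = []
    for q in img_q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out

# Hypothetical text K/V, computed once and cached across denoising steps.
txt_k = [[1.0, 0.0], [0.0, 1.0]]
txt_v = [[1.0, 2.0], [3.0, 4.0]]

# Toy image tokens; a real model would derive Q/K/V via learned projections.
img = [[0.5, 0.5], [0.2, -0.1], [0.0, 1.0]]
for step in range(3):
    img = joint_attention(img, img, img, txt_k, txt_v)  # text K/V reused as-is
```

Because the text tokens never pass through the backbone in this sketch, nothing about them changes between steps, which is what makes caching their keys and values possible.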
|
| 38 |
|
|
|
|
| 58 |
|
| 59 |
## Benchmark Results
|
| 60 |
|
| 61 |
+

|
| 62 |
|
| 63 |
Nucleus-Image achieves state-of-the-art or near state-of-the-art results on all three benchmarks despite activating only ~2B of its 17B parameters per forward pass. All results are from the base model at 1024x1024, 50 inference steps, CFG scale 8.0.
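
For readers unfamiliar with the CFG scale mentioned above, the following is a minimal sketch of the standard classifier-free guidance combination, assuming the usual formulation in which the guided prediction extrapolates from the unconditional prediction toward the conditional one. The function name `apply_cfg` and the toy values are hypothetical; the README does not show how the model's pipeline applies guidance.

```python
def apply_cfg(uncond, cond, scale=8.0):
    """Standard classifier-free guidance on two model predictions.

    scale=1.0 returns the conditional prediction unchanged; larger
    values push further in the direction (cond - uncond).
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy per-element predictions (hypothetical values, not model outputs).
uncond_pred = [0.10, -0.20, 0.05]
cond_pred = [0.15, -0.10, 0.05]

guided = apply_cfg(uncond_pred, cond_pred)  # uses the scale of 8.0 quoted above
```

In a sampler, this combination would typically run once per denoising step, so 50 inference steps imply 50 guided predictions.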
|
| 64 |
|
|
|
|
| 123 |
|
| 124 |
Nucleus-Image generations of human subjects and portraits, spanning diverse cultures, ages, and artistic styles, from expressive character studies to fine-grained close-ups with intricate skin texture and detail.
|
| 125 |
|
| 126 |
+

|
| 127 |
+

|
| 128 |
|
| 129 |
### Fantasy, Surrealism & Nature
|
| 130 |
|
| 131 |
Nucleus-Image generations spanning fantasy, surrealism, animation, and the natural world.
|
| 132 |
|
| 133 |
+

|
| 134 |
+

|
| 135 |
|
| 136 |
### Commercial & Everyday Imagery
|
| 137 |
|
| 138 |
Nucleus-Image generations across product photography, architecture, typography, food, and world culture, demonstrating versatility in commercial, conceptual, and everyday imagery.
|
| 139 |
|
| 140 |
+

|
| 141 |
+

|
| 142 |
|
| 143 |
## License
|
| 144 |
|