nbagel committed on
Commit f5830f9 · verified · 1 Parent(s): 4dec1ca

Update README with full model card, distributed inference roadmap, and expert visualization

Files changed (1)
  1. README.md +286 -84
README.md CHANGED
@@ -1,75 +1,88 @@
1
- # 🥖 Baguette - Paris MoE Text-to-Image
2
 
3
- A ~5 billion parameter Mixture-of-Experts diffusion model with 8 specialized experts.
4
 
5
  ## ⚡ Quick Start
6
 
7
  ```bash
8
  # Install dependencies
9
  pip install uv && uv pip install torch torchvision safetensors transformers diffusers accelerate tqdm
10
 
11
- # Generate 4 cat images
12
  python generate.py --prompt "a cute cat" --num_samples 4
13
  ```
14
 
15
- That's it! Images saved to `output_bf16.png`.
16
 
17
  ---
18
 
19
- ## 🎨 Examples
20
 
21
  ```bash
22
- # Simple generation
23
- python generate.py --prompt "sunset over mountains"
24
 
25
- # More samples, see expert routing
26
- python generate.py --prompt "abstract art" --num_samples 16 --visualize
27
 
28
- # Faster with fewer steps
29
- python generate.py --prompt "a dog" --num_steps 15
30
 
31
- # Lower memory (offload 4 experts to CPU)
32
- python generate.py --prompt "portrait" --offload 4
33
 
34
- # INT8 weights (smaller, slightly lower quality)
35
- python generate.py --prompt "forest" --precision int8
36
  ```
37
 
38
  ---
39
 
40
- ## 📋 All Options
41
-
42
- | Flag | Default | Description |
43
- |------|---------|-------------|
44
- | `--prompt` | "a cute cat" | What to generate |
45
- | `--num_samples` | 16 | Number of images |
46
- | `--num_steps` | 30 | Sampling steps (20-50 recommended) |
47
- | `--cfg_scale` | 7.5 | Guidance strength (5-10 recommended) |
48
- | `--precision` | bf16 | `bf16` (best) or `int8` (smaller) |
49
- | `--topk` | 2 | Experts per sample (1 or 2) |
50
- | `--offload` | 0 | Experts to keep on CPU (0-7) |
51
- | `--visualize` | off | Show expert routing stats |
52
- | `--output` | auto | Output filename |
53
- | `--seed` | 999 | Random seed |
54
-
55
- ---
56
-
57
- ## 🔍 Expert Visualization
58
 
59
- Use `--visualize` to see which experts the router selects:
60
 
61
  ```
62
  ╭──────────────────────────────────────────────────╮
63
  │ ⚡ EXPERT USAGE DISTRIBUTION │
64
  ├──────────────────────────────────────────────────┤
65
  │ → E4 │████████████████████████████│ 40.6% │
66
- │ E2 │██████████████████████████ │ 36.7% │
67
- │ E6 │██████████ │ 14.8% │
68
- │ E1 │███ │ 5.5% │
69
- │ E5 │█ │ 2.3% │
70
- │ E0 │ │ 0.0% │
71
- │ E3 │ │ 0.0% │
72
- │ E7 │ │ 0.0% │
73
  ├──────────────────────────────────────────────────┤
74
  │ Active: 5/8 experts Calls: 128 │
75
  ╰──────────────────────────────────────────────────╯
@@ -77,78 +90,267 @@ Use `--visualize` to see which experts the router selects:
77
  ╭──────────────────────────────────────────────────╮
78
  │ 📈 ROUTING TIMELINE │
79
  ├──────────────────────────────────────────────────┤
80
- │ Step 0 1 2 3 4 5 6 7 8 9 10 11 ...
81
- │ ────────────────────────────────────────────
82
- │ E0 · · · · · · · · · · · ·
83
- │ E2 · · · · · ·
84
- │ E4 · · ● ● ● ● · · · · · ·
85
- │ E6 · · · · · · · · · ·
86
  ├──────────────────────────────────────────────────┤
87
- │ Routing changes: 2/11 steps (18%) │
88
  ╰──────────────────────────────────────────────────╯
89
  ```
90
 
91
  ---
92
 
93
- ## 💾 Memory & Speed
94
 
95
- | Config | GPU Memory | Speed |
96
- |--------|-----------|-------|
97
- | BF16 (all on GPU) | ~25 GB | ~3 img/s |
98
- | BF16 + offload 4 | ~14 GB | ~1 img/s |
99
- | INT8 (all on GPU) | ~12 GB | ~2 img/s |
100
- | INT8 + offload 4 | ~8 GB | ~0.5 img/s |
101
 
102
  ---
103
 
104
- ## 🏗️ Architecture
105
 
106
  ```
107
- ┌─────────────────────────────────────────┐
108
- │ Paris MoE Model │
109
- ├─────────────────────────────────────────┤
110
- │ Router: DiT-B/2 (129M params) │
111
- │ selects top-K experts │
112
- │ Experts: 8× DiT-XL/2 (606M each) │
113
- │ predicts velocity │
114
- │ VAE: Stable Diffusion VAE │
115
- │ ↓ decodes to pixels │
116
- │ Output: 256×256 RGB │
117
- └─────────────────────────────────────────┘
118
  ```
119
 
120
- - **Total Parameters**: ~5 Billion
121
- - **Latent Space**: 32×32×4
122
- - **Text Encoder**: CLIP ViT-L/14
123
 
124
  ---
125
 
126
- ## 📁 Files
127
 
128
- ```
129
- ├── generate.py # Main generation script
130
- ├── benchmark.py # Performance testing
131
- ├── quantize.py # Weight conversion tool
132
- ├── src/ # Model code
133
- └── weights/
134
- ├── bf16/ # BFloat16 weights (9.3 GB)
135
- └── int8/ # INT8 weights (4.8 GB)
136
- ```
137
 
138
  ---
139
 
140
- ## 🔧 Convert Your Own Weights
141
 
142
  ```bash
143
- # From PyTorch .pt to BF16 safetensors
144
  python quantize.py --input /path/to/weights --output ./weights/bf16 --format bf16
145
 
146
- # From BF16 to INT8
147
  python quantize.py --input ./weights/bf16 --output ./weights/int8 --format int8
148
  ```
149
 
150
  ---
151
 
152
  ## 📜 License
153
 
154
- Apache 2.0
1
+ ---
2
+ license: agpl-3.0
3
+ tags:
4
+ - text-to-image
5
+ - diffusion
6
+ - mixture-of-experts
7
+ - moe
8
+ - dit
9
+ - distributed-inference
10
+ base_model: bageldotcom/paris
11
+ pipeline_tag: text-to-image
12
+ library_name: pytorch
13
+ ---
14
+
15
+ <div align="center">
16
+
17
+ # 🥖 Baguette
18
+
19
+ ### A Distributed Inference Engine for Paris MoE Diffusion Models
20
+
21
+ [![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
22
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
23
+ [![HuggingFace](https://img.shields.io/badge/🤗-Original%20Model-yellow)](https://huggingface.co/bageldotcom/paris)
24
+
25
+ *Fast, efficient inference for the 5-billion-parameter Paris Mixture-of-Experts text-to-image model*
26
+
27
+ </div>
28
 
29
+ ---
30
 
31
  ## ⚡ Quick Start
32
 
33
  ```bash
34
+ # Clone the repo
35
+ git clone https://huggingface.co/nbagel/baguette
36
+ cd baguette
37
+
38
  # Install dependencies
39
  pip install uv && uv pip install torch torchvision safetensors transformers diffusers accelerate tqdm
40
 
41
+ # Generate images
42
  python generate.py --prompt "a cute cat" --num_samples 4
43
  ```
44
 
45
+ **Output:** `output_bf16.png` with 4 generated images.
46
 
47
  ---
48
 
49
+ ## 🎨 Generation Examples
50
 
51
  ```bash
52
+ # Basic generation (4 images, top-2 routing, 30 steps)
53
+ python generate.py --prompt "sunset over mountains" --num_samples 4
54
 
55
+ # See expert routing visualization
56
+ python generate.py --prompt "abstract art" --visualize
57
 
58
+ # Faster generation
59
+ python generate.py --prompt "a happy dog" --num_steps 20
60
 
61
+ # Lower memory usage (offload experts to CPU)
62
+ python generate.py --prompt "portrait of a scientist" --offload 4
63
 
64
+ # INT8 quantized (smaller weights)
65
+ python generate.py --prompt "enchanted forest" --precision int8
66
  ```
67
 
68
  ---
69
 
70
+ ## 🔮 Expert Routing Visualization
71
 
72
+ Baguette includes real-time visualization of the MoE router's expert selection. Use `--visualize` to see which experts are activated:
73
 
74
  ```
75
  ╭──────────────────────────────────────────────────╮
76
  │ ⚡ EXPERT USAGE DISTRIBUTION │
77
  ├──────────────────────────────────────────────────┤
78
  │ → E4 │████████████████████████████│ 40.6% │
79
+ │ E2 │██████████████████████████│ 36.7% │
80
+ │ E6 │██████████│ 14.8% │
81
+ │ E1 │███│ 5.5% │
82
+ │ E5 │█│ 2.3% │
83
+ │ E0 │ │ 0.0% │
84
+ │ E3 │ │ 0.0% │
85
+ │ E7 │ │ 0.0% │
86
  ├──────────────────────────────────────────────────┤
87
  │ Active: 5/8 experts Calls: 128 │
88
  ╰──────────────────────────────────────────────────╯
 
90
  ╭──────────────────────────────────────────────────╮
91
  │ 📈 ROUTING TIMELINE │
92
  ├──────────────────────────────────────────────────┤
93
+ │ Step 0 1 2 3 4 5 6 7 8 9 10 11 12 13 │
94
+ │ ─────────────────────────────────────────────── │
95
+ │ E0 · · · · · · · · · · · · · · │
96
+ │ E1 · · · · · · · · · · · · · · │
97
+ │ E2 · · · · · ● ● ● ● │
98
+ │ E3 · · · · · · · · · · · · · · │
99
+ │ E4 · · ● ● ● · · · · · · · · · │
100
+ │ E5 · · · · · · · · · · · · · · │
101
+ │ E6 ● ● · · · · · · · · · · · · │
102
+ │ E7 · · · · · · · · · · · · · · │
103
  ├──────────────────────────────────────────────────┤
104
+ │ Routing changes: 2/13 steps (15%) │
105
  ╰──────────────────────────────────────────────────╯
106
  ```
107
 
108
+ The router dynamically selects different experts based on the noise level at each diffusion timestep. Early steps (high noise) often use different experts than later steps (low noise).
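The selection step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the repository: the function names, the example logits, and the choice to softmax-normalize the weights over only the selected experts are all assumptions.

```python
import math

def topk_weights(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Assumption: weights are renormalized over the selected experts only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def combine_velocities(weights, velocities):
    """Weighted sum of per-expert velocity vectors (plain lists of floats)."""
    dim = len(next(iter(velocities.values())))
    out = [0.0] * dim
    for expert, w in weights.items():
        for d in range(dim):
            out[d] += w * velocities[expert][d]
    return out

# Hypothetical router scores for the 8 experts at one diffusion step.
router_logits = [0.1, 0.3, 2.0, -1.0, 2.0, 0.0, 1.2, -0.5]
weights = topk_weights(router_logits, k=2)

# Stand-in expert outputs; a real expert would return a 32×32×4 latent velocity.
velocities = {i: [float(i), -float(i)] for i in weights}
velocity = combine_velocities(weights, velocities)
print(weights, velocity)
```

Calling `topk_weights` again with logits from a later timestep would give a different expert set, which is the behavior the routing timeline visualizes.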
109
+
110
  ---
111
 
112
+ ## 📋 Command Reference
113
 
114
+ | Flag | Default | Description |
115
+ |:-----|:--------|:------------|
116
+ | `--prompt` | `"a cute cat"` | Text description of the image to generate |
117
+ | `--num_samples` | `16` | Number of images to generate |
118
+ | `--num_steps` | `30` | Diffusion sampling steps (15-50) |
119
+ | `--cfg_scale` | `7.5` | Classifier-free guidance scale (5-12) |
120
+ | `--precision` | `bf16` | Weight precision: `bf16` or `int8` |
121
+ | `--topk` | `2` | Number of experts per sample (1-8) |
122
+ | `--offload` | `0` | Experts to offload to CPU RAM (0-7) |
123
+ | `--visualize` | `false` | Show expert routing statistics |
124
+ | `--output` | `auto` | Custom output filename |
125
+ | `--seed` | `999` | Random seed for reproducibility |
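For anyone scripting around `generate.py`, the table can be mirrored as an `argparse` parser. This is a sketch reconstructed from the table alone, not the actual source of `generate.py`; the argument types, the `choices` list, and the help strings are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags and defaults in the table above."""
    p = argparse.ArgumentParser(description="Paris MoE text-to-image generation (sketch)")
    p.add_argument("--prompt", default="a cute cat", help="text description of the image")
    p.add_argument("--num_samples", type=int, default=16, help="number of images")
    p.add_argument("--num_steps", type=int, default=30, help="diffusion sampling steps")
    p.add_argument("--cfg_scale", type=float, default=7.5, help="classifier-free guidance scale")
    p.add_argument("--precision", choices=["bf16", "int8"], default="bf16", help="weight precision")
    p.add_argument("--topk", type=int, default=2, help="experts per sample")
    p.add_argument("--offload", type=int, default=0, help="experts kept in CPU RAM")
    p.add_argument("--visualize", action="store_true", help="show expert routing statistics")
    p.add_argument("--output", default="auto", help="output filename")
    p.add_argument("--seed", type=int, default=999, help="random seed")
    return p

# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--prompt", "a happy dog", "--num_steps", "20"])
print(args)
```

Flags that are not passed fall back to the defaults listed in the table, so `args.num_samples` is 16 and `args.visualize` is `False` here.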
126
 
127
  ---
128
 
129
+ ## 🏗️ Model Architecture
130
 
131
  ```
132
+ ┌─────────────────────────────────────────────────────────────────┐
133
+ │ PARIS MoE ARCHITECTURE │
134
+ ├─────────────────────────────────────────────────────────────────┤
135
+ │ │
136
+ │ Input: Text Prompt ──→ CLIP ViT-L/14 ──→ Text Embeddings │
137
+ │ │
138
+ │ Noise: z ~ N(0,1) ──→ 32×32×4 Latent │
139
+ │ │
140
+ │ │
141
+ │ ┌─────────────────────────────────────────────────────────┐ │
142
+ │ │ DiT-B/2 ROUTER │ │
143
+ │ │ (12 layers, 768 dim, 129M params) │ │
144
+ │ │ │ │ │
145
+ │ │ Selects Top-K Experts per Step │ │
146
+ │ └─────────────────────────────────────────────────────────┘ │
147
+ │ │ │
148
+ │ ┌───────────────────┼───────────────────┐ │
149
+ │ ▼ ▼ ▼ │
150
+ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
151
+ │ │ Expert 0 │ │ Expert 1 │ ··· │ Expert 7 │ │
152
+ │ │ DiT-XL/2 │ │ DiT-XL/2 │ │ DiT-XL/2 │ │
153
+ │ │ 606M │ │ 606M │ │ 606M │ │
154
+ │ └────────────┘ └────────────┘ └────────────┘ │
155
+ │ │ │ │ │
156
+ │ └───────────────────┼───────────────────┘ │
157
+ │ ▼ │
158
+ │ Weighted Velocity Prediction │
159
+ │ │ │
160
+ │ ▼ │
161
+ │ ┌─────────────────────────────────────────────────────────┐ │
162
+ │ │ SD-VAE DECODER │ │
163
+ │ │ Latent ──→ 256×256 RGB │ │
164
+ │ └─────────────────────────────────────────────────────────┘ │
165
+ │ │
166
+ ├─────────────────────────────────────────────────────────────────┤
167
+ │ Total: ~5 Billion Parameters │ 8 Specialized Experts │
168
+ └─────────────────────────────────────────────────────────────────┘
169
  ```
170
 
171
+ ---
172
+
173
+ ## 💾 Available Weights
174
+
175
+ | Format | Size | Quality | Speed | Use Case |
176
+ |:-------|:-----|:--------|:------|:---------|
177
+ | **BF16** | 9.3 GB | ⭐⭐⭐⭐⭐ | Fastest | Production, best quality |
178
+ | **INT8** | 4.8 GB | ⭐⭐⭐⭐ | Fast | Memory-constrained GPUs |
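As a rough sanity check on these sizes, the parameter counts from the architecture diagram (8 experts of 606M each plus a 129M router) can be multiplied by the bytes per parameter. The sketch below assumes the listed sizes cover only those nine checkpoints, with 2 bytes per BF16 weight and 1 byte per INT8 weight. It ignores quantization scales and any tensors kept at higher precision, which may be why the INT8 estimate lands somewhat below the 4.8 GB in the table.

```python
EXPERT_PARAMS = 606e6   # per expert, from the architecture diagram
ROUTER_PARAMS = 129e6   # DiT-B/2 router
NUM_EXPERTS = 8

def weights_size(bytes_per_param):
    """Return (total parameters, size in GB, size in GiB) for the experts plus router."""
    total_params = NUM_EXPERTS * EXPERT_PARAMS + ROUTER_PARAMS
    total_bytes = total_params * bytes_per_param
    return total_params, total_bytes / 1e9, total_bytes / 2**30

for name, nbytes in [("bf16", 2), ("int8", 1)]:
    params, gb, gib = weights_size(nbytes)
    print(f"{name}: {params / 1e9:.2f}B params, {gb:.2f} GB, {gib:.2f} GiB")
```

The total comes to about 4.98 billion parameters, in line with the "~5 billion" figure, and the BF16 estimate of roughly 9.27 GiB is close to the 9.3 GB listed.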
179
 
180
  ---
181
 
182
+ ## 🖥️ Memory Requirements
183
 
184
+ | Configuration | GPU VRAM | Speed | Notes |
185
+ |:--------------|:---------|:------|:------|
186
+ | BF16, no offload | ~25 GB | ~3 img/s | Best performance |
187
+ | BF16, offload 4 | ~14 GB | ~1 img/s | RTX 4090 / A6000 |
188
+ | BF16, offload 6 | ~8 GB | ~0.5 img/s | RTX 3080/4080 |
189
+ | INT8, no offload | ~12 GB | ~2 img/s | Good balance |
190
+ | INT8, offload 4 | ~8 GB | ~0.5 img/s | Consumer GPUs |
191
 
192
  ---
193
 
194
+ ## 🔧 Utilities
195
+
196
+ ### Benchmarking
197
 
198
  ```bash
199
+ python benchmark.py --quick # Fast benchmark
200
+ python benchmark.py --output results.md # Full benchmark, save results
201
+ ```
202
+
203
+ ### Weight Conversion
204
+
205
+ ```bash
206
+ # Convert PyTorch checkpoints to BF16 SafeTensors
207
  python quantize.py --input /path/to/weights --output ./weights/bf16 --format bf16
208
 
209
+ # Convert BF16 to INT8
210
  python quantize.py --input ./weights/bf16 --output ./weights/int8 --format int8
211
  ```
212
 
213
  ---
214
 
215
+ ## 🚀 Future: Distributed Inference with Tailscale + Erlang
216
+
217
+ Baguette is being developed into a **fully distributed inference engine** that is intended to run across multiple machines connected via a [Tailscale](https://tailscale.com/) VPN and orchestrated by an Erlang/OTP supervisor.
218
+
219
+ ### 🌐 Architecture Vision
220
+
221
+ ```
222
+ ┌─────────────────────────────────────────────────────────────────────────┐
223
+ │ BAGUETTE DISTRIBUTED NETWORK │
224
+ │ (Up to 8 Nodes) │
225
+ ├─────────────────────────────────────────────────────────────────────────┤
226
+ │ │
227
+ │ ┌─────────────┐ Tailscale VPN Mesh ┌─────────────┐ │
228
+ │ │ Node 1 │◄────────────────────────────►│ Node 2 │ │
229
+ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │
230
+ │ │ │ Router │ │ │ │ Router │ │ │
231
+ │ │ │ VAE │ │ │ │ VAE │ │ │
232
+ │ │ │Expert 0 │ │ │ │Expert 1 │ │ │
233
+ │ │ └─────────┘ │ │ └─────────┘ │ │
234
+ │ └──────┬──────┘ └──────┬──────┘ │
235
+ │ │ │ │
236
+ │ │ ┌──────────────────┐ │ │
237
+ │ └────────►│ Erlang/OTP │◄─────────────┘ │
238
+ │ │ Coordinator │ │
239
+ │ ┌────────►│ │◄─────────────┐ │
240
+ │ │ │ • Load Balance │ │ │
241
+ │ │ │ • Fault Tolerant│ │ │
242
+ │ │ │ • Auto-Healing │ │ │
243
+ │ │ └──────────────────┘ │ │
244
+ │ │ │ │
245
+ │ ┌──────┴──────┐ ┌──────┴──────┐ │
246
+ │ │ Node 3 │◄────────────────────────────►│ Node 4 │ │
247
+ │ │ ┌─────────┐ │ ... │ ┌─────────┐ │ │
248
+ │ │ │ Router │ │ │ │ Router │ │ │
249
+ │ │ │ VAE │ │ (up to 8 nodes) │ │ VAE │ │ │
250
+ │ │ │Expert 2 │ │ │ │Expert 3 │ │ │
251
+ │ │ └─────────┘ │ │ └─────────┘ │ │
252
+ │ └─────────────┘ └─────────────┘ │
253
+ │ │
254
+ └─────────────────────────────────────────────────────────────────────────┘
255
+ ```
256
+
257
+ ### 🎯 Key Features (Planned)
258
+
259
+ | Feature | Description |
260
+ |:--------|:------------|
261
+ | **Self-Organizing Network** | Nodes automatically discover peers and negotiate roles |
262
+ | **Adaptive Load Balancing** | Routes requests based on real-time latency and compute availability |
263
+ | **Auto-Benchmarking** | Each node benchmarks GPU/CPU speed, VRAM, RAM, and network throughput |
264
+ | **Fault Tolerance** | Erlang supervisors restart failed nodes, redistribute load |
265
+ | **1 Expert Per Node** | Each node loads only 1 expert (~2.7GB VRAM) plus router & VAE |
266
+ | **Latency-Aware Routing** | Prioritizes low-latency nodes for time-sensitive steps |
267
+ | **Zero Configuration** | Just join the Tailscale network and run—automatic peer discovery |
268
+
269
+ ### 📊 Node Self-Benchmarking
270
+
271
+ When a node joins the network, it automatically benchmarks:
272
+
273
+ ```
274
+ ┌────────────────────────────────────────┐
275
+ │ NODE CAPABILITY REPORT │
276
+ ├────────────────────────────────────────┤
277
+ │ GPU: NVIDIA RTX 4090 │
278
+ │ VRAM: 24 GB │
279
+ │ GPU Compute: 847 TFLOPS (FP16) │
280
+ │ ──────────────────────────────────── │
281
+ │ CPU: AMD Ryzen 9 7950X │
282
+ │ RAM: 64 GB │
283
+ │ CPU Compute: 2.1 TFLOPS │
284
+ │ ──────────────────────────────────── │
285
+ │ Network Latency to Peers: │
286
+ │ → Node 2: 12ms │
287
+ │ → Node 3: 8ms │
288
+ │ → Node 4: 45ms │
289
+ │ Network Bandwidth: 940 Mbps │
290
+ │ ──────────────────────────────────── │
291
+ │ Assigned Expert: E0 │
292
+ │ Status: READY │
293
+ └────────────────────────────────────────┘
294
+ ```
295
+
296
+ ### 🔄 Distributed Inference Flow
297
+
298
+ 1. **Request arrives** at any node
299
+ 2. **Router runs locally** → selects top-K experts needed
300
+ 3. **Coordinator dispatches** expert calls to appropriate nodes
301
+ 4. **Nodes compute in parallel** → return velocity predictions
302
+ 5. **Results aggregated** → Euler step applied
303
+ 6. **VAE decodes locally** → image returned to requester
304
+
305
+ This would let the full 5B-parameter model run across consumer hardware: each machine would need only ~4 GB of VRAM, because it holds a single expert alongside the router and VAE.
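The six steps above can be simulated on a single machine to show the control flow. Everything here is a stand-in: `fake_router` and `fake_expert` are hypothetical placeholders for the real router and remote expert calls, a thread pool plays the role of the coordinator, and the time direction and sign of the Euler update are assumptions rather than the project's actual conventions.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_router(x, t):
    """Stand-in router: returns {expert_id: weight} for two experts, weights summing to 1."""
    first = int(t * 8) % 8
    return {first: 0.75, (first + 1) % 8: 0.25}

def fake_expert(expert_id, x, t):
    """Stand-in for a remote expert call; returns a velocity shaped like x."""
    return [-xi for xi in x]  # toy velocity field that pulls x toward zero

def sample(x, num_steps=30):
    """Run the loop: route, dispatch experts in parallel, aggregate, take an Euler step."""
    dt = 1.0 / num_steps
    with ThreadPoolExecutor(max_workers=8) as pool:
        for step in range(num_steps):
            t = step * dt
            weights = fake_router(x, t)                       # step 2: router runs locally
            futures = {e: pool.submit(fake_expert, e, x, t)   # step 3: dispatch expert calls
                       for e in weights}
            v = [0.0] * len(x)
            for e, fut in futures.items():                    # step 4: collect predictions
                ve = fut.result()
                for d in range(len(x)):
                    v[d] += weights[e] * ve[d]
            x = [xi + dt * vi for xi, vi in zip(x, v)]        # step 5: Euler update
    return x  # step 6 would decode this latent with the VAE

latent = sample([1.0, -2.0, 0.5])
print(latent)
```

In the planned system, each `pool.submit` would become a network call to the node holding that expert, so per-step latency would be bounded by the slowest selected node. That is presumably the motivation for the latency-aware routing listed among the planned features.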
306
+
307
+ ---
308
+
309
+ ## 📁 Repository Structure
310
+
311
+ ```
312
+ baguette/
313
+ ├── generate.py # 🎨 Main generation script
314
+ ├── benchmark.py # 📊 Performance benchmarking
315
+ ├── quantize.py # 🔧 Weight format conversion
316
+ ├── requirements.txt # 📦 Python dependencies
317
+ ├── README.md # 📖 This file
318
+ ├── src/ # 🧠 Model architecture code
319
+ │ ├── models.py # DiT expert & router definitions
320
+ │ ├── vae_utils.py # VAE encoding/decoding
321
+ │ ├── config.py # Configuration dataclass
322
+ │ └── schedules.py # Noise schedules
323
+ └── weights/ # 💾 Model weights
324
+ ├── bf16/ # BFloat16 SafeTensors (9.3 GB)
325
+ │ ├── expert_0.safetensors ... expert_7.safetensors
326
+ │ ├── router.safetensors
327
+ │ └── config.pt
328
+ └── int8/ # INT8 Quantized (4.8 GB)
329
+ ├── expert_0.safetensors ... expert_7.safetensors
330
+ └── router.safetensors
331
+ ```
332
+
333
+ ---
334
+
335
+ ## 🔗 Links
336
+
337
+ - **Original Model**: [bageldotcom/paris](https://huggingface.co/bageldotcom/paris)
338
+ - **This Repository**: [nbagel/baguette](https://huggingface.co/nbagel/baguette)
339
+
340
+ ---
341
+
342
  ## 📜 License
343
 
344
+ This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.
345
+
346
+ See [LICENSE](LICENSE) for details.
347
+
348
+ ---
349
+
350
+ <div align="center">
351
+
352
+ **Made with 🥖 by the Baguette Team**
353
+
354
+ *Distributed inference for everyone*
355
+
356
+ </div>