@@ -1,75 +1,88 @@
-# 🥖 Baguette - Paris MoE Text-to-Image
-A ~5 billion parameter Mixture-of-Experts diffusion model with 8 specialized experts.
 ## ⚡ Quick Start
 ```bash
 # Install dependencies
 pip install uv && uv pip install torch torchvision safetensors transformers diffusers accelerate tqdm
-# Generate 4 cat images
 python generate.py --prompt "a cute cat" --num_samples 4
 ```
-That's it! Images saved to `output_bf16.png`.
 ---
-## 🎨 Examples
 ```bash
-# Simple generation
-python generate.py --prompt "sunset over mountains"
-# More samples, see expert routing
-python generate.py --prompt "abstract art" --num_samples 16 --visualize
-# Faster with fewer steps
-python generate.py --prompt "a dog" --num_steps 15
-# Lower memory (offload 4 experts to CPU)
-python generate.py --prompt "portrait" --offload 4
-# INT8 weights (smaller, slightly lower quality)
-python generate.py --prompt "forest" --precision int8
 ```
 ---
-## 📋 All Options
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--prompt` | "a cute cat" | What to generate |
-| `--num_samples` | 16 | Number of images |
-| `--num_steps` | 30 | Sampling steps (20-50 recommended) |
-| `--cfg_scale` | 7.5 | Guidance strength (5-10 recommended) |
-| `--precision` | bf16 | `bf16` (best) or `int8` (smaller) |
-| `--topk` | 2 | Experts per sample (1 or 2) |
-| `--offload` | 0 | Experts to keep on CPU (0-7) |
-| `--visualize` | off | Show expert routing stats |
-| `--output` | auto | Output filename |
-| `--seed` | 999 | Random seed |
----
-## 🔍 Expert Visualization
-Use `--visualize` to see which experts the router selects:
 ```
 ╭──────────────────────────────────────────────────╮
 │           ⚡ EXPERT USAGE DISTRIBUTION            │
 ├──────────────────────────────────────────────────┤
 │ → E4  │████████████████████████████│ 40.6% │
-│   E2  │██████████████████████████  │ 36.7% │
-│   E6  │██████████               │ 14.8% │
-│   E1  │███                      │  5.5% │
-│   E5  │█                        │  2.3% │
-│   E0  │                         │  0.0% │
-│   E3  │                         │  0.0% │
-│   E7  │                         │  0.0% │
 ├──────────────────────────────────────────────────┤
 │  Active: 5/8 experts   Calls: 128               │
 ╰──────────────────────────────────────────────────╯
@@ -77,78 +90,267 @@ Use `--visualize` to see which experts the router selects:
 ╭──────────────────────────────────────────────────╮
 │            📈 ROUTING TIMELINE                   │
 ├──────────────────────────────────────────────────┤
-│ Step  0  1  2  3  4  5  6  7  8  9 10 11 ...    │
-│ ────────────────────────────────────────────     │
-│  E0   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·        │
-│  E2   ·  ·  ·  ·  ·  ·  ●  ●  ●  ●  ●  ●        │
-│  E4   ·  ·  ●  ●  ●  ●  ·  ·  ·  ·  ·  ·        │
-│  E6   ●  ●  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·        │
 ├──────────────────────────────────────────────────┤
-│  Routing changes:   2/11 steps (18%)            │
 ╰──────────────────────────────────────────────────╯
 ```
 ---
-## 💾 Memory & Speed
-| Config | GPU Memory | Speed |
-|--------|-----------|-------|
-| BF16 (all on GPU) | ~25 GB | ~3 img/s |
-| BF16 + offload 4 | ~14 GB | ~1 img/s |
-| INT8 (all on GPU) | ~12 GB | ~2 img/s |
-| INT8 + offload 4 | ~8 GB | ~0.5 img/s |
 ---
-## 🏗️ Architecture
 ```
-┌─────────────────────────────────────────┐
-│            Paris MoE Model              │
-├─────────────────────────────────────────┤
-│  Router: DiT-B/2 (129M params)          │
-│    ↓ selects top-K experts              │
-│  Experts: 8× DiT-XL/2 (606M each)       │
-│    ↓ predicts velocity                  │
-│  VAE: Stable Diffusion VAE              │
-│    ↓ decodes to pixels                  │
-│  Output: 256×256 RGB                    │
-└─────────────────────────────────────────┘
 ```
-- **Total Parameters**: ~5 Billion
-- **Latent Space**: 32×32×4
-- **Text Encoder**: CLIP ViT-L/14
 ---
-## 📁 Files
-```
-├── generate.py      # Main generation script
-├── benchmark.py     # Performance testing
-├── quantize.py      # Weight conversion tool
-├── src/             # Model code
-└── weights/
-    ├── bf16/        # BFloat16 weights (9.3 GB)
-    └── int8/        # INT8 weights (4.8 GB)
-```
 ---
-## 🔧 Convert Your Own Weights
 ```bash
-# From PyTorch .pt to BF16 safetensors
 python quantize.py --input /path/to/weights --output ./weights/bf16 --format bf16
-# From BF16 to INT8
 python quantize.py --input ./weights/bf16 --output ./weights/int8 --format int8
 ```
 ---
 ## 📜 License
-Apache 2.0

+---
+license: agpl-3.0
+tags:
+  - text-to-image
+  - diffusion
+  - mixture-of-experts
+  - moe
+  - dit
+  - distributed-inference
+base_model: bageldotcom/paris
+pipeline_tag: text-to-image
+library_name: pytorch
+---
+<div align="center">
+# 🥖 Baguette
+### A Distributed Inference Engine for Paris MoE Diffusion Models
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![HuggingFace](https://img.shields.io/badge/🤗-Original%20Model-yellow)](https://huggingface.co/bageldotcom/paris)
+*Fast, efficient inference for the 5-billion-parameter Paris Mixture-of-Experts text-to-image model*
+</div>
+---
 ## ⚡ Quick Start
 ```bash
+# Clone the repo (model weights are stored with Git LFS)
+git lfs install
+git clone https://huggingface.co/nbagel/baguette
+cd baguette
 # Install dependencies
 pip install uv && uv pip install torch torchvision safetensors transformers diffusers accelerate tqdm
+# Generate images
 python generate.py --prompt "a cute cat" --num_samples 4
 ```
+**Output:** `output_bf16.png` with 4 generated images.
 ---
+## 🎨 Generation Examples
 ```bash
+# Basic generation (4 images, top-2 routing, 30 steps)
+python generate.py --prompt "sunset over mountains" --num_samples 4
+# See expert routing visualization
+python generate.py --prompt "abstract art" --visualize
+# Faster generation (fewer sampling steps)
+python generate.py --prompt "a happy dog" --num_steps 20
+# Lower memory usage (offload experts to CPU)
+python generate.py --prompt "portrait of a scientist" --offload 4
+# INT8 quantized (smaller weights)
+python generate.py --prompt "enchanted forest" --precision int8
 ```
 ---
+## 🔮 Expert Routing Visualization
+Baguette includes real-time visualization of the MoE router's expert selection. Use `--visualize` to see which experts are activated:
 ```
 ╭──────────────────────────────────────────────────╮
 │           ⚡ EXPERT USAGE DISTRIBUTION            │
 ├──────────────────────────────────────────────────┤
 │ → E4  │████████████████████████████│ 40.6% │
+│   E2  │██████████████████████████▎ │ 36.7% │
+│   E6  │██████████▌                 │ 14.8% │
+│   E1  │███▊                        │  5.5% │
+│   E5  │█▋                          │  2.3% │
+│   E0  │                            │  0.0% │
+│   E3  │                            │  0.0% │
+│   E7  │                            │  0.0% │
 ├──────────────────────────────────────────────────┤
 │  Active: 5/8 experts   Calls: 128               │
 ╰──────────────────────────────────────────────────╯
 ╭──────────────────────────────────────────────────╮
 │            📈 ROUTING TIMELINE                   │
 ├──────────────────────────────────────────────────┤
+│ Step  0  1  2  3  4  5  6  7  8  9 10 11 12 13  │
+│ ───────────────────────────────────────────────  │
+│  E0   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E1   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E2   ·  ·  ·  ·  ·  ●  ●  ●  ●  ●  ●  ●  ●  ●  │
+│  E3   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E4   ·  ·  ●  ●  ●  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E5   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E6   ●  ●  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
+│  E7   ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  ·  │
 ├──────────────────────────────────────────────────┤
+│  Routing changes:   2/13 steps (15%)            │
 ╰──────────────────────────────────────────────────╯
 ```
+The router selects experts dynamically based on the noise level at each diffusion timestep, so early steps (high noise) often use different experts than later steps (low noise). In the timeline above, E6 handles steps 0-1, E4 steps 2-4, and E2 steps 5-13.
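The two summary lines in the visualization are simple to reproduce. The sketch below is an illustration, not the code behind `--visualize`. It assumes the timeline records one displayed expert per step, and it uses one set of call counts (52/47/19/7/3 out of 128) that happens to produce the histogram's percentages. `usage_distribution` and `routing_changes` are hypothetical helper names.

```python
from collections import Counter

def usage_distribution(selections, num_experts=8):
    """Share of router calls (in %) that went to each expert E0..E{n-1}."""
    counts = Counter(selections)
    total = len(selections)
    return {f"E{e}": round(100 * counts.get(e, 0) / total, 1) for e in range(num_experts)}

def routing_changes(per_step_expert):
    """Count step-to-step transitions where the displayed expert changes."""
    transitions = len(per_step_expert) - 1
    changes = sum(prev != cur for prev, cur in zip(per_step_expert, per_step_expert[1:]))
    return changes, transitions, round(100 * changes / transitions)

# Timeline from the example: E6 for steps 0-1, E4 for steps 2-4, E2 for steps 5-13.
timeline = [6, 6, 4, 4, 4] + [2] * 9
print(routing_changes(timeline))  # (2, 13, 15)

# One set of call counts (52/47/19/7/3 out of 128) that reproduces the histogram.
calls = [4] * 52 + [2] * 47 + [6] * 19 + [1] * 7 + [5] * 3
print(usage_distribution(calls))
```

A 14-step timeline has 13 transitions, which is why the footer reads `2/13` rather than `2/14`.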
 ---
+## 📋 Command Reference
+| Flag | Default | Description |
+|:-----|:--------|:------------|
+| `--prompt` | `"a cute cat"` | Text description of the image to generate |
+| `--num_samples` | `16` | Number of images to generate |
+| `--num_steps` | `30` | Diffusion sampling steps (15-50 recommended) |
+| `--cfg_scale` | `7.5` | Classifier-free guidance scale (5-12 recommended) |
+| `--precision` | `bf16` | Weight precision: `bf16` or `int8` |
+| `--topk` | `2` | Number of experts per sample (1-8) |
+| `--offload` | `0` | Experts to offload to CPU RAM (0-7) |
+| `--visualize` | `false` | Show expert routing statistics |
+| `--output` | `auto` | Custom output filename |
+| `--seed` | `999` | Random seed for reproducibility |
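If you wrap `generate.py` in your own tooling, the flag table maps directly onto an `argparse` parser. The sketch below is a hypothetical reconstruction from the table, not the script's actual code. `build_parser` is an invented name, `--output` uses `None` to stand for "auto", and the `--topk`/`--offload` ranges are enforced with `choices`.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the command reference table; generate.py may differ.
    p = argparse.ArgumentParser(description="Paris MoE text-to-image sampling")
    p.add_argument("--prompt", default="a cute cat")
    p.add_argument("--num_samples", type=int, default=16)
    p.add_argument("--num_steps", type=int, default=30)
    p.add_argument("--cfg_scale", type=float, default=7.5)
    p.add_argument("--precision", choices=["bf16", "int8"], default="bf16")
    p.add_argument("--topk", type=int, choices=range(1, 9), default=2)     # 1-8 experts
    p.add_argument("--offload", type=int, choices=range(0, 8), default=0)  # 0-7 experts on CPU
    p.add_argument("--visualize", action="store_true")
    p.add_argument("--output", default=None)  # None stands for "auto" filename
    p.add_argument("--seed", type=int, default=999)
    return p

args = build_parser().parse_args(["--prompt", "a happy dog", "--num_steps", "20"])
print(args.num_steps, args.topk, args.precision)  # 20 2 bf16
```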
 ---
+## 🏗️ Model Architecture
 ```
+┌─────────────────────────────────────────────────────────────────┐
+│                     PARIS MoE ARCHITECTURE                      │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│   Input: Text Prompt ──→ CLIP ViT-L/14 ──→ Text Embeddings     │
+│                                                                 │
+│   Noise: z ~ N(0,1) ──→ 32×32×4 Latent                         │
+│                              │                                  │
+│                              ▼                                  │
+│   ┌─────────────────────────────────────────────────────────┐  │
+│   │                  DiT-B/2 ROUTER                         │  │
+│   │            (12 layers, 768 dim, 129M params)            │  │
+│   │                         │                               │  │
+│   │            Selects Top-K Experts per Step               │  │
+│   └─────────────────────────────────────────────────────────┘  │
+│                              │                                  │
+│          ┌───────────────────┼───────────────────┐             │
+│          ▼                   ▼                   ▼             │
+│   ┌────────────┐      ┌────────────┐      ┌────────────┐       │
+│   │  Expert 0  │      │  Expert 1  │ ···  │  Expert 7  │       │
+│   │  DiT-XL/2  │      │  DiT-XL/2  │      │  DiT-XL/2  │       │
+│   │   606M     │      │   606M     │      │   606M     │       │
+│   └────────────┘      └────────────┘      └────────────┘       │
+│          │                   │                   │             │
+│          └───────────────────┼───────────────────┘             │
+│                              ▼                                  │
+│                   Weighted Velocity Prediction                  │
+│                              │                                  │
+│                              ▼                                  │
+│   ┌─────────────────────────────────────────────────────────┐  │
+│   │                 SD-VAE DECODER                          │  │
+│   │              Latent ──→ 256×256 RGB                     │  │
+│   └─────────────────────────────────────────────────────────┘  │
+│                                                                 │
+├─────────────────────────────────────────────────────────────────┤
+│  Total: ~5 Billion Parameters  │  8 Specialized Experts        │
+└─────────────────────────────────────────────────────────────────┘
 ```
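The "Selects Top-K Experts" and "Weighted Velocity Prediction" boxes can be read as a standard top-k mixture. Here is a minimal sketch under two assumptions: the router emits one logit per expert, and the selected experts' softmax probabilities are renormalised to sum to 1. The source does not say whether Paris renormalises. Expert networks are replaced by toy callables, and `moe_velocity` is a hypothetical name.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_velocity(router_logits, experts, x, t, k=2):
    """Weighted top-k mixture of expert velocity predictions for one sample.

    router_logits: one score per expert produced by the router
    experts: callables expert(x, t) -> velocity, same length as x
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)    # renormalise so the selected weights sum to 1
    v = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm
        v = [acc + weight * vi for acc, vi in zip(v, experts[i](x, t))]
    return v, top

# Toy experts: expert i always predicts a velocity of i in every dimension.
experts = [lambda x, t, i=i: [float(i)] * len(x) for i in range(8)]
logits = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0]
v, chosen = moe_velocity(logits, experts, x=[0.0, 0.0], t=0.5, k=2)
print(chosen)  # [4, 2]
```

With `--topk 1`, the single selected expert gets weight 1, so only one DiT-XL/2 forward pass is needed per step instead of two.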
+---
+## 💾 Available Weights
+| Format | Size | Quality | Speed | Use Case |
+|:-------|:-----|:--------|:------|:---------|
+| **BF16** | 9.3 GB | ⭐⭐⭐⭐⭐ | Fastest | Production, best quality |
+| **INT8** | 4.8 GB | ⭐⭐⭐⭐ | Fast | Memory-constrained GPUs |
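The file sizes can be sanity-checked from the parameter counts in the architecture diagram. BF16 stores 2 bytes per parameter and INT8 stores 1. This sketch counts only the router and the eight experts, since those are the files listed under `weights/`. It reports binary gigabytes (GiB), on the assumption that the table's "GB" means GiB. The constant and function names are illustrative.

```python
ROUTER_PARAMS = 129e6    # DiT-B/2 router
EXPERT_PARAMS = 606e6    # each DiT-XL/2 expert
NUM_EXPERTS = 8
GIB = 2 ** 30

def weights_gib(bytes_per_param):
    """Approximate size of router + experts on disk, ignoring scales and metadata."""
    total_params = ROUTER_PARAMS + NUM_EXPERTS * EXPERT_PARAMS   # ~4.98e9 parameters
    return total_params * bytes_per_param / GIB

print(f"bf16: {weights_gib(2):.2f} GiB")  # bf16: 9.27 GiB
print(f"int8: {weights_gib(1):.2f} GiB")  # int8: 4.64 GiB
```

The BF16 estimate matches the listed 9.3 GB. The INT8 files are somewhat larger than one byte per parameter, so they presumably carry extra data such as quantisation scales.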
 ---
+## 🖥️ Memory Requirements
+| Configuration | GPU VRAM | Speed | Notes |
+|:--------------|:---------|:------|:------|
+| BF16, no offload | ~25 GB | ~3 img/s | Best performance |
+| BF16, offload 4 | ~14 GB | ~1 img/s | RTX 4090 / A6000 |
+| BF16, offload 6 | ~8 GB | ~0.5 img/s | RTX 3080/4080 |
+| INT8, no offload | ~12 GB | ~2 img/s | Good balance |
+| INT8, offload 4 | ~8 GB | ~0.5 img/s | Consumer GPUs |
 ---
+## 🔧 Utilities
+### Benchmarking
 ```bash
+python benchmark.py --quick                    # Fast benchmark
+python benchmark.py --output results.md        # Full benchmark, save results
+```
+### Weight Conversion
+```bash
+# Convert PyTorch checkpoints to BF16 SafeTensors
 python quantize.py --input /path/to/weights --output ./weights/bf16 --format bf16
+# Convert BF16 to INT8
 python quantize.py --input ./weights/bf16 --output ./weights/int8 --format int8
 ```
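To show what BF16-to-INT8 conversion involves, here is symmetric per-row (per-output-channel) absmax quantisation, one common scheme. `quantize.py` may use a different one, and the function names are hypothetical. Each row stores INT8 values plus one floating-point scale, and dequantisation multiplies them back.

```python
def quantize_rows_int8(weight):
    """Symmetric per-row INT8: scale = max|w| / 127, q = clamp(round(w / scale), -127, 127)."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(w) for w in row)
        scale = amax / 127 if amax > 0 else 1.0   # all-zero row: any scale works
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_rows_int8(q_rows, scales):
    """Recover approximate weights: w_hat = q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

weight = [[0.5, -1.27, 0.01], [2.0, 0.0, -0.3]]
q, scales = quantize_rows_int8(weight)
approx = dequantize_rows_int8(q, scales)
# Each weight is off by at most half a quantisation step (scale / 2).
```

Per-row scales keep one large weight from degrading the precision of every other row. This rounding error is also the source of the small quality drop listed for INT8.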
 ---
+## 🚀 Future: Distributed Inference with Tailscale + Erlang
+Baguette is being developed into a **fully distributed inference engine** intended to run across multiple machines connected via [Tailscale](https://tailscale.com/) VPN and orchestrated by an Erlang/OTP supervisor.
+### 🌐 Architecture Vision
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                    BAGUETTE DISTRIBUTED NETWORK                         │
+│                         (Up to 8 Nodes)                                 │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                         │
+│   ┌─────────────┐      Tailscale VPN Mesh      ┌─────────────┐         │
+│   │   Node 1    │◄────────────────────────────►│   Node 2    │         │
+│   │ ┌─────────┐ │                              │ ┌─────────┐ │         │
+│   │ │ Router  │ │                              │ │ Router  │ │         │
+│   │ │   VAE   │ │                              │ │   VAE   │ │         │
+│   │ │Expert 0 │ │                              │ │Expert 1 │ │         │
+│   │ └─────────┘ │                              │ └─────────┘ │         │
+│   └──────┬──────┘                              └──────┬──────┘         │
+│          │                                            │                 │
+│          │         ┌──────────────────┐              │                 │
+│          └────────►│  Erlang/OTP      │◄─────────────┘                 │
+│                    │  Coordinator     │                                 │
+│          ┌────────►│                  │◄─────────────┐                 │
+│          │         │  • Load Balance  │              │                 │
+│          │         │  • Fault Tolerant│              │                 │
+│          │         │  • Auto-Healing  │              │                 │
+│          │         └──────────────────┘              │                 │
+│          │                                            │                 │
+│   ┌──────┴──────┐                              ┌──────┴──────┐         │
+│   │   Node 3    │◄────────────────────────────►│   Node 4    │         │
+│   │ ┌─────────┐ │           ...                │ ┌─────────┐ │         │
+│   │ │ Router  │ │                              │ │ Router  │ │         │
+│   │ │   VAE   │ │        (up to 8 nodes)       │ │   VAE   │ │         │
+│   │ │Expert 2 │ │                              │ │Expert 3 │ │         │
+│   │ └─────────┘ │                              │ └─────────┘ │         │
+│   └─────────────┘                              └─────────────┘         │
+│                                                                         │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+### 🎯 Key Features (Planned)
+| Feature | Description |
+|:--------|:------------|
+| **Self-Organizing Network** | Nodes automatically discover peers and negotiate roles |
+| **Adaptive Load Balancing** | Routes requests based on real-time latency and compute availability |
+| **Auto-Benchmarking** | Each node benchmarks GPU/CPU speed, VRAM, RAM, and network throughput |
+| **Fault Tolerance** | Erlang supervisors restart crashed worker processes and redistribute load away from failed nodes |
+| **1 Expert Per Node** | Each node loads only 1 expert (~2.7GB VRAM) plus router & VAE |
+| **Latency-Aware Routing** | Prioritizes low-latency nodes for time-sensitive steps |
+| **Zero Configuration** | Join the Tailscale network and run; peers are discovered automatically |
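Latency-aware routing and adaptive load balancing are still planned, so the following is only one plausible policy, written in Python rather than the planned Erlang. Each call goes to the candidate node with the lowest estimated finish time: measured round-trip latency plus queued and new work. `NodeInfo`, `pick_node`, and all field names are hypothetical. With strictly one expert per node there is a single candidate per expert, so the choice only matters if experts are replicated.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    name: str
    experts: set[int]     # expert ids loaded on this node
    rtt_ms: float         # measured round-trip latency from the coordinator
    step_ms: float        # measured time for one expert forward pass on this node
    queued: int = 0       # expert calls already waiting on this node

    def eta_ms(self) -> float:
        """Estimated time until a newly dispatched call returns."""
        return self.rtt_ms + (self.queued + 1) * self.step_ms

def pick_node(expert_id: int, nodes: list[NodeInfo]) -> NodeInfo:
    """Send the call to the holder of `expert_id` with the lowest estimated finish time."""
    candidates = [n for n in nodes if expert_id in n.experts]
    if not candidates:
        raise LookupError(f"no node holds expert E{expert_id}")
    return min(candidates, key=NodeInfo.eta_ms)

# Illustrative numbers only: latencies echo the capability report, step times are made up.
nodes = [
    NodeInfo("node2", {1}, rtt_ms=12, step_ms=40),
    NodeInfo("node3", {1, 2}, rtt_ms=8, step_ms=40, queued=2),
    NodeInfo("node4", {3}, rtt_ms=45, step_ms=25),
]
print(pick_node(1, nodes).name)  # node2: 12 + 1*40 = 52 ms beats 8 + 3*40 = 128 ms
```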
+### 📊 Node Self-Benchmarking
+When a node joins the network, it automatically benchmarks:
+```
+┌────────────────────────────────────────┐
+│         NODE CAPABILITY REPORT         │
+├────────────────────────────────────────┤
+│  GPU: NVIDIA RTX 4090                  │
+│  VRAM: 24 GB                           │
+│  GPU Compute: 847 TFLOPS (FP16)        │
+│  ────────────────────────────────────  │
+│  CPU: AMD Ryzen 9 7950X                │
+│  RAM: 64 GB                            │
+│  CPU Compute: 2.1 TFLOPS               │
+│  ────────────────────────────────────  │
+│  Network Latency to Peers:             │
+│    → Node 2: 12ms                      │
+│    → Node 3: 8ms                       │
+│    → Node 4: 45ms                      │
+│  Network Bandwidth: 940 Mbps           │
+│  ────────────────────────────────────  │
+│  Assigned Expert: E0                   │
+│  Status: READY                         │
+└────────────────────────────────────────┘
+```
+### 🔄 Distributed Inference Flow
+1. **Request arrives** at any node
+2. **Router runs locally** → selects top-K experts needed
+3. **Coordinator dispatches** expert calls to appropriate nodes
+4. **Nodes compute in parallel** → return velocity predictions
+5. **Results aggregated** → Euler step applied
+6. **VAE decodes locally** → image returned to requester
+This would allow the full 5B-parameter model to run across consumer hardware: each machine only needs ~4 GB of VRAM to hold its one expert.
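Steps 3-5 of the flow can be sketched in a few lines. In this Python sketch, threads stand in for the planned Erlang coordinator and `remote_call` stands in for a request to the node hosting each expert. The sampler is explicit Euler on the predicted velocity, as step 5 states. The time direction and the sign convention of the real sampler are assumptions, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def euler_step(x, v, dt):
    """Explicit Euler update along the predicted velocity: x <- x + dt * v."""
    return [xi + dt * vi for xi, vi in zip(x, v)]

def distributed_step(x, t, dt, selected, weights, remote_call):
    """Dispatch selected experts in parallel, aggregate their velocities, take one Euler step.

    selected: expert ids chosen by the local router
    weights: mixing weights for those experts (summing to 1)
    remote_call(expert_id, x, t): stand-in for a request to the node hosting that expert
    """
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        futures = [pool.submit(remote_call, e, x, t) for e in selected]
        velocities = [f.result() for f in futures]      # results kept in `selected` order
    v = [sum(w * vel[j] for w, vel in zip(weights, velocities)) for j in range(len(x))]
    return euler_step(x, v, dt)

# Toy "remote" expert: expert e predicts a constant velocity of e.
def fake_remote_call(expert_id, x, t):
    return [float(expert_id)] * len(x)

x, num_steps = [0.0, 0.0], 10
dt = 1.0 / num_steps
for step in range(num_steps):
    x = distributed_step(x, step * dt, dt, selected=[4, 2], weights=[0.75, 0.25],
                         remote_call=fake_remote_call)
print(x)  # approximately [3.5, 3.5]
```

The expert calls within a step run in parallel, but the steps themselves are sequential. Per-step network latency therefore adds directly to total generation time, which is the motivation for latency-aware routing.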
+---
+## 📁 Repository Structure
+```
+baguette/
+├── generate.py          # 🎨 Main generation script
+├── benchmark.py         # 📊 Performance benchmarking
+├── quantize.py          # 🔧 Weight format conversion
+├── requirements.txt     # 📦 Python dependencies
+├── README.md            # 📖 This file
+├── src/                 # 🧠 Model architecture code
+│   ├── models.py        # DiT expert & router definitions
+│   ├── vae_utils.py     # VAE encoding/decoding
+│   ├── config.py        # Configuration dataclass
+│   └── schedules.py     # Noise schedules
+└── weights/             # 💾 Model weights
+    ├── bf16/            # BFloat16 SafeTensors (9.3 GB)
+    │   ├── expert_0.safetensors ... expert_7.safetensors
+    │   ├── router.safetensors
+    │   └── config.pt
+    └── int8/            # INT8 Quantized (4.8 GB)
+        ├── expert_0.safetensors ... expert_7.safetensors
+        └── router.safetensors
+```
+---
+## 🔗 Links
+- **Original Model**: [bageldotcom/paris](https://huggingface.co/bageldotcom/paris)
+- **This Repository**: [nbagel/baguette](https://huggingface.co/nbagel/baguette)
+---
 ## 📜 License
+This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.
+See [LICENSE](LICENSE) for details.
+---
+<div align="center">
+**Made with 🥖 by the Baguette Team**
+*Distributed inference for everyone*
+</div>