Upload folder using huggingface_hub
technical_report_mdiffae.md CHANGED (+25 -14)
@@ -138,26 +138,37 @@ Training checkpoint: step 708,000 (EMA weights).
 ## 7. Results

-Reconstruction quality evaluated on

-### 7.1

-| Steps | 1 |
-| Schedule | linear |
-| Seed | 42 |
-| PDG | no_path_dropg |
-| Batch size (timing) | 8 |
+Reconstruction quality is evaluated on two image sets: a large benchmark (N=2000, 2/3 photographs + 1/3 book covers) for summary statistics, and a curated 39-image set for per-image comparisons. Flux.1 and Flux.2 VAEs are included as references. All models use 1-step DDIM, seed 42, no PDG, bfloat16.

+### 7.1 Summary PSNR (N=2000 images)

+| Model | Mean PSNR (dB) | Std (dB) | Median (dB) |
+|-------|---------------|----------|-------------|
+| mDiffAE v1 (1 step) | 34.15 | 5.14 | 33.82 |
+| Flux.1 VAE | 34.62 | 4.31 | 35.17 |
+| Flux.2 VAE | 36.30 | 4.58 | 36.14 |
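
For reference, PSNR and the three summary columns in the table above can be computed along the following lines. This is a minimal sketch, not the report's evaluation code: the function names `psnr` and `summarize` are hypothetical, images are assumed to be float arrays scaled to [0, 1], and the standard deviation is taken as the population value (`ddof=0`), which the report does not specify.

```python
import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR in dB between two same-shaped images with values in [0, max_val]."""
    ref = reference.astype(np.float64)
    rec = reconstruction.astype(np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images have unbounded PSNR
    return float(10.0 * np.log10(max_val**2 / mse))


def summarize(psnr_values) -> dict:
    """Mean, standard deviation, and median of per-image PSNR values (dB)."""
    values = np.asarray(psnr_values, dtype=np.float64)
    return {
        "mean": float(values.mean()),
        "std": float(values.std(ddof=0)),  # assumption: population std
        "median": float(np.median(values)),
    }
```

Because PSNR is logarithmic in the mean squared error, the statistics are taken over per-image dB values rather than by pooling the error across all images first; the two approaches generally give different numbers.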
+
+**Percentile distribution:**
+
+| Percentile | mDiffAE v1 | Flux.1 VAE | Flux.2 VAE |
+|------------|-----------|------------|------------|
+| p5 | 26.22 | 27.06 | 28.99 |
+| p10 | 27.54 | 28.45 | 30.38 |
+| p25 | 30.22 | 31.58 | 32.87 |
+| p50 | 33.82 | 35.17 | 36.14 |
+| p75 | 38.20 | 37.99 | 39.85 |
+| p90 | 41.21 | 39.75 | 42.51 |
+| p95 | 42.49 | 40.57 | 43.64 |
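
One way to read the percentile table is to look at the per-percentile gap between mDiffAE v1 and Flux.1 VAE. The sketch below copies the values from the table and finds the first percentile at which mDiffAE v1 is ahead; the variable and function names are illustrative only and do not come from the report.

```python
# Percentile PSNR values (dB) copied from the percentile table.
PERCENTILES = ["p5", "p10", "p25", "p50", "p75", "p90", "p95"]
MDIFFAE_V1 = [26.22, 27.54, 30.22, 33.82, 38.20, 41.21, 42.49]
FLUX1_VAE = [27.06, 28.45, 31.58, 35.17, 37.99, 39.75, 40.57]


def psnr_gaps(ours, baseline):
    """Per-percentile difference (ours - baseline) in dB, rounded to 2 decimals."""
    return [round(a - b, 2) for a, b in zip(ours, baseline)]


def first_lead(labels, gaps):
    """Label of the first percentile where the gap is positive, or None."""
    for label, gap in zip(labels, gaps):
        if gap > 0:
            return label
    return None


gaps = psnr_gaps(MDIFFAE_V1, FLUX1_VAE)
for label, gap in zip(PERCENTILES, gaps):
    print(f"{label}: {gap:+.2f} dB")
print("mDiffAE v1 first leads Flux.1 VAE at:", first_lead(PERCENTILES, gaps))
```

With these values, mDiffAE v1 trails Flux.1 VAE through the median and leads from p75 upward, which is consistent with its larger standard deviation in the summary table.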

+> mDiffAE encodes at ~2.5 ms/image and decodes at ~4 ms/image, roughly 25× faster than Flux.1 and 15× faster than Flux.2. Timings were measured on an NVIDIA RTX Pro 6000 (Blackwell).

+### 7.2 Interactive Viewer
+
+**[Open full-resolution comparison viewer](https://huggingface.co/spaces/data-archetype/mdiffae-results)**: side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.

+### 7.3 Per-Image Results (39-image curated set)

+Inference settings: 1-step DDIM, seed 42, no PDG, batch size 8.

 | Metric | mdiffae_v1 (1 step) | Flux.1 VAE | Flux.2 VAE |
 |--------|--------|--------|--------|