data-archetype committed on
Commit 05d1710 · verified · 1 Parent(s): 128cb34

Upload folder using huggingface_hub

Files changed (1)
  1. technical_report_mdiffae.md +25 -14
technical_report_mdiffae.md CHANGED
@@ -138,26 +138,37 @@ Training checkpoint: step 708,000 (EMA weights).
 
  ## 7. Results
 
- Reconstruction quality evaluated on a curated set of test images covering photographs, book covers, and documents. Flux.1 VAE (patch 8, 16 channels) is included as a reference at the same 12x compression ratio as the c64 variant.
 
- ### 7.1 Interactive Viewer
 
- **[Open full-resolution comparison viewer](https://huggingface.co/spaces/data-archetype/mdiffae-results)** side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.
 
- ### 7.2 Inference Settings
 
- | Setting | Value |
- |---------|-------|
- | Sampler | ddim |
- | Steps | 1 |
- | Schedule | linear |
- | Seed | 42 |
- | PDG | no_path_dropg |
- | Batch size (timing) | 8 |
 
- > All models run in bfloat16. Timings measured on an NVIDIA RTX Pro 6000 (Blackwell).
 
- ### 7.3 Global Metrics
 
+ Reconstruction quality evaluated on two image sets: a large benchmark (N=2000, 2/3 photographs + 1/3 book covers) for summary statistics, and a curated 39-image set for per-image comparisons. Flux.1 and Flux.2 VAEs are included as references. All models use 1-step DDIM, seed 42, no PDG, bfloat16.
 
+ ### 7.1 Summary PSNR (N=2000 images)
 
+ | Model | Mean PSNR (dB) | Std (dB) | Median (dB) |
+ |-------|---------------|----------|-------------|
+ | mDiffAE v1 (1 step) | 34.15 | 5.14 | 33.82 |
+ | Flux.1 VAE | 34.62 | 4.31 | 35.17 |
+ | Flux.2 VAE | 36.30 | 4.58 | 36.14 |
+
+ **Percentile distribution:**
+
+ | Percentile | mDiffAE v1 | Flux.1 VAE | Flux.2 VAE |
+ |------------|-----------|------------|------------|
+ | p5 | 26.22 | 27.06 | 28.99 |
+ | p10 | 27.54 | 28.45 | 30.38 |
+ | p25 | 30.22 | 31.58 | 32.87 |
+ | p50 | 33.82 | 35.17 | 36.14 |
+ | p75 | 38.20 | 37.99 | 39.85 |
+ | p90 | 41.21 | 39.75 | 42.51 |
+ | p95 | 42.49 | 40.57 | 43.64 |
 
+ > mDiffAE encodes at ~2.5 ms/image and decodes at ~4 ms/image — roughly 25× faster than Flux.1 and 15× faster than Flux.2. Timings measured on an NVIDIA RTX Pro 6000 (Blackwell).
 
+ ### 7.2 Interactive Viewer
+
+ **[Open full-resolution comparison viewer](https://huggingface.co/spaces/data-archetype/mdiffae-results)** — side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.
 
+ ### 7.3 Per-Image Results (39-image curated set)
 
+ Inference settings: 1-step DDIM, seed 42, no PDG, batch size 8.
 
  | Metric | mdiffae_v1 (1 step) | Flux.1 VAE | Flux.2 VAE |
  |--------|--------|--------|--------|