data-archetype/mdiffae-v1


Commit 05d1710 (verified), committed by data-archetype 7 days ago

1 parent: 128cb34

Upload folder using huggingface_hub

Files changed (1)

technical_report_mdiffae.md +25 -14

@@ -138,26 +138,37 @@ Training checkpoint: step 708,000 (EMA weights).
 ## 7. Results
-Reconstruction quality evaluated on a curated set of test images covering photographs, book covers, and documents. Flux.1 VAE (patch 8, 16 channels) is included as a reference at the same 12x compression ratio as the c64 variant.
-### 7.1 Interactive Viewer
-**[Open full-resolution comparison viewer](https://huggingface.co/spaces/data-archetype/mdiffae-results)** — side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.
-### 7.2 Inference Settings
-| Setting | Value |
-|---------|-------|
-| Sampler | ddim |
-| Steps | 1 |
-| Schedule | linear |
-| Seed | 42 |
-| PDG | no_path_dropg |
-| Batch size (timing) | 8 |
-> All models run in bfloat16. Timings measured on an NVIDIA RTX Pro 6000 (Blackwell).
-### 7.3 Global Metrics
 | Metric | mdiffae_v1 (1 step) | Flux.1 VAE | Flux.2 VAE |
 |--------|--------|--------|--------|

 ## 7. Results
+Reconstruction quality evaluated on two image sets: a large benchmark (N=2000, 2/3 photographs + 1/3 book covers) for summary statistics, and a curated 39-image set for per-image comparisons. Flux.1 and Flux.2 VAEs are included as references. All models use 1-step DDIM, seed 42, no PDG, bfloat16.
+### 7.1 Summary PSNR (N=2000 images)
+| Model | Mean PSNR (dB) | Std (dB) | Median (dB) |
+|-------|---------------|----------|-------------|
+| mDiffAE v1 (1 step) | 34.15 | 5.14 | 33.82 |
+| Flux.1 VAE | 34.62 | 4.31 | 35.17 |
+| Flux.2 VAE | 36.30 | 4.58 | 36.14 |
+**Percentile distribution:**
+| Percentile | mDiffAE v1 | Flux.1 VAE | Flux.2 VAE |
+|------------|-----------|------------|------------|
+| p5 | 26.22 | 27.06 | 28.99 |
+| p10 | 27.54 | 28.45 | 30.38 |
+| p25 | 30.22 | 31.58 | 32.87 |
+| p50 | 33.82 | 35.17 | 36.14 |
+| p75 | 38.20 | 37.99 | 39.85 |
+| p90 | 41.21 | 39.75 | 42.51 |
+| p95 | 42.49 | 40.57 | 43.64 |
+> mDiffAE encodes at ~2.5 ms/image and decodes at ~4 ms/image — roughly 25× faster than Flux.1 and 15× faster than Flux.2. Timings measured on an NVIDIA RTX Pro 6000 (Blackwell).
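
The summary and percentile tables in 7.1 could be produced from per-image PSNR values along the following lines. This is a minimal sketch, not the report's evaluation code: the helper names `psnr`, `percentile`, and `summarize` are hypothetical, pixel values are assumed to lie in [0, 1], the standard deviation is assumed to be the sample standard deviation, and percentiles are assumed to use linear interpolation. The report does not state any of these choices.

```python
import math
import statistics


def psnr(reference, reconstruction, max_value=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values.

    Assumes values in [0, max_value]; the report does not state its range.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(psnr_values, percentiles=(5, 10, 25, 50, 75, 90, 95)):
    """Mean, sample std, median, and percentiles of per-image PSNR values."""
    finite = [v for v in psnr_values if math.isfinite(v)]  # drop exact matches
    return {
        "mean": statistics.fmean(finite),
        "std": statistics.stdev(finite),  # assumption: sample std (n - 1)
        "median": statistics.median(finite),
        "percentiles": {f"p{q}": percentile(finite, q) for q in percentiles},
    }
```

Calling `summarize` on a list of per-image PSNR values for one model would give one row of the summary table and one column of the percentile table. Under this interpolation scheme the `p50` entry equals the median, which is consistent with the matching median and p50 values reported above.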
+### 7.2 Interactive Viewer
+**[Open full-resolution comparison viewer](https://huggingface.co/spaces/data-archetype/mdiffae-results)** — side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.
+### 7.3 Per-Image Results (39-image curated set)
+Inference settings: 1-step DDIM, seed 42, no PDG, batch size 8.
 | Metric | mdiffae_v1 (1 step) | Flux.1 VAE | Flux.2 VAE |
 |--------|--------|--------|--------|