metadata
library_name: KVAE 3D
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- vae
license: mit
KVAE-3D 1.0: Video tokenizer
The KVAE-3D model applies 4x temporal compression and 8x8 spatial compression, and produces 16 latent channels.
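
To make the compression factors concrete, here is a minimal sketch of the latent shape they imply. The helper names `latent_shape` and `compression_ratio` are hypothetical and not part of this library. It assumes an RGB input and frame, height, and width values that divide evenly by the factors; how the actual model handles the first frame or padding is not stated here.

```python
def latent_shape(frames, height, width, t_factor=4, s_factor=8, channels=16):
    """Latent shape (channels, frames, height, width) for 4x8x8 compression.

    Assumption: plain integer division of each axis by its factor.
    The real model may treat the first frame or padding differently.
    """
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("dimensions must be divisible by the compression factors")
    return (channels, frames // t_factor, height // s_factor, width // s_factor)


def compression_ratio(frames, height, width, in_channels=3):
    """Ratio of input values to latent values (assumes RGB input)."""
    c, t, h, w = latent_shape(frames, height, width)
    return (in_channels * frames * height * width) / (c * t * h * w)


print(latent_shape(16, 256, 256))
print(compression_ratio(16, 256, 256))
```

Under these assumptions, a 16-frame 256x256 RGB clip maps to a latent with 16 channels, 4 frames, and 32x32 spatial resolution, which holds 48 times fewer values than the input.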
Evaluation results
Reconstruction comparison of KVAE-3D and HunyuanVideo:
Evaluation results of the KVAE-3D model on the MCL-JCV dataset. All compared models perform 4x8x8 compression with 16 latent channels:
| Model | PSNR | SSIM | LPIPS |
|---|---|---|---|
| Wan-2.1 | 33.75 | 0.90 | 0.089 |
| HunyuanVideo | 33.91 | 0.91 | 0.103 |
| KVAE-3D | 35.63 | 0.92 | 0.088 |
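
For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, PSNR = 10 * log10(peak^2 / MSE), in decibels (higher is better). It assumes pixel values flattened into plain sequences and an 8-bit peak of 255. The exact evaluation protocol behind the table (resolution, color space, per-frame averaging) is not specified here, so this is an illustration of the metric rather than the script used for these numbers.

```python
import math


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: values are on a [0, peak] scale (255 for 8-bit images).
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixel values, for illustration only.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [53, 55, 60, 66, 71, 61, 63, 73]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

SSIM (higher is better) and LPIPS (lower is better) are typically computed with dedicated libraries and are not sketched here.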
