library_name: KVAE 3D
tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  - vae
license: mit

KVAE-3D 1.0: Video tokenizer

The KVAE-3D model compresses video by a factor of 4 in time and 8x8 in space, and uses 16 latent channels.
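To make these compression factors concrete, here is a minimal sketch of how an input video shape would map to a latent shape. The helper names `latent_shape` and `compression_ratio` are hypothetical and not part of the KVAE-3D code. The sketch assumes plain division along each axis and a 3-channel RGB input. The actual model may treat the first frame separately, as causal video VAEs often do, so check the real encoder output before relying on the temporal size.

```python
def latent_shape(frames, height, width,
                 t_factor=4, s_factor=8, latent_channels=16):
    """Hypothetical helper: latent shape (C, T, H, W), assuming plain division."""
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("input dimensions must be divisible by the compression factors")
    return (latent_channels,
            frames // t_factor,
            height // s_factor,
            width // s_factor)


def compression_ratio(in_channels=3, t_factor=4, s_factor=8, latent_channels=16):
    """Ratio of input values to latent values (assumes a 3-channel RGB input)."""
    return (in_channels * t_factor * s_factor * s_factor) / latent_channels


print(latent_shape(16, 256, 256))  # (16, 4, 32, 32)
print(compression_ratio())         # 48.0
```

Under these assumptions, each latent position summarizes a 4x8x8 block of RGB pixels (768 values) in 16 channels, which is a 48x reduction in the number of values.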

Evaluation results

Reconstruction comparison of KVAE-3D and HunyuanVideo:

Evaluation results of the KVAE-3D model on the MCL-JCV dataset. All compared models perform 4x8x8 compression with 16 latent channels:

| Model | PSNR | SSIM | LPIPS |
|---|---|---|---|
| Wan-2.1 | 33.75 | 0.90 | 0.089 |
| HunyuanVideo | 33.91 | 0.91 | 0.103 |
| KVAE-3D | 35.63 | 0.92 | 0.088 |
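For readers unfamiliar with the PSNR column, the sketch below shows the standard definition, PSNR = 10 * log10(MAX^2 / MSE), where higher is better. It is not the evaluation script used for the table: this README does not state the value range, color space, or per-frame averaging used in the evaluation, so the 8-bit range (`max_value=255.0`) and the flat pixel sequences are assumptions made for illustration.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes 8-bit pixels by default; this range is an assumption, not
    something the README specifies.
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1, so MSE = 1.
print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))  # 48.13
```

In practice the same formula would be applied to full video frames with an array library. SSIM and LPIPS need dedicated implementations and are not sketched here.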