Update README.md
README.md
CHANGED
@@ -15,12 +15,12 @@ A Family of Versatile and State-Of-The-Art Video Tokenizers
 
 <img src="./assets/radar.png" width="95%" alt="radar" align="center">
 
-VidTok is a family of
-* ⚡️ **
-* 🔥 **Advanced
-* 💥 **
+VidTok is a cutting-edge family of video tokenizers that delivers state-of-the-art performance in both continuous and discrete tokenizations with various compression rates. VidTok incorporates several key advancements over existing approaches:
+* ⚡️ **Efficient Architecture**. Separate spatial and temporal sampling reduces computational complexity without sacrificing quality.
+* 🔥 **Advanced Quantization**. Finite Scalar Quantization (FSQ) addresses training instability and codebook collapse in discrete tokenization.
+* 💥 **Enhanced Training**. A two-stage strategy—pre-training on low-res videos and fine-tuning on high-res—boosts efficiency. Reduced frame rates improve motion dynamics representation.
 
-
+VidTok, trained on a large-scale video dataset, outperforms previous models across all metrics, including PSNR, SSIM, LPIPS, and FVD.
 
 <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/619b7b1cab4c7b7f16a7d59e/4v2I2YAZJeWSnd7iqntGX.mp4"></video>
 
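The added "Advanced Quantization" bullet names Finite Scalar Quantization (FSQ) without showing what it does. The NumPy sketch below illustrates the general idea only and is not VidTok's implementation: each latent channel is bounded with `tanh`, scaled to a small fixed number of levels, and rounded, so the codebook is implicit (the product of the per-channel level counts) rather than learned. The function names `fsq_quantize` and `fsq_index` and the level counts `[5, 5, 3]` are hypothetical choices for illustration. The sketch is restricted to odd level counts, and it omits the straight-through gradient that training would need.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each channel of z to one of `levels[i]` integer values.

    Assumption: odd level counts only, so the grid is symmetric around 0.
    """
    levels = np.asarray(levels)
    assert z.shape[-1] == levels.shape[0], "one level count per channel"
    assert np.all(levels % 2 == 1), "this sketch handles odd level counts only"
    half = (levels - 1) / 2          # e.g. 5 levels -> values in [-2, 2]
    bounded = np.tanh(z) * half      # squash each channel into its range
    return np.round(bounded)         # snap to the nearest integer level

def fsq_index(codes, levels):
    """Map per-channel codes to a single token id (mixed-radix encoding)."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    digits = (codes + half).astype(np.int64)               # shift to 0..L-1
    bases = np.concatenate(([1], np.cumprod(levels[:-1])))  # radix weights
    return (digits * bases).sum(axis=-1)

levels = [5, 5, 3]                   # implicit codebook size: 5 * 5 * 3 = 75
z = np.array([[0.0, 10.0, -10.0]])   # one latent vector with three channels
codes = fsq_quantize(z, levels)
token_ids = fsq_index(codes, levels)
```

Because every combination of levels is a valid code by construction, there is no learned codebook whose entries can go unused, which is the intuition behind the bullet's claim about codebook collapse.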