Improve model card with pipeline tag, library name, and updated links
#3 opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
 tags:
 - audio-feature-extraction
 - speech-language-models
@@ -9,9 +11,8 @@ tags:
 - text-to-speech
 - automatic-speech-recognition
 ---
-# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling
-

 [](https://arxiv.org/abs/2408.16532)
 [](https://wavtokenizer.github.io/)
@@ -21,10 +22,13 @@ tags:
 ### 🎉🎉 with WavTokenizer, you can represent speech, music, and audio with only 40 tokens per second!
 ### 🎉🎉 with WavTokenizer, You can get strong reconstruction results.
-### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as

 # 🔥 News
-- *
@@ -112,10 +116,9 @@ audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
 |:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------:|:----------:|:------:|
 | WavTokenizer-small-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_600_24k_4096.ckpt) | LibriTTS | 40 | Speech | ✅ |
 | WavTokenizer-small-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_320_24k_4096.ckpt) | LibriTTS | 75 | Speech | ✅|
-| WavTokenizer-medium-
-| WavTokenizer-
-| WavTokenizer-large-
-| WavTokenizer-large-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 80000 Hours | 75 | Speech, Audio, Music | Coming Soon |
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: torch
+pipeline_tag: audio-to-audio
 tags:
 - audio-feature-extraction
 - speech-language-models
@@ -9,9 +11,8 @@ tags:
 - text-to-speech
 - automatic-speech-recognition
 ---

+# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling

 [](https://arxiv.org/abs/2408.16532)
 [](https://wavtokenizer.github.io/)
@@ -21,10 +22,13 @@ tags:
 ### 🎉🎉 with WavTokenizer, you can represent speech, music, and audio with only 40 tokens per second!
 ### 🎉🎉 with WavTokenizer, You can get strong reconstruction results.
+### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as GPT-4o.

 # 🔥 News
+- *2025.02.25*: We update WavTokenizer camera ready version for ICLR 2025 and update WavTokenizer-large-v2 checkpoint on [huggingface](https://huggingface.co/novateur/WavTokenizer-large-speech-75token).
+- *2024.10.22*: We update WavTokenizer on arxiv and release WavTokenizer-Large checkpoint.
+- *2024.09.09*: We release WavTokenizer-medium checkpoint on [huggingface](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0).
+- *2024.08.31*: We release WavTokenizer on arxiv.
@@ -112,10 +116,9 @@ audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
 |:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------:|:----------:|:------:|
 | WavTokenizer-small-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_600_24k_4096.ckpt) | LibriTTS | 40 | Speech | ✅ |
 | WavTokenizer-small-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_320_24k_4096.ckpt) | LibriTTS | 75 | Speech | ✅|
+| WavTokenizer-medium-320-24k-4096 | [🤗](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0) | 10000 Hours | 75 | Speech, Audio, Music | ✅ |
+| WavTokenizer-large-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-unify-40token) | 80000 Hours | 40 | Speech, Audio, Music | ✅|
+| WavTokenizer-large-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-speech-75token) | 80000 Hours | 75 | Speech, Audio, Music | ✅ |
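
Reviewer note: the token rates in the table rows above (40 and 75 per second) are consistent with reading each checkpoint name as `<hop size>-<sampling rate>-<codebook size>`. That reading is an assumption, not something this diff states, and the helper `codec_rates` below is hypothetical rather than part of WavTokenizer. The sketch only checks the arithmetic, and it further assumes a single codebook when estimating the bitrate.

```python
import math


def codec_rates(model_name: str) -> tuple[float, float]:
    """Return (tokens_per_second, bits_per_second) inferred from a checkpoint name.

    Assumption: the name ends in <hop>-<rate>k-<codebook>,
    for example "...-600-24k-4096".
    """
    hop, rate, codebook = model_name.split("-")[-3:]
    sample_rate = int(rate.rstrip("k")) * 1000   # "24k" -> 24000 Hz
    tokens_per_second = sample_rate / int(hop)   # one token per hop of samples
    bits_per_token = math.log2(int(codebook))    # single codebook assumed
    return tokens_per_second, tokens_per_second * bits_per_token


for name in ("WavTokenizer-small-600-24k-4096", "WavTokenizer-small-320-24k-4096"):
    tps, bps = codec_rates(name)
    print(f"{name}: {tps:g} tokens/s, {bps:g} bits/s")
```

Under these assumptions the 600-hop checkpoints come out at 40 tokens per second and the 320-hop checkpoints at 75, matching the fourth column of the table.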