frothywater committed on
Commit
70e31e8
·
verified ·
1 Parent(s): 5362103

Update README.md

  1. README.md +11 -6
README.md CHANGED
@@ -11,12 +11,17 @@ tags:
11
 
12
  Kanade is a speech tokenizer that encodes speech into compact content tokens and global embeddings and decodes them back to mel spectrograms.
13
 
 
 
 
 
14
  ## Models
15
 
16
- | Model | Token Rate | Vocab Size | Bit Rate | Dataset | SSL Encoder | Vocoder | Parameters |
17
- | ------------------------------------------------------------------- | ---------- | ---------- | -------- | -------- | ----------- | ----------- | ---------- |
18
- | [`kanade-12.5hz`](https://huggingface.co/frothywater/kanade-12.5hz) | 12.5 Hz | 12800 | 171 bps | LibriTTS | WavLM-base+ | Vocos 24kHz | 120M |
19
- | [`kanade-25hz`](https://huggingface.co/frothywater/kanade-25hz) | 25 Hz | 12800 | 341 bps | LibriTTS | WavLM-base+ | Vocos 24kHz | 118M |
 
20
 
21
  ## Installation
22
 
@@ -41,11 +46,11 @@ Example code to load the model from HuggingFace Hub and run inference:
41
  from kanade_tokenizer import KanadeModel, load_audio, load_vocoder, vocode
42
 
43
  # Load Kanade model
44
- model = KanadeModel.from_pretrained("frothywater/kanade-12.5hz") # or "frothywater/kanade-25hz"
45
  model = model.eval().cuda()
46
 
47
  # Load vocoder
48
- vocoder = load_vocoder().cuda()
49
 
50
  # Load audio (samples,)
51
  audio = load_audio("path/to/audio.wav", sample_rate=model.config.sample_rate).cuda()
 
11
 
12
  Kanade is a speech tokenizer that encodes speech into compact content tokens and global embeddings and decodes them back to mel spectrograms.
13
 
14
+ ## Updates
15
+
16
+ - **2026-01-09**: Released the `kanade-25hz-clean` model, trained on [LibriTTS-R](https://arxiv.org/abs/2305.18802) and paired with the [HiFT vocoder](https://arxiv.org/abs/2309.09493) used in [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) for better audio quality. LibriTTS-R is a restored version of LibriTTS with the noise removed, so a model trained on it produces cleaner synthesis. The trade-off is that this version can no longer faithfully reproduce the recording environment, such as background noise and microphone characteristics. The content encoder is the same as in the previous `kanade-25hz` model. We made a small code change (specifically to `load_vocoder`) to support different vocoders during inference; please refer to the updated usage section below.
17
+
18
  ## Models
19
 
20
+ | Model | Token Rate | Vocab Size | Bit Rate | Dataset | SSL Encoder | Vocoder | Parameters |
21
+ | --------------------------------------------------------------------------- | ---------- | ---------- | -------- | ---------- | ----------- | ----------- | ---------- |
22
+ | [`kanade-12.5hz`](https://huggingface.co/frothywater/kanade-12.5hz) | 12.5 Hz | 12800 | 171 bps | LibriTTS | WavLM-base+ | Vocos 24kHz | 120M |
23
+ | [`kanade-25hz`](https://huggingface.co/frothywater/kanade-25hz) | 25 Hz | 12800 | 341 bps | LibriTTS | WavLM-base+ | Vocos 24kHz | 118M |
24
+ | [`kanade-25hz-clean`](https://huggingface.co/frothywater/kanade-25hz-clean) | 25 Hz | 12800 | 341 bps | LibriTTS-R | WavLM-base+ | HiFT 24kHz | 142M |
25
 
26
  ## Installation
27
 
 
46
  from kanade_tokenizer import KanadeModel, load_audio, load_vocoder, vocode
47
 
48
  # Load Kanade model
49
+ model = KanadeModel.from_pretrained("frothywater/kanade-25hz")
50
  model = model.eval().cuda()
51
 
52
  # Load vocoder
53
+ vocoder = load_vocoder(model.config.vocoder_name).cuda()
54
 
55
  # Load audio (samples,)
56
  audio = load_audio("path/to/audio.wav", sample_rate=model.config.sample_rate).cuda()
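
The Bit Rate column in the Models table appears consistent with the usual formula for a single token stream: token rate times log2 of the vocabulary size. The sketch below checks that arithmetic. It assumes the column counts only the content tokens and not the global embedding; the helper name `bit_rate_bps` is hypothetical and is not part of `kanade_tokenizer`.

```python
import math


def bit_rate_bps(token_rate_hz: float, vocab_size: int) -> float:
    """Bits per second of one token stream: tokens per second times bits per token."""
    return token_rate_hz * math.log2(vocab_size)


# Token rates and vocab size are taken from the Models table.
for name, rate in [("kanade-12.5hz", 12.5), ("kanade-25hz", 25.0)]:
    print(f"{name}: {bit_rate_bps(rate, 12800):.1f} bps")
# kanade-12.5hz: 170.5 bps
# kanade-25hz: 341.1 bps
```

Rounded to the nearest integer, these give the 171 bps and 341 bps listed in the table. `kanade-25hz-clean` has the same token rate and vocabulary size as `kanade-25hz`, so it has the same bit rate.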