Commit 16c9a61 by frothywater · verified · 1 parent: af8defb

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -9,6 +9,7 @@ tags:
 
 # Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling
 
+[Paper](https://arxiv.org/abs/2602.00594)  
 [Code](https://github.com/frothywater/kanade-tokenizer)  
 [Demo](https://frothywater.github.io/kanade-tokenizer/)  
 
@@ -16,7 +17,7 @@ Kanade is a single-layer disentangled speech tokenizer that extracts compact tok
 
 ## Updates
 
-- **2026-01-09**: Released `kanade-25hz-clean` model trained on [LibriTTS-R](https://arxiv.org/abs/2305.18802) with [HiFT vocoder](https://arxiv.org/abs/2309.09493) for better audio quality. LibriTTS-R is a restored version of LibriTTS removing noise, so the model trained on it can produce cleaner synthesis. Because of that, however, this version can no longer faithfully reflect the recording environment such as background noise and microphone characteristics. Also, the vocoder is changed to the HiFT model used in [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) for better quality. The content encoder part remains the same as the previous `kanade-25hz` model. We made tiny code change to support different vocoders during inference (specifically `load_vocoder`). Please refer to the updated usage section below.
+- **2026-01-09**: Released `kanade-25hz-clean` model trained on LibriTTS-R with HiFT vocoder for better audio quality. LibriTTS-R is a restored version of LibriTTS removing noise, so the model trained on it can produce cleaner synthesis. Because of that, however, this version can no longer faithfully reflect the recording environment such as background noise and microphone characteristics. Also, the vocoder is changed to the HiFT model used in [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B) for better quality. The content encoder part remains the same as the previous `kanade-25hz` model. We made tiny code change to support different vocoders during inference (specifically `load_vocoder`). Please refer to the updated usage section below.
 
 ## Models
 
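The `+2 -1` summary for README.md can be cross-checked against the two hunk headers (`@@ -9,6 +9,7 @@` and `@@ -16,7 +17,7 @@`). Below is a minimal Python sketch of that check, assuming standard unified-diff syntax (`@@ -old_start,old_count +new_start,new_count @@`). The names `parse_hunk_header`, `count_changes`, and `SAMPLE` are hypothetical and illustrative only; they are not part of the Kanade repository or any Hugging Face tooling. The `--- a/README.md` / `+++ b/README.md` file headers in `SAMPLE` are an assumption added for illustration, and the two long bullet lines are shortened.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Unified-diff hunk header, e.g. "@@ -9,6 +9,7 @@ tags:".
# The ",count" parts are optional; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int


def parse_hunk_header(line: str) -> Hunk:
    """Return the old/new line ranges encoded in a hunk header (hypothetical helper)."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


def count_changes(diff_lines: list[str]) -> tuple[int, int]:
    """Tally (added, removed) lines, using each hunk's counts to know where it ends."""
    added = removed = 0
    old_left = new_left = 0
    for line in diff_lines:
        if old_left == 0 and new_left == 0:
            # Outside a hunk: only a new "@@" header matters; file headers
            # such as "--- a/README.md" are skipped.
            if line.startswith("@@"):
                hunk = parse_hunk_header(line)
                old_left, new_left = hunk.old_count, hunk.new_count
            continue
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present on both the old and new side.
            old_left -= 1
            new_left -= 1
    return added, removed


# This commit's README.md diff (hypothetical sample: file headers assumed,
# long bullet text shortened, which does not affect the counts).
SAMPLE = [
    "--- a/README.md",
    "+++ b/README.md",
    "@@ -9,6 +9,7 @@ tags:",
    " ",
    " # Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling",
    " ",
    "+[Paper](https://arxiv.org/abs/2602.00594)  ",
    " [Code](https://github.com/frothywater/kanade-tokenizer)  ",
    " [Demo](https://frothywater.github.io/kanade-tokenizer/)  ",
    " ",
    "@@ -16,7 +17,7 @@",
    " ",
    " ## Updates",
    " ",
    "-- **2026-01-09**: (old bullet, shortened here)",
    "+- **2026-01-09**: (new bullet, shortened here)",
    " ",
    " ## Models",
    " ",
]

if __name__ == "__main__":
    print(parse_hunk_header(SAMPLE[2]))
    # Hunk(old_start=9, old_count=6, new_start=9, new_count=7)
    print(count_changes(SAMPLE))
    # (2, 1)
```

Each hunk's `new_count - old_count` gives its net line change (+1 for the added Paper link, 0 for the rewritten bullet), which agrees with 2 added minus 1 removed. Walking the hunk counts, rather than testing for a `---` prefix, avoids misreading a removed line whose own content begins with `--`.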