---
license: cc-by-sa-3.0
language:
- en
---

## Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning with Differentiable K-Means

[[paper](https://arxiv.org/abs/2601.19781)] [[demo](https://ondatk68.github.io/onda-demo/projects/phonological-tokenizer/)]

![arch](./arch.png)

Phonological Tokenizer is a single-codebook speech tokenizer that encodes both linguistic and prosodic information. Its tokens have properties intermediate between those of phonetic tokens and acoustic tokens.
The tokenizer is obtained by fine-tuning phonetic tokens derived from a self-supervised learning (SSL) model (wavlm-large) with differentiable k-means, using ASR and speech reconstruction as joint multi-task objectives.

This repository contains the fine-tuned SSL model and the cluster centroids, along with simple inference code.
For more details, please refer to [our paper](https://arxiv.org/abs/2601.19781).

### Usage

```
git clone https://huggingface.co/Sony/Phonological-Tokenizer
cd Phonological-Tokenizer
pip install -r requirements.txt
python inference.py [audio file path]
```

### License

This model is licensed under CC BY-SA 3.0. See the [LICENSE file](./LICENSE) for details.

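### Illustrative sketch: soft k-means assignment

The description at the top of this README mentions differentiable k-means. As a rough illustration of the general idea only, the sketch below contrasts ordinary nearest-centroid tokenization with a softmax relaxation over negative squared distances. It is not the code in `inference.py` and may differ from the formulation in the paper: the function names, the `temperature` parameter, the toy shapes, and the random inputs are all assumptions made for this example.

```python
import numpy as np


def squared_distances(features, centroids):
    """Squared Euclidean distance from every frame to every centroid.

    features:  (T, D) frame-level features (toy stand-in for SSL features)
    centroids: (K, D) k-means cluster centroids
    returns:   (T, K)
    """
    diff = features[:, None, :] - centroids[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def hard_tokens(features, centroids):
    """Ordinary k-means tokenization: index of the nearest centroid per frame."""
    return np.argmin(squared_distances(features, centroids), axis=-1)


def soft_assignments(features, centroids, temperature=1.0):
    """Softmax over negative distances, a smooth relaxation of the argmin.

    For temperature > 0 the output is differentiable with respect to both
    the features and the centroids; the hypothetical `temperature` knob
    controls how sharply the weights concentrate on the nearest centroid.
    """
    logits = -squared_distances(features, centroids) / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


# Toy data (assumed shapes, not the real model dimensions).
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))   # 5 frames, 8-dim features
centroids = rng.normal(size=(4, 8))  # codebook with 4 entries

tokens = hard_tokens(features, centroids)
probs = soft_assignments(features, centroids, temperature=0.5)
quantized = probs @ centroids        # soft-quantized features, shape (5, 8)
```

Because the soft weights depend smoothly on the features and the centroids, a downstream loss (for example an ASR or reconstruction loss) can in principle pass gradients back through the quantization step, which a hard `argmin` does not allow.
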
### Citation

```
@inproceedings{onda2026phonological,
  title={Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning with Differentiable K-Means},
  author={Onda, Kentaro and Futami, Hayato and Kashiwagi, Yosuke and Tsunoo, Emiru and Watanabe, Shinji},
  booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={17817-17821},
  year={2026},
  organization={IEEE},
  doi={10.1109/ICASSP55912.2026.11464405}
}
```

### Reference

- Original SSL model: [WavLM-large](https://huggingface.co/microsoft/wavlm-large) (CC BY-SA 3.0)
- Training data:
  - [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) (CC BY 4.0)
  - [LibriSpeech](https://www.openslr.org/12) (CC BY 4.0; used a 30h random subset of train-clean-100 for centroid initialization)

### Contact

ondakentaro[at]gavo.t.u-tokyo.ac.jp; hayato.Futami[at]sony.com; Yosuke.Kashiwagi[at]sony.com