CosyVoice3 Speech Tokenizer PT Parameter Bundle

This repository stores a PyTorch checkpoint converted from ONNX initializers.

Source

Base/source model: FunAudioLLM/Fun-CosyVoice3-0.5B-2512
Conversion method: ONNX initializer tensors -> pytorch_model.bin

Important Notes

Loading requires trust_remote_code=True, because the model class is defined by Python files in this repository.
This is a parameter-store style bundle: it holds the converted tensors but does not reconstruct the original training architecture.
AutoModel.from_pretrained(..., trust_remote_code=True) requires configuration_onnx_parameter_store.py, modeling_onnx_parameter_store.py, config.json, and onnx_parameter_map.json.
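
If you load from a local copy of this repository rather than from the Hub, a quick check that the four files named above are present can save a confusing failure inside from_pretrained. The following is a minimal sketch using only the standard library. The helper name missing_bundle_files and the directory path are hypothetical, and the file list is taken from the notes above, so it does not cover the weight file itself.

```python
from pathlib import Path

# File names copied from the "Important Notes" list in this README.
REQUIRED_FILES = (
    "configuration_onnx_parameter_store.py",
    "modeling_onnx_parameter_store.py",
    "config.json",
    "onnx_parameter_map.json",
)


def missing_bundle_files(bundle_dir):
    """Return the required file names that are absent from bundle_dir (hypothetical helper)."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling missing_bundle_files("/path/to/local/copy") returns an empty list when all four files are present, and otherwise the names that still need to be copied.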

Load Example

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "wookee3/cosyvoice3-speech-tokenizer-pt",
    trust_remote_code=True,
)
print(len(model.weights))

Tokenize via Loaded HF Model

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "wookee3/cosyvoice3-speech-tokenizer-pt",
    trust_remote_code=True,
)
# Automatically downloads speech_tokenizer_v3.onnx from the source repo.
indices = model.tokenize_from_file("/path/to/audio.wav")
print(indices.shape)
print(indices[0].tolist()[:30])

Audio -> Token Sequence Example (ONNX Runtime)

This repo intentionally does not include the ONNX file. The example script downloads speech_tokenizer_v3.onnx from the source repo at runtime.

pip install numpy soundfile onnxruntime huggingface_hub
python audio_to_tokens_example.py --audio /path/to/audio.wav

Or specify a local ONNX path:

python audio_to_tokens_example.py \
  --audio /path/to/audio.wav \
  --onnx /path/to/speech_tokenizer_v3.onnx
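
The two invocations above differ only in where the ONNX file comes from. The sketch below shows one way that choice could be made in your own code. It is not the logic of audio_to_tokens_example.py. The names resolve_onnx_path and download_from_hub are hypothetical, and the assumption that speech_tokenizer_v3.onnx sits at the root of the source repo is not confirmed by this README. It relies on hf_hub_download from huggingface_hub, which is imported only when a download is actually needed.

```python
from pathlib import Path
from typing import Callable, Optional

# Source repo named in this README; the file living at the repo root is an assumption.
SOURCE_REPO = "FunAudioLLM/Fun-CosyVoice3-0.5B-2512"
ONNX_FILENAME = "speech_tokenizer_v3.onnx"


def download_from_hub(repo_id: str, filename: str) -> str:
    """Fetch one file from the Hugging Face Hub and return its cached local path."""
    # Imported lazily so a local --onnx path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


def resolve_onnx_path(
    local_path: Optional[str] = None,
    download: Callable[[str, str], str] = download_from_hub,
) -> Path:
    """Use local_path when given, otherwise download the file from the source repo."""
    if local_path is not None:
        path = Path(local_path)
        if not path.is_file():
            raise FileNotFoundError(f"ONNX file not found: {path}")
        return path
    return Path(download(SOURCE_REPO, ONNX_FILENAME))
```

Passing the download function as a parameter keeps the helper testable offline: a stub that returns a local path can stand in for the Hub call.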
