---
license: mit
base_model: ResembleAI/chatterbox-turbo-ONNX
tags:
- text-to-speech
- tts
- onnx
- webgpu
- transformers.js
---

# Chatterbox Turbo - WebGPU Compatible

This is a WebGPU-compatible version of [ResembleAI/chatterbox-turbo-ONNX](https://huggingface.co/ResembleAI/chatterbox-turbo-ONNX).

## Changes from Original

The original model contains `int64` tensors and `int64` Cast operations that WebGPU cannot execute (WGSL, the WebGPU shading language, has no native 64-bit integer type). This version converts all `int64` operations to `int32`, so the model can run inference directly on WebGPU.

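Narrowing `int64` to `int32` is only safe when every value fits in the signed 32-bit range [-2^31, 2^31 - 1], which holds for typical shape, index, and length values. The helper below is a hypothetical illustration of that check (it is not taken from the conversion script): it converts a `BigInt64Array` to an `Int32Array` and throws a `RangeError` for any value that would not survive the conversion.

```javascript
// Hypothetical helper (not from the conversion script): narrow int64 values
// to int32, rejecting any value outside the signed 32-bit range.
const INT32_MIN = -(2n ** 31n);
const INT32_MAX = 2n ** 31n - 1n;

function narrowInt64ToInt32(values) {
  const out = new Int32Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (v < INT32_MIN || v > INT32_MAX) {
      throw new RangeError(`value ${v} at index ${i} does not fit in int32`);
    }
    out[i] = Number(v); // safe: v is within int32 range
  }
  return out;
}

// Example: a shape tensor as an ONNX Shape op would emit it (int64).
const shape64 = BigInt64Array.from([1n, 256n, 80n]);
const shape32 = narrowInt64ToInt32(shape64);
console.log(Array.from(shape32)); // [ 1, 256, 80 ]
```

One case a real converter has to treat specially: ONNX exporters often encode "slice to the end" as the `INT64_MAX` constant, which this sketch would reject rather than clamp.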
### Modifications Made

- **conditional_decoder**: 521 Cast nodes inserted (376 Shape/Range ops)
- **speech_encoder**: 350 Cast nodes inserted (243 Shape/Range ops)
- **language_model**: 3 Cast nodes inserted
- **embed_tokens**: 1 Cast node inserted

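The Cast insertion listed above can be pictured as a graph rewrite. The sketch below uses a hypothetical plain-object node format rather than the ONNX protobuf API, and assumes the node list is topologically sorted, as ONNX requires. After every `Shape` or `Range` node it adds a `Cast` with `to = 6` (`INT32` in the ONNX `TensorProto.DataType` enum) and points downstream consumers at the cast output. Exactly where the actual script places its casts is not documented here, so treat this as an illustration of the idea only.

```javascript
// ONNX TensorProto.DataType value for INT32 (INT64 is 7).
const ONNX_INT32 = 6;

// Hypothetical node format: { opType, inputs: string[], outputs: string[], attrs }.
// Assumes `nodes` is topologically sorted, as the ONNX spec requires.
function insertCastsAfter(nodes, opTypes) {
  const result = [];
  const renamed = new Map(); // original tensor name -> int32 cast output name
  let inserted = 0;

  for (const node of nodes) {
    // Consumers of a cast tensor now read the int32 version instead.
    const inputs = node.inputs.map((name) => renamed.get(name) ?? name);
    result.push({ ...node, inputs });

    if (!opTypes.has(node.opType)) continue;
    for (const out of node.outputs) {
      const castOut = `${out}_int32`;
      result.push({ opType: 'Cast', inputs: [out], outputs: [castOut], attrs: { to: ONNX_INT32 } });
      renamed.set(out, castOut);
      inserted += 1;
    }
  }
  return { nodes: result, inserted };
}

// Shape -> Gather becomes Shape -> Cast(to=INT32) -> Gather.
const graph = [
  { opType: 'Shape', inputs: ['x'], outputs: ['x_shape'], attrs: {} },
  { opType: 'Gather', inputs: ['x_shape', 'idx'], outputs: ['dim'], attrs: { axis: 0 } },
];
const { nodes, inserted } = insertCastsAfter(graph, new Set(['Shape', 'Range']));
console.log(inserted); // 1
console.log(nodes.map((n) => n.opType).join(' -> ')); // Shape -> Cast -> Gather
```

A real converter must also keep `int64` wherever an operator's specification demands it, for example the `shape` input of `Reshape`, so it cannot blindly redirect every consumer as this sketch does.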
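## Usage with Transformers.js

```javascript
import { AutoModel, AutoProcessor } from '@huggingface/transformers';

// Top-level await requires an ES module context (e.g. <script type="module">).
const model = await AutoModel.from_pretrained('spacekaren/chatterbox-turbo-webgpu', {
  device: 'webgpu', // run on the WebGPU backend; needs a WebGPU-enabled browser
  dtype: 'q4f16',   // 4-bit quantized weights, fp16 elsewhere (~539 MB total)
});
// Load the matching preprocessing configuration from the same repository.
const processor = await AutoProcessor.from_pretrained('spacekaren/chatterbox-turbo-webgpu');
```
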
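Because `device: 'webgpu'` cannot work in a browser without WebGPU, it can help to check for a GPU adapter before downloading roughly 539 MB of weights. The function below is a hypothetical helper built only on the standard WebGPU entry points (`navigator.gpu` and `requestAdapter()`, which resolves to `null` when no suitable adapter exists). It takes the navigator object as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper: report whether a usable WebGPU adapter is available.
async function hasWebGPU(nav = globalThis.navigator) {
  // The WebGPU API is not exposed (older browser, or a non-browser runtime).
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU adapter
  } catch {
    return false; // treat adapter request failures as "no WebGPU"
  }
}

hasWebGPU().then((ok) => console.log(ok ? 'WebGPU available' : 'WebGPU unavailable'));
```

In a page, call `await hasWebGPU()` before `AutoModel.from_pretrained(...)` and show a message instead of loading the model when it returns `false`.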
## Model Size

- **Total**: ~539 MB (q4f16 quantization)
- **Architecture**: same as the original; the only change is the `int64`→`int32` conversion

## License

MIT (same as original)

## Credits

- Original model: [ResembleAI/chatterbox-turbo-ONNX](https://huggingface.co/ResembleAI/chatterbox-turbo-ONNX)
- Conversion script: [local.core/scripts/convert_int64_to_int32.py](https://github.com/anthropics/lama)