---
license: mit
language:
- ar
- da
- de
- el
- en
- es
- fi
- fr
- he
- hi
- it
- ja
- ko
- ms
- nl
- 'no'
- pl
- pt
- ru
- sv
- sw
- tr
- zh
pipeline_tag: text-to-speech
tags:
- text-to-speech
- speech
- speech-generation
- voice-cloning
- multilingual-tts
- onnx
- quantized
- q4
- transformers.js
library_name: transformers.js
base_model:
- onnx-community/chatterbox-multilingual-ONNX
---

# Chatterbox Multilingual TTS - Q4 Quantized ONNX

Q4 weight-only quantized version of [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX) for use with **Transformers.js** and **ONNX Runtime Web**.

## Key Features

- **75% smaller**: 790 MB vs 3.2 GB original
- **Single-file ONNX**: No external data files, compatible with Transformers.js
- **Near-original quality**: Minimal quality loss from Q4 quantization
- **23 languages supported**: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh

## Model Sizes

| Model | Original (FP32) | Q4 Quantized |
|-------|-----------------|--------------|
| speech_encoder.onnx | 564 MB | 172 MB |
| embed_tokens.onnx | 66 MB | 65 MB |
| language_model.onnx | 2.0 GB | 338 MB |
| conditional_decoder.onnx | 510 MB | 215 MB |
| **Total** | **3.2 GB** | **790 MB** |

## Usage

### With ONNX Runtime (Python)

```python
import onnxruntime

# Load Q4 models - single files, no external data needed
speech_encoder = onnxruntime.InferenceSession("onnx/speech_encoder.onnx")
embed_tokens = onnxruntime.InferenceSession("onnx/embed_tokens.onnx")
language_model = onnxruntime.InferenceSession("onnx/language_model.onnx")
conditional_decoder = onnxruntime.InferenceSession("onnx/conditional_decoder.onnx")
```

### With Transformers.js (JavaScript)

```javascript
// Models are single-file ONNX format, compatible with ONNX Runtime Web
import { AutoTokenizer } from '@huggingface/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('ipsilondev/chatterbox-multilingual-ONNX-q4');
```

## Quantization Details

- **Method**: Q4 weight-only quantization using `MatMulNBitsQuantizer`
- **Block size**: 32
- **Symmetric**: Yes
- **Format**: Single-file ONNX (no external data) for web compatibility

## Important Parameters

When using these models, use the following sampling parameters:

```python
repetition_penalty = 1.2  # CRITICAL: Do NOT use 2.0 - causes infinite loops
temperature = 0.8
top_p = 0.95
min_p = 0.05
```

## Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| ar | Arabic | ko | Korean |
| da | Danish | ms | Malay |
| de | German | nl | Dutch |
| el | Greek | no | Norwegian |
| en | English | pl | Polish |
| es | Spanish | pt | Portuguese |
| fi | Finnish | ru | Russian |
| fr | French | sv | Swedish |
| he | Hebrew | sw | Swahili |
| hi | Hindi | tr | Turkish |
| it | Italian | zh | Chinese |
| ja | Japanese | | |

## Credits

- Original model: [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX)
- Base model: [ResembleAI/chatterbox](https://github.com/resemble-ai/chatterbox)
- Quantization by: [ipsilondev](https://huggingface.co/ipsilondev)

## License

MIT License (same as original model)
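
## Example: Applying the Sampling Parameters

The values listed under Important Parameters only take effect once they are applied to the language model's output logits at each decoding step. The following is a minimal NumPy sketch of one common way to do that, not the reference Chatterbox implementation. The function name `sample_next_token`, the order of the steps, and the choice to apply the `min_p` and `top_p` filters to the same distribution are all assumptions made for illustration.

```python
import numpy as np


def sample_next_token(logits, generated_ids, rng,
                      repetition_penalty=1.2, temperature=0.8,
                      top_p=0.95, min_p=0.05):
    """Pick one token id from a 1-D logits vector (hypothetical helper)."""
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: make already-generated tokens less likely by
    # dividing positive logits and multiplying negative ones.
    seen = np.unique(np.asarray(generated_ids, dtype=np.int64))
    if seen.size:
        vals = logits[seen]
        logits[seen] = np.where(vals > 0,
                                vals / repetition_penalty,
                                vals * repetition_penalty)

    # Temperature scaling followed by a numerically stable softmax.
    logits = logits / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # min_p: drop tokens whose probability is below min_p * (top probability).
    keep = probs >= min_p * probs.max()

    # top_p: keep the highest-probability tokens until their cumulative
    # mass reaches top_p (the top token is always kept).
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    nucleus = np.zeros_like(keep)
    nucleus[order] = (cumulative - probs[order]) < top_p
    keep &= nucleus

    # Renormalize over the surviving tokens and sample.
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))


rng = np.random.default_rng(0)
# One dominant logit: every other token falls below the min_p threshold.
token = sample_next_token([10.0, 0.0, 0.0, 0.0], generated_ids=[], rng=rng)
print(token)  # 0
```

In a real generation loop, `logits` would be the last-position output of `language_model.onnx` and `generated_ids` the speech tokens produced so far; those input and output names depend on the exported graph and are not shown here.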