---
license: mit
language:
- ar
- da
- de
- el
- en
- es
- fi
- fr
- he
- hi
- it
- ja
- ko
- ms
- nl
- 'no'
- pl
- pt
- ru
- sv
- sw
- tr
- zh
pipeline_tag: text-to-speech
tags:
- text-to-speech
- speech
- speech-generation
- voice-cloning
- multilingual-tts
- onnx
- quantized
- q4
- transformers.js
library_name: transformers.js
base_model:
- onnx-community/chatterbox-multilingual-ONNX
---
# Chatterbox Multilingual TTS - Q4 Quantized ONNX
This is a Q4 weight-only quantized version of [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX), intended for use with **Transformers.js** and **ONNX Runtime Web**.
## Key Features
- **About 75% smaller**: 790 MB in total vs 3.2 GB for the original FP32 models
- **Single-file ONNX**: No external data files, compatible with Transformers.js
- **Near-original quality**: Q4 quantization is expected to cause only minimal quality loss
- **23 languages supported**: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh
## Model Sizes
| Model | Original (FP32) | Q4 Quantized |
|-------|-----------------|--------------|
| speech_encoder.onnx | 564 MB | 172 MB |
| embed_tokens.onnx | 66 MB | 65 MB |
| language_model.onnx | 2.0 GB | 338 MB |
| conditional_decoder.onnx | 510 MB | 215 MB |
| **Total** | **3.2 GB** | **790 MB** |
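
The "75% smaller" figure can be checked against the table with a few lines of arithmetic. The sketch below copies the sizes from the table and assumes 2.0 GB means 2000 MB; the dictionary and helper names are illustrative only.

```python
# Sizes in MB, copied from the table above (2.0 GB is assumed to be 2000 MB).
sizes_mb = {
    "speech_encoder.onnx": (564, 172),
    "embed_tokens.onnx": (66, 65),
    "language_model.onnx": (2000, 338),
    "conditional_decoder.onnx": (510, 215),
}


def reduction_percent(original_mb, quantized_mb):
    """Return how much smaller the quantized file is, in percent."""
    return 100.0 * (1.0 - quantized_mb / original_mb)


total_original = sum(original for original, _ in sizes_mb.values())
total_q4 = sum(q4 for _, q4 in sizes_mb.values())

for name, (original, q4) in sizes_mb.items():
    print(f"{name}: {reduction_percent(original, q4):.1f}% smaller")

print(
    f"total: {total_original} MB -> {total_q4} MB "
    f"({reduction_percent(total_original, total_q4):.1f}% smaller)"
)
```

The per-file numbers vary widely: `language_model.onnx` shrinks the most, while `embed_tokens.onnx` is almost unchanged, presumably because it contains little or nothing that a MatMul-oriented weight quantizer touches.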
## Usage
### With ONNX Runtime (Python)
```python
import onnxruntime
# Load Q4 models - single files, no external data needed
speech_encoder = onnxruntime.InferenceSession("onnx/speech_encoder.onnx")
embed_tokens = onnxruntime.InferenceSession("onnx/embed_tokens.onnx")
language_model = onnxruntime.InferenceSession("onnx/language_model.onnx")
conditional_decoder = onnxruntime.InferenceSession("onnx/conditional_decoder.onnx")
```
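This card does not list the input and output tensor names of the four models. One way to find them is to ask each session directly. The helper below is a minimal sketch: `describe_io` is a hypothetical name, and it relies only on `get_inputs()` and `get_outputs()`, which ONNX Runtime's `InferenceSession` provides.

```python
def describe_io(session):
    """Collect (name, type, shape) for every input and output of a session.

    `session` is expected to be an onnxruntime.InferenceSession, but any
    object exposing get_inputs() / get_outputs() works.
    """
    inputs = [(arg.name, arg.type, arg.shape) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.type, arg.shape) for arg in session.get_outputs()]
    return {"inputs": inputs, "outputs": outputs}


# Example, assuming the sessions loaded in the snippet above:
#
#   for label, session in [("speech_encoder", speech_encoder),
#                          ("language_model", language_model)]:
#       print(label, describe_io(session))
```

Dynamic dimensions usually show up as strings or `None` in the reported shapes, so treat the shapes as a guide rather than fixed sizes.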
### With Transformers.js (JavaScript)
```javascript
// Models are single-file ONNX format, compatible with ONNX Runtime Web
import { AutoTokenizer } from '@huggingface/transformers';
const tokenizer = await AutoTokenizer.from_pretrained('ipsilondev/chatterbox-multilingual-ONNX-q4');
```
## Quantization Details
- **Method**: Q4 weight-only quantization using `MatMulNBitsQuantizer`
- **Block size**: 32
- **Symmetric**: Yes
- **Format**: Single-file ONNX (no external data) for web compatibility
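
To make "block size 32" and "symmetric" concrete, here is a simplified NumPy illustration of block-wise symmetric 4-bit quantization. It is not the `MatMulNBitsQuantizer` implementation, whose packing and scale computation may differ; the function names and the choice of `max_abs / 7` as the scale are assumptions made for this sketch.

```python
import numpy as np

BLOCK_SIZE = 32  # matches the block size listed above


def quantize_q4_symmetric(weights, block_size=BLOCK_SIZE):
    """Quantize a flat float array to 4-bit integers with one scale per block."""
    w = np.asarray(weights, dtype=np.float32)
    assert w.size % block_size == 0, "pad weights to a multiple of block_size"
    blocks = w.reshape(-1, block_size)
    # Symmetric: the range is centred on zero, so no per-block zero point is needed.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    # Guard against all-zero blocks to avoid dividing by zero.
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales


def dequantize_q4(q, scales):
    """Reconstruct approximate float weights from 4-bit values and block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)


rng = np.random.default_rng(0)
w = rng.normal(size=4 * BLOCK_SIZE).astype(np.float32)
q, scales = quantize_q4_symmetric(w)
w_hat = dequantize_q4(q, scales)
max_err = float(np.abs(w - w_hat).max())
print("blocks:", q.shape[0], "max abs reconstruction error:", max_err)
```

Each weight costs 4 bits plus its share of the per-block scale. Smaller blocks track local weight ranges more closely at the cost of storing more scales.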
## Important Parameters
Use the following generation parameters with these models:
```python
repetition_penalty = 1.2 # CRITICAL: Do NOT use 2.0 - causes infinite loops
temperature = 0.8
top_p = 0.95
min_p = 0.05
```
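As a rough illustration of what these four parameters do, the sketch below applies them to one step of token sampling in NumPy. It is a generic sampler, not this model's generation loop: the function name `sample_next_token` and the order in which the filters are applied are assumptions.

```python
import numpy as np


def sample_next_token(logits, generated_ids, rng,
                      repetition_penalty=1.2, temperature=0.8,
                      top_p=0.95, min_p=0.05):
    """Sample one token id from raw logits using the parameters above."""
    logits = np.array(logits, dtype=np.float64)

    # Repetition penalty: lower the score of tokens that were already generated.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    logits = logits / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # min_p: drop tokens whose probability is below min_p * (top probability).
    keep = probs >= min_p * probs.max()

    # top_p: keep the most probable tokens until their cumulative mass reaches top_p.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    in_nucleus = np.zeros_like(keep)
    in_nucleus[order] = (cumulative - probs[order]) < top_p
    keep &= in_nucleus

    # Renormalize over the surviving tokens and draw one of them.
    filtered = np.where(keep, probs, 0.0)
    filtered /= filtered.sum()
    return int(rng.choice(len(filtered), p=filtered))


rng = np.random.default_rng(0)
print(sample_next_token([2.0, 1.0, 0.5, -1.0], generated_ids=[0], rng=rng))
```

The most probable token always survives both filters, so the renormalization step never divides by zero.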
## Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| ar | Arabic | ko | Korean |
| da | Danish | ms | Malay |
| de | German | nl | Dutch |
| el | Greek | no | Norwegian |
| en | English | pl | Polish |
| es | Spanish | pt | Portuguese |
| fi | Finnish | ru | Russian |
| fr | French | sv | Swedish |
| he | Hebrew | sw | Swahili |
| hi | Hindi | tr | Turkish |
| it | Italian | zh | Chinese |
| ja | Japanese | | |
## Credits
- Original model: [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX)
- Base model: [ResembleAI/chatterbox](https://github.com/resemble-ai/chatterbox)
- Quantization by: [ipsilondev](https://huggingface.co/ipsilondev)
## License
MIT License (same as original model)