Initial: q4 single-file fork (from ipsilondev) with config.json + tokenizer.json fixes

	---
	license: mit
	language:
	- ar
	- da
	- de
	- el
	- en
	- es
	- fi
	- fr
	- he
	- hi
	- it
	- ja
	- ko
	- ms
	- nl
	- 'no'
	- pl
	- pt
	- ru
	- sv
	- sw
	- tr
	- zh
	pipeline_tag: text-to-speech
	tags:
	- text-to-speech
	- speech
	- speech-generation
	- voice-cloning
	- multilingual-tts
	- onnx
	- quantized
	- q4
	- transformers.js
	library_name: transformers.js
	base_model:
	- onnx-community/chatterbox-multilingual-ONNX
	---

	# Chatterbox Multilingual TTS - Q4 Quantized ONNX

	Q4 weight-only quantized version of [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX) for use with Transformers.js and ONNX Runtime Web.

	## Key Features

	- About 75% smaller: 790 MB versus 3.2 GB for the original FP32 model
	- Single-file ONNX: no external data files, so the models load directly in Transformers.js and ONNX Runtime Web
	- Near-original quality: Q4 quantization is expected to cause only minimal quality loss
	- 23 languages supported: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh

	## Model Sizes

	| Model | Original (FP32) | Q4 Quantized |
	|-------|-----------------|--------------|
	| speech_encoder.onnx | 564 MB | 172 MB |
	| embed_tokens.onnx | 66 MB | 65 MB |
	| language_model.onnx | 2.0 GB | 338 MB |
	| conditional_decoder.onnx | 510 MB | 215 MB |
	| Total | 3.2 GB | 790 MB |
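
The per-file savings implied by the table can be checked with a few lines of arithmetic. This is a minimal sketch: the sizes are copied from the table, and treating "2.0 GB" as 2000 MB is an assumption about how that figure was rounded.

```python
# Sizes in MB as (original FP32, Q4), copied from the table above.
# Assumption: the 2.0 GB entry is taken as 2000 MB.
SIZES_MB = {
    "speech_encoder.onnx": (564, 172),
    "embed_tokens.onnx": (66, 65),
    "language_model.onnx": (2000, 338),
    "conditional_decoder.onnx": (510, 215),
}


def reduction_percent(original_mb, quantized_mb):
    """Return how much smaller the quantized file is, in percent."""
    return 100.0 * (1.0 - quantized_mb / original_mb)


total_original = sum(orig for orig, _ in SIZES_MB.values())
total_quantized = sum(q4 for _, q4 in SIZES_MB.values())

for name, (orig, q4) in SIZES_MB.items():
    print(f"{name}: {reduction_percent(orig, q4):.1f}% smaller")
print(f"total: {total_quantized} MB, "
      f"{reduction_percent(total_original, total_quantized):.1f}% smaller")
```

The total comes out at roughly 75%. Most of the saving comes from `language_model.onnx`, while `embed_tokens.onnx` is nearly unchanged.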

	## Usage

	### With ONNX Runtime (Python)

	```python
	import onnxruntime

	# Load Q4 models - single files, no external data needed
	speech_encoder = onnxruntime.InferenceSession("onnx/speech_encoder.onnx")
	embed_tokens = onnxruntime.InferenceSession("onnx/embed_tokens.onnx")
	language_model = onnxruntime.InferenceSession("onnx/language_model.onnx")
	conditional_decoder = onnxruntime.InferenceSession("onnx/conditional_decoder.onnx")
	```

	### With Transformers.js (JavaScript)

	```javascript
	// Models are single-file ONNX format, compatible with ONNX Runtime Web
	import { AutoTokenizer } from '@huggingface/transformers';

	const tokenizer = await AutoTokenizer.from_pretrained('ipsilondev/chatterbox-multilingual-ONNX-q4');
	```

	## Quantization Details

	- Method: Q4 weight-only quantization using `MatMulNBitsQuantizer`
	- Block size: 32
	- Symmetric: Yes
	- Format: Single-file ONNX (no external data) for web compatibility
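
To make the block size and symmetry settings concrete, here is a pure-Python sketch of symmetric block-wise 4-bit quantization. It illustrates the general idea only and is not the ONNX Runtime implementation: the function names, the `[-8, 7]` code range, and the 32-bit scale per block are assumptions made for this example.

```python
import math


def quantize_block(block, bits=4):
    """Symmetric quantization of one block: returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    max_abs = max(abs(w) for w in block)
    # Symmetric: no zero point, the scale maps the largest magnitude to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in block]
    return codes, scale


def quantize(weights, block_size=32):
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(weights[start:start + block_size])
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate float weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in blocks for code in codes]


def bits_per_weight(bits=4, block_size=32, scale_bits=32):
    """Storage cost per weight, assuming one scale of scale_bits per block."""
    return bits + scale_bits / block_size


weights = [math.sin(i) for i in range(64)]   # toy weights, two blocks of 32
blocks = quantize(weights)
restored = dequantize(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"blocks: {len(blocks)}, max error: {max_error:.4f}")
print(f"bits per weight: {bits_per_weight()}")
```

Each block keeps its own scale, so an outlier weight only degrades the precision of the 32 values that share its block. Under the assumed 32-bit scale, storage is 5 bits per weight, or about 16% of FP32, which is in the same range as the reduction reported for `language_model.onnx` (338 MB versus 2.0 GB).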

	## Important Parameters

	Use the following sampling parameters when generating with these models:

	```python
	repetition_penalty = 1.2 # CRITICAL: Do NOT use 2.0 - causes infinite loops
	temperature = 0.8
	top_p = 0.95
	min_p = 0.05
	```
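
The sketch below shows one way these four parameters could be applied to a vector of next-token logits. It is an illustration under assumptions, not the model's actual generation loop: the function names are hypothetical, the order of the steps is a guess, and the repetition penalty follows the common convention of dividing positive logits and multiplying negative ones.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make already-generated tokens less likely (common convention)."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits, temperature=0.8):
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_p=0.95, min_p=0.05):
    """Apply min_p, then top_p (nucleus) filtering; return {token: prob}."""
    threshold = min_p * max(probs)   # min_p is relative to the top token
    kept = sorted(((p, i) for i, p in enumerate(probs) if p >= threshold),
                  reverse=True)
    nucleus, cumulative = [], 0.0
    for p, i in kept:
        nucleus.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for p, _ in nucleus)
    return {i: p / total for p, i in nucleus}


def sample_next_token(logits, generated_ids, rng=random):
    """Penalize repeats, filter the distribution, then sample one token id."""
    penalized = apply_repetition_penalty(logits, generated_ids)
    dist = filter_probs(softmax(penalized))
    token_ids = list(dist)
    return rng.choices(token_ids, weights=[dist[t] for t in token_ids], k=1)[0]


logits = [2.0, 1.0, -1.0, 0.5]       # toy 4-token vocabulary
print(sample_next_token(logits, generated_ids=[0, 2]))
```

With this convention, a larger `repetition_penalty` pushes previously generated tokens down harder on every step, which gives some intuition for why a value such as 2.0 can distort generation much more than 1.2.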

	## Supported Languages

	| Code | Language | Code | Language |
	|------|----------|------|----------|
	| ar | Arabic | ko | Korean |
	| da | Danish | ms | Malay |
	| de | German | nl | Dutch |
	| el | Greek | no | Norwegian |
	| en | English | pl | Polish |
	| es | Spanish | pt | Portuguese |
	| fi | Finnish | ru | Russian |
	| fr | French | sv | Swedish |
	| he | Hebrew | sw | Swahili |
	| hi | Hindi | tr | Turkish |
	| it | Italian | zh | Chinese |
	| ja | Japanese | | |

	## Credits

	- Original model: [onnx-community/chatterbox-multilingual-ONNX](https://huggingface.co/onnx-community/chatterbox-multilingual-ONNX)
	- Base model: [ResembleAI/chatterbox](https://github.com/resemble-ai/chatterbox)
	- Quantization by: [ipsilondev](https://huggingface.co/ipsilondev)

	## License

	MIT License (same as original model)