Chatterbox Multilingual TTS - Q4 Quantized ONNX

A Q4 weight-only quantized version of onnx-community/chatterbox-multilingual-ONNX, intended for use with Transformers.js and ONNX Runtime Web.

Key Features

  • About 75% smaller: 790 MB vs. 3.2 GB for the FP32 original
  • Single-file ONNX: No external data files, compatible with Transformers.js
  • Near-original quality: Q4 quantization introduces only minimal quality loss
  • 23 languages supported: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh

Model Sizes

Model                       Original (FP32)   Q4 Quantized
speech_encoder.onnx         564 MB            172 MB
embed_tokens.onnx           66 MB             65 MB
language_model.onnx         2.0 GB            338 MB
conditional_decoder.onnx    510 MB            215 MB
Total                       3.2 GB            790 MB
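
The size figures above can be checked with a few lines of arithmetic. This is a minimal sketch that copies the numbers from the table and treats 2.0 GB as 2000 MB (an assumption, since the table gives rounded values); the names SIZES_MB and reduction_percent are illustrative only.

```python
# Sizes in MB, copied from the table: (original FP32, Q4 quantized).
SIZES_MB = {
    "speech_encoder.onnx": (564, 172),
    "embed_tokens.onnx": (66, 65),
    "language_model.onnx": (2000, 338),  # assumption: 2.0 GB taken as 2000 MB
    "conditional_decoder.onnx": (510, 215),
}


def reduction_percent(original_mb, quantized_mb):
    """Percentage by which the quantized file is smaller than the original."""
    return 100.0 * (1.0 - quantized_mb / original_mb)


total_fp32 = sum(original for original, _ in SIZES_MB.values())
total_q4 = sum(quantized for _, quantized in SIZES_MB.values())

if __name__ == "__main__":
    for name, (original, quantized) in SIZES_MB.items():
        print(f"{name}: {reduction_percent(original, quantized):.1f}% smaller")
    print(f"total: {total_fp32} MB -> {total_q4} MB "
          f"({reduction_percent(total_fp32, total_q4):.1f}% smaller)")
```

Most of the savings come from language_model.onnx, while embed_tokens.onnx is essentially unchanged.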

Usage

With ONNX Runtime (Python)

import onnxruntime

# Load Q4 models - single files, no external data needed
speech_encoder = onnxruntime.InferenceSession("onnx/speech_encoder.onnx")
embed_tokens = onnxruntime.InferenceSession("onnx/embed_tokens.onnx")
language_model = onnxruntime.InferenceSession("onnx/language_model.onnx")
conditional_decoder = onnxruntime.InferenceSession("onnx/conditional_decoder.onnx")
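
Before wiring the four sessions together, it can help to list the tensors each one expects and produces. The helper below is a sketch: describe_io is a hypothetical name, and it relies only on the get_inputs() and get_outputs() methods of onnxruntime.InferenceSession. The actual tensor names of these models are not listed here, so inspect them rather than assume them.

```python
def describe_io(session):
    """Return (kind, name, type, shape) tuples for an InferenceSession-like object."""
    rows = []
    for kind, args in (("input", session.get_inputs()),
                       ("output", session.get_outputs())):
        for arg in args:
            rows.append((kind, arg.name, arg.type, arg.shape))
    return rows


# Example usage (requires the model files from this repository):
# import onnxruntime
# session = onnxruntime.InferenceSession("onnx/embed_tokens.onnx")
# for row in describe_io(session):
#     print(row)
```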

With Transformers.js (JavaScript)

// Models are single-file ONNX format, compatible with ONNX Runtime Web
import { AutoTokenizer } from '@huggingface/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('ipsilondev/chatterbox-multilingual-ONNX-q4');

Quantization Details

  • Method: Q4 weight-only quantization using MatMulNBitsQuantizer
  • Block size: 32
  • Symmetric: Yes
  • Format: Single-file ONNX (no external data) for web compatibility
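
To illustrate what symmetric block-wise 4-bit quantization with a block size of 32 means, here is a simplified pure-Python sketch. It is not the MatMulNBitsQuantizer implementation: the scale choice (largest absolute value mapped to 7) and the function names are assumptions made for illustration, and the real quantizer packs values and picks scales differently.

```python
BLOCK_SIZE = 32  # weights per block, as listed above
QMAX = 7         # assumption: signed 4-bit range restricted to [-7, 7]


def quantize_block(block):
    """Quantize one block symmetrically: a single scale, no zero point."""
    max_abs = max(abs(w) for w in block)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    quantized = [max(-QMAX, min(QMAX, round(w / scale))) for w in block]
    return quantized, scale


def dequantize_block(quantized, scale):
    """Recover approximate weights from 4-bit integers and the block scale."""
    return [q * scale for q in quantized]


def quantize_weights(weights, block_size=BLOCK_SIZE):
    """Split a flat weight list into blocks and quantize each independently."""
    return [quantize_block(weights[start:start + block_size])
            for start in range(0, len(weights), block_size)]


if __name__ == "__main__":
    weights = [i / 10 - 3.2 for i in range(64)]  # toy data, two blocks
    for quantized, scale in quantize_weights(weights):
        print(f"scale={scale:.4f} first values={quantized[:4]}")
```

In this scheme each block stores 32 four-bit integers plus one scale, so the rounding error per weight is bounded by half of that block's scale.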

Important Parameters

Use the following sampling parameters when generating speech tokens with these models:

repetition_penalty = 1.2  # CRITICAL: Do NOT use 2.0 - causes infinite loops
temperature = 0.8
top_p = 0.95
min_p = 0.05
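
The sketch below shows one common way these four parameters act on the language model's output logits at each decoding step. It is an illustration in pure Python, not the reference inference code: the function names, the order in which the filters are applied, and the CTRL-style repetition penalty are all assumptions.

```python
import math
import random

REPETITION_PENALTY = 1.2
TEMPERATURE = 0.8
TOP_P = 0.95
MIN_P = 0.05


def apply_repetition_penalty(logits, generated_ids, penalty=REPETITION_PENALTY):
    """Lower the logits of tokens that were already generated."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits, temperature=TEMPERATURE):
    """Temperature-scaled softmax; values below 1 sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_p=TOP_P, min_p=MIN_P):
    """Apply min-p, then top-p (nucleus) filtering, and renormalize."""
    floor = min_p * max(probs)  # min-p: threshold relative to the top token
    kept = [p if p >= floor else 0.0 for p in probs]
    total = sum(kept)
    result = [0.0] * len(probs)
    cumulative = 0.0
    # top-p: keep the most likely tokens until their mass reaches top_p
    for i in sorted(range(len(kept)), key=lambda j: kept[j], reverse=True):
        if kept[i] == 0.0:
            break
        result[i] = kept[i]
        cumulative += kept[i] / total
        if cumulative >= top_p:
            break
    norm = sum(result)
    return [p / norm for p in result]


def sample_next_token(logits, generated_ids, rng=random):
    """Draw the next token id from the penalized, filtered distribution."""
    penalized = apply_repetition_penalty(logits, generated_ids)
    probs = filter_probs(softmax(penalized))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a generation loop, sample_next_token would be called once per step with the logits for the last position and the list of token ids generated so far.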

Supported Languages

Code   Language     Code   Language
ar     Arabic       ko     Korean
da     Danish       ms     Malay
de     German       nl     Dutch
el     Greek        no     Norwegian
en     English      pl     Polish
es     Spanish      pt     Portuguese
fi     Finnish      ru     Russian
fr     French       sv     Swedish
he     Hebrew       sw     Swahili
hi     Hindi        tr     Turkish
it     Italian      zh     Chinese
ja     Japanese
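
An application can use the table above to reject unsupported language codes before running inference. The following is a small sketch; SUPPORTED_LANGUAGES and validate_language are hypothetical names, and how the language code is actually passed to the model is not covered here.

```python
# Language codes and names, copied from the table above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}


def validate_language(code):
    """Return the normalized language code or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```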

Credits

  • Base model: onnx-community/chatterbox-multilingual-ONNX

License

MIT License (same as original model)
