OmniVoice — Indic Quantized (INT4 + INT8)
Quantized version of k2-fsa/OmniVoice optimised for Indic languages (16 languages, Malayalam focus).
Size Comparison
| Component | Original | Quantized | Method |
|---|---|---|---|
| Main model | 2,450 MB | 665 MB | INT4 group-wise RTN (group=128) |
| Audio tokenizer | 806 MB | 289 MB | INT8 per-channel symmetric |
| Total | 3,256 MB | 954 MB | 70.7% reduction |
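The totals and the reduction figure in the table can be re-derived from the per-component sizes. The snippet below is only an arithmetic check; the dictionary keys are illustrative names, not identifiers from the repository.

```python
# Sizes in MB, copied from the table above. Key names are illustrative only.
original_mb = {"main_model": 2450, "audio_tokenizer": 806}
quantized_mb = {"main_model": 665, "audio_tokenizer": 289}

total_original = sum(original_mb.values())
total_quantized = sum(quantized_mb.values())

# Reduction = share of the original size that was removed.
reduction_pct = 100 * (1 - total_quantized / total_original)

print(f"Total: {total_original} MB -> {total_quantized} MB "
      f"({reduction_pct:.1f}% reduction)")
```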
Quantization Details
- Main model: All attention (Q/K/V/O) and FFN (gate/up/down) linear layers quantized to INT4. First 2 and last 2 transformer layers kept in float16.
- Audio tokenizer: All Conv1d + Linear layers quantized to INT8.
- Embedding table: Rows for non-Indic tokens are zeroed, so only the Indic vocabulary is retained.
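The two weight schemes named above can be sketched in NumPy as follows. This is a minimal illustration, not the code used to produce this checkpoint: it assumes symmetric round-to-nearest quantization to [-8, 7] for INT4 and [-127, 127] for INT8, stores scales in float32, and does not pack two INT4 values per byte. The function names are hypothetical, and the released files may differ (for example by using zero points or a different packing layout).

```python
import numpy as np


def quantize_int4_groupwise(w, group_size=128):
    """Symmetric round-to-nearest INT4, one scale per group of input weights.

    Assumption: symmetric range [-8, 7]; values are left unpacked in int8.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must divide by group_size"
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # Pick each scale so the largest magnitude in the group maps to 7.
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.rint(groups / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize_int4_groupwise(q, scale):
    """Rebuild a float (out_features, in_features) matrix from groups and scales."""
    return (q.astype(np.float32) * scale).reshape(q.shape[0], -1)


def quantize_int8_per_channel(w):
    """Symmetric INT8 with one scale per output channel (axis 0).

    Works for Linear weights (out, in) and Conv1d weights (out, in, kernel).
    """
    flat = np.abs(w).reshape(w.shape[0], -1)
    max_abs = flat.max(axis=1)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    # Reshape so the scale broadcasts over every non-channel axis.
    scale = scale.reshape((-1,) + (1,) * (w.ndim - 1))
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale


# Small demonstration on random weights.
rng = np.random.default_rng(0)
w_linear = rng.standard_normal((8, 256)).astype(np.float32)
q4, s4 = quantize_int4_groupwise(w_linear, group_size=128)
w_linear_hat = dequantize_int4_groupwise(q4, s4)

w_conv = rng.standard_normal((4, 3, 5)).astype(np.float32)
q8, s8 = quantize_int8_per_channel(w_conv)
w_conv_hat = q8.astype(np.float32) * s8

print("INT4 max abs error:", np.abs(w_linear - w_linear_hat).max())
print("INT8 max abs error:", np.abs(w_conv - w_conv_hat).max())
```

With round-to-nearest and no clipping, the reconstruction error of each weight is at most half of its group's (or channel's) scale, which is why a smaller group size generally costs more storage for scales but lowers the error.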
Files
- `model_int4.safetensors`: Main diffusion model (INT4 + fp16 guards)
- `audio_tokenizer/`
  - `model_int8.safetensors`: Audio codec (INT8)
  - `config.json`
- `quant_info.json`: Quantization metadata (layer map)
- `config.json`
- `tokenizer.json`
- `chat_template.jinja`
Usage
Load the model through the companion notebook, which applies the INT4 dequantization runtime patch. Cell 7 of the upload notebook contains the full loading and Gradio UI code.
Languages Supported
Malayalam · Hindi · Tamil · Telugu · Kannada · Bengali · Marathi · Gujarati · Punjabi · Odia · Urdu · Assamese · Nepali · Sinhala · Sanskrit · Maithili