OmniVoice — Indic Quantized (INT4 + INT8)

Quantized version of k2-fsa/OmniVoice optimised for Indic languages (16 languages, Malayalam focus).

Size Comparison

Component	Original	Quantized	Method
Main model	2,450 MB	665 MB	INT4 group-wise RTN (group=128)
Audio tokenizer	806 MB	289 MB	INT8 per-channel symmetric
Total	3,256 MB	954 MB	70.7% reduction
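
The reduction figure can be rechecked from the table values. The short sketch below does only that arithmetic; the variable and function names are illustrative and not part of the repository.

```python
# Sizes in MB, copied from the table above: (original, quantized).
SIZES_MB = {
    "Main model": (2450, 665),
    "Audio tokenizer": (806, 289),
}

def reduction_percent(original_mb, quantized_mb):
    """Percentage of the original size removed by quantization."""
    return 100.0 * (1.0 - quantized_mb / original_mb)

total_original = sum(orig for orig, _ in SIZES_MB.values())     # 3256
total_quantized = sum(quant for _, quant in SIZES_MB.values())  # 954

for name, (orig, quant) in SIZES_MB.items():
    print(f"{name}: {reduction_percent(orig, quant):.1f}% smaller")
print(f"Total: {reduction_percent(total_original, total_quantized):.1f}% smaller")
# Total: 70.7% smaller
```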

Quantization Details

Main model: Attention (Q/K/V/O) and FFN (gate/up/down) linear layers are quantized to INT4, except in the first 2 and last 2 transformer layers, which are kept in float16.
Audio tokenizer: All Conv1d + Linear layers quantized to INT8.
Embedding table: Non-Indic token rows zeroed (Indic vocab only).
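
As a rough illustration of the two schemes named above, the following pure-Python sketch quantizes a weight row group by group to INT4 with round-to-nearest, and a small weight matrix to INT8 with one symmetric scale per output channel. It is not the code used to produce these checkpoints: the symmetric INT4 range (-8..7 with scale max|w|/7), the function names, and the list-based layout are all assumptions made for readability.

```python
def quantize_int4_groupwise(row, group_size=128):
    """Round-to-nearest INT4 quantization of one weight row.

    Assumption: symmetric quantization, one scale per group of
    `group_size` consecutive weights, codes clamped to [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0  # avoid divide-by-zero
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groupwise(codes, scales, group_size=128):
    """Map INT4 codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def quantize_int8_per_channel(weight):
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    codes, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.append([max(-127, min(127, round(w / scale))) for w in row])
    return codes, scales


# Tiny example with group_size=2 so that two groups are visible.
row = [7.0, -3.0, 14.0, 6.0]
codes, scales = quantize_int4_groupwise(row, group_size=2)
restored = dequantize_groupwise(codes, scales, group_size=2)
```

Each group stores its own scale, so an outlier only degrades the precision of the weights in its own group; the group size of 128 in the table trades this locality against the storage cost of the extra scales.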

Files

model_int4.safetensors          Main diffusion model (INT4 + fp16 guards)
audio_tokenizer/
  model_int8.safetensors        Audio codec (INT8)
  config.json
quant_info.json                 Quantization metadata (layer map)
config.json
tokenizer.json
chat_template.jinja

Usage

Load the model with the companion notebook, which applies the INT4 dequantization runtime patch. Cell 7 of the upload notebook contains the full loading and Gradio UI code.
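
For readers unfamiliar with what an INT4 dequantization step involves, here is a minimal sketch under stated assumptions. The actual storage layout of model_int4.safetensors and the notebook's patch are not documented here, so the packing order (two signed 4-bit codes per byte, low nibble first) and every function name below are hypothetical.

```python
def pack_int4(codes):
    """Pack signed INT4 codes (-8..7) two per byte, low nibble first (assumed layout)."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed, count):
    """Recover `count` signed INT4 codes from packed bytes."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Values 8..15 are the two's-complement encodings of -8..-1.
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


def dequantize_packed(packed, scales, count, group_size=128):
    """Unpack INT4 codes and rescale each one by its group's scale."""
    codes = unpack_int4(packed, count)
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


packed = pack_int4([7, -3, 7, 3])
weights = dequantize_packed(packed, scales=[1.0, 2.0], count=4, group_size=2)
```

A runtime patch of this kind would typically run such a step when weights are loaded or just before each matrix multiply, so the rest of the model can keep operating on floating-point tensors.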

Languages Supported

Malayalam · Hindi · Tamil · Telugu · Kannada · Bengali · Marathi · Gujarati · Punjabi · Odia · Urdu · Assamese · Nepali · Sinhala · Sanskrit · Maithili

Model Tree

Qwen/Qwen3-0.6B-Base (base model) → Qwen/Qwen3-0.6B (finetuned) → k2-fsa/OmniVoice (finetuned) → siyah1/Olam_ttsV2 (this model)