OmniVoice — Indic Quantized (INT4 + INT8)

A quantized version of k2-fsa/OmniVoice, optimised for 16 Indic languages with a focus on Malayalam.

Size Comparison

Component         Original    Quantized   Method
Main model        2,450 MB    665 MB      INT4 group-wise RTN (group=128)
Audio tokenizer   806 MB      289 MB      INT8 per-channel symmetric
Total             3,256 MB    954 MB      70.7% reduction
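
The totals and the reduction figure follow directly from the per-component sizes. A small arithmetic check, using only the numbers in the table (the variable names are illustrative):

```python
# Sizes in MB, copied from the size comparison table: (original, quantized).
sizes_mb = {
    "main_model": (2450, 665),
    "audio_tokenizer": (806, 289),
}

original_total = sum(orig for orig, _ in sizes_mb.values())
quantized_total = sum(quant for _, quant in sizes_mb.values())

# Reduction = share of the original size that was removed.
reduction_pct = 100 * (1 - quantized_total / original_total)


def component_reduction(name):
    """Percentage reduction for a single component."""
    orig, quant = sizes_mb[name]
    return 100 * (1 - quant / orig)


print(original_total, quantized_total, round(reduction_pct, 1))
# 3256 954 70.7
```

Per component, the same formula gives roughly 72.9% for the main model and 64.1% for the audio tokenizer.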

Quantization Details

  • Main model: all attention (Q/K/V/O) and FFN (gate/up/down) linear layers are quantized to INT4 with round-to-nearest (RTN). The first 2 and last 2 transformer layers are kept in float16 as guard layers.
  • Audio tokenizer: all Conv1d and Linear layers are quantized to INT8.
  • Embedding table: rows for non-Indic tokens are zeroed, leaving only the Indic vocabulary.
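
The three steps above can be sketched in NumPy. This is a minimal illustration, not the script used to produce this checkpoint. It assumes symmetric INT4 quantization (the card only says "group-wise RTN"), keeps the scales in float32 for clarity, and every function name is hypothetical.

```python
import numpy as np


def is_guard_layer(layer_idx, num_layers, guard=2):
    """True for the first/last `guard` transformer layers, which stay in float16."""
    return layer_idx < guard or layer_idx >= num_layers - guard


def quantize_int4_groupwise(w, group_size=128):
    """Round-to-nearest INT4 with one scale per group of `group_size` input weights.

    Assumption: symmetric quantization, codes stored unpacked in int8.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"
    groups = w.reshape(out_features, in_features // group_size, group_size)
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    scales = np.where(max_abs == 0, 1.0, max_abs / 7.0)  # avoid division by zero
    codes = np.clip(np.rint(groups / scales), -8, 7).astype(np.int8)
    return codes, scales


def dequantize_groupwise(codes, scales):
    """Inverse of quantize_int4_groupwise, back to a 2-D float matrix."""
    out_features = codes.shape[0]
    return (codes * scales).reshape(out_features, -1)


def quantize_int8_per_channel(w):
    """Symmetric INT8 with one scale per output channel (Linear or Conv1d weight)."""
    flat = w.reshape(w.shape[0], -1)
    max_abs = np.abs(flat).max(axis=1, keepdims=True)
    scales = np.where(max_abs == 0, 1.0, max_abs / 127.0)
    codes = np.clip(np.rint(flat / scales), -127, 127).astype(np.int8)
    return codes.reshape(w.shape), scales


def zero_rows_except(embedding, keep_ids):
    """Return a copy of the embedding table with every row outside `keep_ids` zeroed."""
    keep = np.zeros(embedding.shape[0], dtype=bool)
    keep[list(keep_ids)] = True
    out = embedding.copy()
    out[~keep] = 0
    return out
```

With symmetric round-to-nearest, the reconstruction error of each weight is at most half of its group's scale, which is why smaller groups generally track the original weights more closely at the cost of storing more scales.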

Files

model_int4.safetensors          Main diffusion model (INT4 + fp16 guards)
audio_tokenizer/
  model_int8.safetensors        Audio codec (INT8)
  config.json
quant_info.json                 Quantization metadata (layer map)
config.json
tokenizer.json
chat_template.jinja
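
Before loading, it can help to confirm that a local copy of the repository contains every file listed above. A minimal sketch using only the standard library; the helper name and the local directory are hypothetical, and the relative paths are taken from the listing:

```python
from pathlib import Path

# Relative paths taken from the file listing in this card.
EXPECTED_FILES = [
    "model_int4.safetensors",
    "audio_tokenizer/model_int8.safetensors",
    "audio_tokenizer/config.json",
    "quant_info.json",
    "config.json",
    "tokenizer.json",
    "chat_template.jinja",
]


def missing_files(root):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `missing_files("./Olam_ttsV2")` (an assumed download location) returns an empty list when the checkout is complete.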

Usage

Load the model with the companion notebook, which applies the INT4 dequant runtime patch. Cell 7 of the upload notebook contains the full loading and Gradio UI code.
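
The notebook's patch is not reproduced in this card. As a rough illustration of what a runtime INT4 dequantization step generally involves, the sketch below packs two signed 4-bit codes per byte, unpacks them, and rebuilds a float16 weight before a matrix multiply. The packing layout, scale shape, and all function names are assumptions and may not match what model_int4.safetensors or the notebook actually use.

```python
import numpy as np


def pack_int4(codes):
    """Pack signed INT4 codes (values in [-8, 7], even count) two per byte."""
    nibbles = (codes.astype(np.int16) & 0xF).astype(np.uint8).ravel()
    return nibbles[0::2] | (nibbles[1::2] << 4)  # low nibble first (assumed layout)


def unpack_int4(packed):
    """Recover signed INT4 codes from bytes produced by pack_int4."""
    low = (packed & 0xF).astype(np.int8)
    high = (packed >> 4).astype(np.int8)
    codes = np.stack([low, high], axis=-1).ravel()
    return np.where(codes > 7, codes - 16, codes).astype(np.int8)  # sign-extend


def dequantize_weight(packed, scales, out_features, in_features, group_size=128):
    """Rebuild a float16 weight matrix from packed codes and per-group scales.

    `scales` is assumed to have shape (out_features, in_features // group_size, 1).
    """
    codes = unpack_int4(packed).reshape(
        out_features, in_features // group_size, group_size
    )
    return (codes * scales).reshape(out_features, in_features).astype(np.float16)


def int4_linear(x, packed, scales, out_features, in_features, group_size=128):
    """Linear layer forward pass that dequantizes its weight on the fly."""
    w = dequantize_weight(packed, scales, out_features, in_features, group_size)
    return x @ w.astype(np.float32).T
```

Packing two codes per byte is what brings storage down to about half a byte per quantized weight, plus the per-group scales.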

Languages Supported

Malayalam · Hindi · Tamil · Telugu · Kannada · Bengali · Marathi · Gujarati · Punjabi · Odia · Urdu · Assamese · Nepali · Sinhala · Sanskrit · Maithili

Model tree for siyah1/Olam_ttsV2

Qwen/Qwen3-0.6B
  Finetuned: k2-fsa/OmniVoice
    Finetuned: this model