MisoTTS 8B TorchAO INT4 Weight-Only Quantization

This repository contains a 4-bit TorchAO weight-only quantization of MisoLabs/MisoTTS, packaged so it can be loaded without first materializing the full 32 GB F32 checkpoint.

Base model: MisoLabs/MisoTTS
Quantization: TorchAO Int4WeightOnlyConfig(group_size=128)
Runtime format: torch.save checkpoint containing TorchAO quantized tensor subclasses
Tested GPU: RTX 3060 12 GB
Tokenizer: upstream default meta-llama/Llama-3.2-1B
Language: English, following the base model

No private prompt voice is included. Voice continuation/cloning requires a user-supplied prompt audio clip and its transcript.

Why this exists

The upstream MisoTTS checkpoint is about 32 GB, and the default loader materializes the weights in F32, far more than a 12 GB consumer GPU can hold. This quantized variant targets consumer GPUs with around 12 GB of VRAM. It has been smoke-tested locally on an RTX 3060 with both short and longer expressive generations.
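To see why an INT4 checkpoint can fit on a 12 GB card when the F32 one cannot, here is a back-of-the-envelope estimate of weight storage alone. It is a sketch under stated assumptions, not a measurement of this checkpoint: it treats all 8B parameters as quantizable linear weights (ignoring embeddings, norms and any layers left unquantized) and assumes one 2-byte scale and one 2-byte zero point per group of 128 weights. The helper names are hypothetical. Real VRAM use will be higher because of activations, caches and other runtime components.

```python
# Rough weight-memory estimate for an 8B-parameter model.
# Assumptions (hypothetical, not measured on this checkpoint):
#   * every parameter is treated as a quantizable linear weight;
#   * each group of `group_size` int4 weights stores one 2-byte scale
#     and one 2-byte zero point.

GB = 1_000_000_000  # decimal gigabytes, matching the "32 GB" figure above


def f32_bytes(num_params: int) -> int:
    """Bytes needed to hold the weights in float32 (4 bytes each)."""
    return num_params * 4


def int4_grouped_bytes(num_params: int, group_size: int = 128,
                       scale_bytes: int = 2, zero_bytes: int = 2) -> int:
    """Bytes for packed int4 weights plus per-group scale/zero metadata."""
    packed = (num_params + 1) // 2              # two 4-bit values per byte
    num_groups = -(-num_params // group_size)   # ceiling division
    return packed + num_groups * (scale_bytes + zero_bytes)


if __name__ == "__main__":
    n = 8_000_000_000
    print(f"F32 weights:  {f32_bytes(n) / GB:.1f} GB")
    print(f"INT4 weights: {int4_grouped_bytes(n) / GB:.2f} GB")
```

With these assumptions the per-group metadata adds only 4 bytes per 128 weights, so the INT4 weights come to roughly an eighth of the F32 size.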

Install

Use Python 3.10 and the same dependency versions as upstream MisoTTS. A practical setup is:

git clone https://huggingface.co/droyster/MisoTTS-8B-torchao-int4
cd MisoTTS-8B-torchao-int4
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

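By default the loader follows upstream MisoTTS and uses the meta-llama/Llama-3.2-1B tokenizer. That repository is gated, so you need a Hugging Face account that has been granted access to Meta's Llama 3.2 repo, plus an access token available locally (for example via huggingface-cli login or the HF_TOKEN environment variable).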

Quick smoke test

python scripts/smoke_test.py \
  --repo-id droyster/MisoTTS-8B-torchao-int4 \
  --output smoke.wav \
  --disable-watermark

Passing --disable-watermark is recommended on 12 GB GPUs for longer local evaluation runs, because SilentCipher watermarking can add enough memory pressure to cause out-of-memory (OOM) errors. Audio generated with this flag is not watermarked.

Python usage

import torchaudio
from load_quantized import load_miso_8b_torchao_int4

# disable_watermark=True is useful on 12 GB GPUs for long generations.
generator = load_miso_8b_torchao_int4(
    "droyster/MisoTTS-8B-torchao-int4",
    device="cuda",
    disable_watermark=True,
)

audio = generator.generate(
    text="Hello from the four bit TorchAO quantized Miso TTS model.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
    temperature=0.8,
    topk=40,
)

torchaudio.save("miso_int4.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)

Prompted voice continuation

This repo does not include any voice prompt audio. To condition on a user-supplied voice, pass context segments exactly as upstream MisoTTS does:

import torchaudio
from generator import Segment
from load_quantized import load_miso_8b_torchao_int4

generator = load_miso_8b_torchao_int4("droyster/MisoTTS-8B-torchao-int4", device="cuda")

# Load the prompt clip, downmix it to mono, and resample to the model's rate.
prompt_audio, sr = torchaudio.load("prompt.wav")
prompt_audio = prompt_audio.mean(dim=0)
if sr != generator.sample_rate:
    prompt_audio = torchaudio.functional.resample(prompt_audio, sr, generator.sample_rate)

context = [Segment(
    speaker=0,
    text="Transcript of the prompt audio goes here.",
    audio=prompt_audio,
)]

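audio = generator.generate(
    text="The next sentence to synthesize.",
    speaker=0,
    context=context,
    max_audio_length_ms=10_000,
)

# Save the continuation; unsqueeze(0) adds the channel dimension torchaudio.save expects.
torchaudio.save("miso_int4_prompted.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)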

Known limitations

Long generations can drift from a short voice prompt; use longer/better prompt context for stronger voice adherence.
SilentCipher watermarking can run out of memory on 12 GB GPUs during longer generations; set disable_watermark=True for local evaluation if needed.
This is a TorchAO/PyTorch runtime checkpoint, not GGUF/AWQ/GPTQ/EXL2.
Because TorchAO quantized tensor subclasses are serialized with torch.save, loading uses weights_only=False. This unpickles arbitrary Python objects, so only load the checkpoint from a source you trust.

Reproducing the quantization

python scripts/export_int4.py \
  --source MisoLabs/MisoTTS \
  --output model_int4_torchao.pt \
  --group-size 128

The exporter streams the upstream model.safetensors, quantizes linear weights one at a time on CUDA, and saves the resulting quantized state dict.
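To illustrate what group_size=128 means, here is a simplified sketch of group-wise asymmetric 4-bit quantization in plain Python. It is not TorchAO's implementation: TorchAO's int4 weight-only path uses its own packed kernel layout and zero-point convention. This sketch uses a basic min/max affine scheme, and the function names are hypothetical. The key idea it shows is that every block of 128 consecutive weights gets its own scale and offset, so an outlier only degrades precision within its own group.

```python
# Simplified group-wise asymmetric 4-bit quantization, for illustration only.
# Hypothetical helper names; this is NOT TorchAO's implementation or layout.
from typing import List, Tuple

QMIN, QMAX = 0, 15  # unsigned 4-bit code range


def quantize_groups(weights: List[float], group_size: int = 128
                    ) -> Tuple[List[int], List[float], List[float]]:
    """Quantize a flat list of weights group by group.

    Each group of `group_size` values gets its own scale and minimum
    (used as the offset), so one outlier only affects its own group.
    """
    codes, scales, mins = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Constant groups would give scale 0; fall back to 1.0 to avoid dividing by zero.
        scale = (hi - lo) / (QMAX - QMIN) or 1.0
        for w in group:
            q = round((w - lo) / scale)
            codes.append(max(QMIN, min(QMAX, q)))  # clamp to the 4-bit range
        scales.append(scale)
        mins.append(lo)
    return codes, scales, mins


def dequantize_groups(codes: List[int], scales: List[float], mins: List[float],
                      group_size: int = 128) -> List[float]:
    """Map 4-bit codes back to floats using each group's scale and minimum."""
    return [mins[i // group_size] + q * scales[i // group_size]
            for i, q in enumerate(codes)]


if __name__ == "__main__":
    row = [0.01 * i for i in range(256)]
    codes, scales, mins = quantize_groups(row)
    restored = dequantize_groups(codes, scales, mins)
    print(max(abs(a - b) for a, b in zip(row, restored)))
```

Each reconstructed weight is within half of its group's scale of the original. A smaller group size tightens that bound at the cost of storing more scales and offsets.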

License

The upstream model is marked license: other and includes the Modified MIT License from Miso Labs/Kamino Learning, Inc. The original license text is included in this repository. This quantized checkpoint is a derivative of MisoLabs/MisoTTS; follow the upstream license terms.
