MisoTTS 8B TorchAO INT4 Weight-Only Quantization
This repository contains a 4-bit TorchAO weight-only quantization of
MisoLabs/MisoTTS, packaged so it can be loaded without first materializing the full 32 GB F32 checkpoint.
- Base model: MisoLabs/MisoTTS
- Quantization: TorchAO Int4WeightOnlyConfig(group_size=128)
- Runtime format: torch.save checkpoint containing TorchAO quantized tensor subclasses
- Tested GPU: RTX 3060 12 GB
- Tokenizer: upstream default meta-llama/Llama-3.2-1B
- Language: English, following the base model
No private prompt voice is included. Voice continuation/cloning requires user-supplied prompt audio and transcript.
Why this exists
The upstream MisoTTS checkpoint is large and the default loader materializes F32 weights. This quantized variant targets consumer GPUs around 12 GB VRAM. It has been smoke-tested locally on an RTX 3060 using short and longer expressive generations.
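As a rough back-of-the-envelope check of why 4-bit weights matter on a 12 GB card, the sketch below estimates weight storage only. It assumes exactly 8 billion parameters (from the "8B" in the title), that every parameter is quantized, and that each group of 128 weights carries 32 bits of metadata (a 16-bit scale plus a 16-bit zero point). Those are illustrative assumptions, not measured properties of this checkpoint, and the function name is hypothetical.

```python
def estimate_weight_gb(n_params, bits_per_weight, group_size=None, group_meta_bits=0):
    """Estimate weight storage in decimal GB (1e9 bytes). Weights only."""
    total_bits = n_params * bits_per_weight
    if group_size:
        # Assumed per-group overhead (scale + zero point); real layouts may differ.
        total_bits += (n_params // group_size) * group_meta_bits
    return total_bits / 8 / 1e9


N_PARAMS = 8_000_000_000  # assumption taken from the "8B" model name

f32_gb = estimate_weight_gb(N_PARAMS, 32)
int4_gb = estimate_weight_gb(N_PARAMS, 4, group_size=128, group_meta_bits=32)

print(f"F32 weights:  {f32_gb:.2f} GB")   # 32.00 GB
print(f"INT4 weights: {int4_gb:.2f} GB")  # 4.25 GB
```

The estimate ignores any layers left unquantized, activations, caches, the audio codec, and the watermarking model, so actual VRAM use will be higher than the INT4 figure.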
Install
Use Python 3.10 and the same dependency family as upstream MisoTTS. A practical setup is:
git clone https://huggingface.co/droyster/MisoTTS-8B-torchao-int4
cd MisoTTS-8B-torchao-int4
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
By default the loader uses the same tokenizer as upstream MisoTTS: meta-llama/Llama-3.2-1B. Downloading it requires a Hugging Face account that has been granted access to Meta's Llama 3.2 tokenizer repo, plus an access token for that account.
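Before a long download, it can help to confirm that a Hugging Face token is visible to the process at all. The sketch below is a hypothetical pre-flight helper using only the standard library; it assumes the usual HF_TOKEN environment variable and the default cached token file under ~/.cache/huggingface (or under HF_HOME if set). It only checks that a token exists locally, not that the account has been granted access to the Llama 3.2 repo.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return a locally configured Hugging Face token, or None if none is found."""
    env = os.environ if env is None else env

    # 1. An explicit environment variable takes priority.
    token = env.get("HF_TOKEN")
    if token and token.strip():
        return token.strip()

    # 2. Otherwise look for the cached token file (assumed default location).
    if env.get("HF_HOME"):
        base = Path(env["HF_HOME"])
    else:
        base = Path(home) if home else Path.home()
        base = base / ".cache" / "huggingface"
    token_file = base / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("No Hugging Face token found; the tokenizer download will likely fail.")
    else:
        print("Hugging Face token found.")
```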
Quick smoke test
python scripts/smoke_test.py \
  --repo-id droyster/MisoTTS-8B-torchao-int4 \
  --output smoke.wav \
  --disable-watermark
--disable-watermark is recommended on 12 GB GPUs for longer local evaluation runs because SilentCipher watermarking can add enough memory pressure to OOM.
Python usage
import torchaudio
from load_quantized import load_miso_8b_torchao_int4
# disable_watermark=True is useful on 12 GB GPUs for long generations.
generator = load_miso_8b_torchao_int4(
    "droyster/MisoTTS-8B-torchao-int4",
    device="cuda",
    disable_watermark=True,
)
audio = generator.generate(
    text="Hello from the four bit TorchAO quantized Miso TTS model.",
    speaker=0,
    context=[],
    max_audio_length_ms=10_000,
    temperature=0.8,
    topk=40,
)
torchaudio.save("miso_int4.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
Prompted voice continuation
This repo does not include any voice prompt audio. To condition on a user-supplied voice, pass context segments exactly as upstream MisoTTS does:
import torchaudio
from generator import Segment
from load_quantized import load_miso_8b_torchao_int4
generator = load_miso_8b_torchao_int4("droyster/MisoTTS-8B-torchao-int4", device="cuda")
prompt_audio, sr = torchaudio.load("prompt.wav")
prompt_audio = prompt_audio.mean(dim=0)
if sr != generator.sample_rate:
    prompt_audio = torchaudio.functional.resample(prompt_audio, sr, generator.sample_rate)
context = [Segment(
    speaker=0,
    text="Transcript of the prompt audio goes here.",
    audio=prompt_audio,
)]
audio = generator.generate(
    text="The next sentence to synthesize.",
    speaker=0,
    context=context,
    max_audio_length_ms=10_000,
)
Known limitations
- Long generations can drift from a short voice prompt; use longer/better prompt context for stronger voice adherence.
- SilentCipher watermarking may OOM on 12 GB GPUs during longer generations; use disable_watermark=True for local evaluation if needed.
- This is a TorchAO/PyTorch runtime checkpoint, not GGUF/AWQ/GPTQ/EXL2.
- Because TorchAO quantized tensor subclasses are serialized with torch.save, loading uses weights_only=False.
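Loading with weights_only=False means PyTorch falls back to full pickle deserialization, which can execute arbitrary code, so the checkpoint should come from a source you trust. One possible precaution, sketched below, is to compare the file's SHA-256 digest against a value obtained from a trusted channel before loading. The helper names and the expected digest are hypothetical; this repository's loader is not known to perform such a check.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte checkpoint is never fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return path
```

A caller would run verify_checkpoint on the downloaded .pt file, with a digest copied from a trusted source, before handing the path to the loader.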
Reproducing the quantization
python scripts/export_int4.py \
  --source MisoLabs/MisoTTS \
  --output model_int4_torchao.pt \
  --group-size 128
The exporter streams the upstream model.safetensors, quantizes linear weights one at a time on CUDA, and saves the resulting quantized state dict.
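To illustrate what group-wise 4-bit weight quantization means, here is a minimal pure-Python sketch that quantizes one weight row in groups, storing a scale and minimum per group. It is an illustration of the general idea only: the function names are hypothetical, and TorchAO's actual Int4WeightOnlyConfig packing, zero-point convention, and CUDA kernels differ from this.

```python
def quantize_group(values):
    """Asymmetric 4-bit quantization of one group: returns (codes, scale, lo)."""
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels (0..15); fall back to 1.0 for a constant group.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def quantize_row(row, group_size=128):
    """Quantize a row of weights group by group."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Reassemble a row from its quantized groups."""
    out = []
    for codes, scale, lo in groups:
        out.extend(dequantize_group(codes, scale, lo))
    return out
```

Each group has its own scale, so the rounding error of any weight is bounded by half of that group's scale. A smaller group size tightens the bound at the cost of more per-group metadata.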
License
The upstream model is marked license: other and includes the Modified MIT License from Miso Labs/Kamino Learning, Inc. The original license text is included in this repository. This quantized checkpoint is a derivative of MisoLabs/MisoTTS; follow the upstream license terms.