OpenAudio S1-mini INT8 Quantized

An INT8 weight-only quantized version of fishaudio/openaudio-s1-mini for efficient GPU inference.

Model Size Comparison

Model                    Original   INT8      Reduction
LLaMA (model.pth)        1.64 GB    1.02 GB   -38%
Codec (codec_int8.pth)   1.74 GB    0.91 GB   -48%
Total                    3.38 GB    1.93 GB   -43%
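The reduction column is simply the percentage saved relative to the original size, rounded to the nearest whole percent. A minimal check (the helper name is hypothetical):

```python
def reduction_pct(original_gb: float, quantized_gb: float) -> int:
    # Percentage of the original size saved, rounded to the nearest percent.
    return round((1 - quantized_gb / original_gb) * 100)

assert reduction_pct(1.64, 1.02) == 38   # LLaMA
assert reduction_pct(1.74, 0.91) == 48   # Codec
assert reduction_pct(3.38, 1.93) == 43   # Total
```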

Performance

  • RTF (Real-Time Factor): ~1.9x real-time generation speed with reference caching
  • Tested on an RTX 3090
  • Quality comparable to the original FP16/BF16 model
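The "~1.9x" figure reads as a speed factor relative to real time, i.e. each second of audio is generated in roughly half a second of wall-clock time. A sketch of how such a number is computed (the helper is hypothetical, not part of this repo):

```python
def real_time_factor(wall_seconds: float, audio_seconds: float) -> float:
    """Generation speed relative to real time.

    Values > 1 mean faster than real time: e.g. 10 s of audio
    synthesized in ~5.3 s of wall-clock time gives ~1.9x.
    """
    return audio_seconds / wall_seconds

assert abs(real_time_factor(5.3, 10.0) - 1.887) < 0.01
```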

Usage

from voice_clone_tts import VoiceCloneTTS

tts = VoiceCloneTTS(
    llama_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
    decoder_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
)

audio, sr = tts.synthesize(
    text="Hello, this is a test.",
    reference_audio="reference.wav",  # Optional: for voice cloning
)

Files

  • model.pth - INT8 quantized LLaMA model (1.02 GB)
  • codec_int8.pth - INT8 quantized DAC codec (0.91 GB)
  • config.json - Model configuration
  • tokenizer.tiktoken - Tokenizer
  • special_tokens.json - Special tokens

Quantization Method

Weight-only INT8 quantization with per-channel scales:

  • Weights stored as INT8
  • Scales stored as BF16
  • Activations remain in FP16/BF16
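The scheme above can be sketched in a few lines: each output channel gets one symmetric scale mapping its largest absolute weight to 127, and the weight is recovered at inference by multiplying the INT8 tensor by its scale. This is an illustrative NumPy sketch of per-channel weight-only quantization, not the script used to produce this checkpoint:

```python
import numpy as np

def quantize_per_channel(w: np.ndarray):
    """Symmetric per-channel INT8 quantization (one scale per row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-12)               # guard all-zero channels
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                                # stored as INT8 + scales

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # At inference the INT8 weights are rescaled back (or the scale is
    # fused into the matmul) while activations stay in FP16/BF16.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_per_channel(w)
# Round-off error is bounded by half a quantization step per channel.
assert np.all(np.abs(w - dequantize(q, s)) <= s / 2 + 1e-6)
```

In the actual checkpoint the scales are stored in BF16 rather than FP32; the principle is the same.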

Credits

Original model by fishaudio (fishaudio/openaudio-s1-mini).
License

CC-BY-NC-SA-4.0 (Non-commercial use only)

See the original model for full license terms.
