OpenAudio S1-mini INT8 Quantized

An INT8 weight-only quantized version of fishaudio/openaudio-s1-mini for efficient GPU inference.

Model Size Comparison

Model                    Original   INT8      Reduction
LLaMA (model.pth)        1.64 GB    1.02 GB   -38%
Codec (codec_int8.pth)   1.74 GB    0.91 GB   -48%
Total                    3.38 GB    1.93 GB   -43%
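The reduction column is simply the percentage saved relative to the original size, rounded to the nearest whole percent. A minimal check (the helper name is hypothetical):

```python
def reduction_pct(original_gb: float, quantized_gb: float) -> int:
    # Percentage of the original size saved, rounded to the nearest percent.
    return round((1 - quantized_gb / original_gb) * 100)

assert reduction_pct(1.64, 1.02) == 38   # LLaMA
assert reduction_pct(1.74, 0.91) == 48   # Codec
assert reduction_pct(3.38, 1.93) == 43   # Total
```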

Performance

  • RTF (Real-Time Factor): ~1.9x real-time generation speed with reference caching
  • Tested on an RTX 3090
  • Quality comparable to the original FP16/BF16 model
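The "~1.9x" figure reads as a speed factor relative to real time, i.e. each second of audio is generated in roughly half a second of wall-clock time. A sketch of how such a number is computed (the helper is hypothetical, not part of this repo):

```python
def real_time_factor(wall_seconds: float, audio_seconds: float) -> float:
    """Generation speed relative to real time.

    Values > 1 mean faster than real time: e.g. 10 s of audio
    synthesized in ~5.3 s of wall-clock time gives ~1.9x.
    """
    return audio_seconds / wall_seconds

assert abs(real_time_factor(5.3, 10.0) - 1.887) < 0.01
```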

Usage

from voice_clone_tts import VoiceCloneTTS

tts = VoiceCloneTTS(
    llama_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
    decoder_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
)

audio, sr = tts.synthesize(
    text="Hello, this is a test.",
    reference_audio="reference.wav",  # Optional: for voice cloning
)

Files

  • model.pth - INT8 quantized LLaMA model (1.02 GB)
  • codec_int8.pth - INT8 quantized DAC codec (0.91 GB)
  • config.json - Model configuration
  • tokenizer.tiktoken - Tokenizer
  • special_tokens.json - Special tokens

Quantization Method

Weight-only INT8 quantization with per-channel scales:

  • Weights stored as INT8
  • Scales stored as BF16
  • Activations remain in FP16/BF16
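The scheme above can be sketched in a few lines: each output channel gets one symmetric scale mapping its largest absolute weight to 127, and the weight is recovered at inference by multiplying the INT8 tensor by its scale. This is an illustrative NumPy sketch of per-channel weight-only quantization, not the script used to produce this checkpoint:

```python
import numpy as np

def quantize_per_channel(w: np.ndarray):
    """Symmetric per-channel INT8 quantization (one scale per row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-12)               # guard all-zero channels
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                                # stored as INT8 + scales

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # At inference the INT8 weights are rescaled back (or the scale is
    # fused into the matmul) while activations stay in FP16/BF16.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_per_channel(w)
# Round-off error is bounded by half a quantization step per channel.
assert np.all(np.abs(w - dequantize(q, s)) <= s / 2 + 1e-6)
```

In the actual checkpoint the scales are stored in BF16 rather than FP32; the principle is the same.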

Credits

Original model by fishaudio (fishaudio/openaudio-s1-mini).
License

CC-BY-NC-SA-4.0 (Non-commercial use only)

See the original model for full license terms.
