# OpenAudio S1-mini INT8 Quantized

An INT8 weight-only quantized version of fishaudio/openaudio-s1-mini for efficient GPU inference.
## Model Size Comparison
| Model | Original | INT8 | Reduction |
|---|---|---|---|
| LLaMA (model.pth) | 1.64 GB | 1.02 GB | -38% |
| Codec (codec_int8.pth) | 1.74 GB | 0.91 GB | -48% |
| Total | 3.38 GB | 1.93 GB | -43% |
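The Reduction column is just the relative change in file size; a quick check of the figures above:

```python
# Sizes in GB from the table: (original, int8)
sizes = {
    "llama": (1.64, 1.02),
    "codec": (1.74, 0.91),
    "total": (3.38, 1.93),
}

for name, (orig, int8) in sizes.items():
    pct = round((int8 / orig - 1) * 100)  # negative = smaller
    print(f"{name}: {pct}%")  # llama: -38%, codec: -48%, total: -43%
```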
## Performance
- RTF (Real-Time Factor): ~1.9x with reference caching, i.e. audio is generated roughly 1.9x faster than real time
- Tested on an RTX 3090
- Quality comparable to the original FP16/BF16 model
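RTF here is seconds of audio produced per second of wall-clock compute, so values above 1 mean faster-than-real-time synthesis. A minimal measurement sketch, where `synthesize` is a stand-in for any TTS callable returning `(samples, sample_rate)`:

```python
import time

def real_time_factor(synthesize, text):
    """Measure RTF: audio seconds produced per wall-clock second.

    `synthesize` is any callable taking text and returning
    (audio_samples, sample_rate); values > 1 are faster than real time.
    """
    start = time.perf_counter()
    audio, sr = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / sr
    return audio_seconds / elapsed
```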
## Usage

```python
from voice_clone_tts import VoiceCloneTTS

tts = VoiceCloneTTS(
    llama_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
    decoder_checkpoint_path="ORI-Muchim/openaudio-s1-mini-int8",
)

audio, sr = tts.synthesize(
    text="Hello, this is a test.",
    reference_audio="reference.wav",  # Optional: for voice cloning
)
```
## Files

- `model.pth` - INT8 quantized LLaMA model (1.02 GB)
- `codec_int8.pth` - INT8 quantized DAC codec (0.91 GB)
- `config.json` - Model configuration
- `tokenizer.tiktoken` - Tokenizer
- `special_tokens.json` - Special tokens
## Quantization Method
Weight-only INT8 quantization with per-channel scales:
- Weights stored as INT8
- Scales stored as BF16
- Activations remain in FP16/BF16
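The scheme above can be illustrated with a short PyTorch sketch. This is a generic max-abs, per-output-channel quantizer written for illustration (the function names and the exact rounding/clipping choices are assumptions, not this repo's code), but it matches the stated layout: INT8 weights, BF16 scales, activations untouched.

```python
import torch

def quantize_per_channel_int8(weight: torch.Tensor):
    """Weight-only INT8 quantization with one scale per output channel.

    weight: [out_features, in_features] floating-point tensor.
    Returns (int8 weights, BF16 per-row scales).
    """
    # Max-abs scale per row so each row maps into [-127, 127].
    scales = weight.abs().amax(dim=1, keepdim=True) / 127.0
    scales = scales.clamp(min=1e-8)  # avoid division by zero for all-zero rows
    q = torch.round(weight / scales).clamp(-127, 127).to(torch.int8)
    return q, scales.to(torch.bfloat16)  # scales stored as BF16, as above

def dequantize(q: torch.Tensor, scales: torch.Tensor):
    # At inference the weights are expanded back (or fused into the matmul)
    # while activations stay in FP16/BF16.
    return q.to(torch.bfloat16) * scales
```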
## Credits
- Original model: Fish Audio / fishaudio/openaudio-s1-mini
- Quantization: ORI-Muchim
## License
CC-BY-NC-SA-4.0 (Non-commercial use only)
See the original model for full license terms.