How to use 47z/glm-4-voice-decoder-emo-ft with KimiAudio:

```python
# Example usage for KimiAudio
# pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
import soundfile as sf  # needed for sf.write below

from kimia_infer.api.kimia import KimiAudio

model = KimiAudio(model_path="47z/glm-4-voice-decoder-emo-ft", load_detokenizer=True)

sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
}

# For ASR: a text instruction followed by the audio to transcribe
asr_audio = "asr_example.wav"
messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio},
]
_, text = model.generate(messages_asr, **sampling_params, output_type="text")
print(text)

# For Q&A: audio in, audio and text out
qa_audio = "qa_example.wav"
messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
# Save the generated speech as a 24 kHz WAV file
sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)
print(text)
```
---
license: other
license_name: glm-4-voice
license_link: https://github.com/THUDM/GLM-4-Voice/blob/main/MODEL_LICENSE
tags:
- speech-to-speech
- audio
- emotion
- kimi-audio
- glm-4-voice
---
# glm-4-voice-decoder-emo-ft

> Built with glm-4.

Fine-tuned [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice) decoder
weights for **emotion-preserving Chinese ↔ English speech-to-speech
translation**, used together with the
[Kimi-Audio Emotion-Aware S2ST](https://github.com/<YOUR_GH_USER>/kimi-audio-release)
training / inference pipeline.
## Files

| File | Size | Role |
|---|---|---|
| `epoch500_emoft.pt` | ~425 MB | Fine-tuned flow checkpoint (emotion-preserving) |
| `hift.pt` | ~79 MB | HiFT vocoder checkpoint |
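
A quick way to confirm that both checkpoints listed above are in place is sketched below. It assumes the files sit under `glm_4_voice_decoder/`, the directory named in the Usage section; the helper name `check_decoder_weights` is hypothetical and not part of any released code. Because the sizes in the table are approximate, the sketch only reports what it finds and does not enforce exact sizes.

```python
from pathlib import Path

# File names come from the Files table; the helper name and the default
# directory are assumptions made for this illustration.
EXPECTED_FILES = ("epoch500_emoft.pt", "hift.pt")


def check_decoder_weights(weights_dir="glm_4_voice_decoder"):
    """Return {file name: size in MB, or None if the file is missing}."""
    root = Path(weights_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = root / name
        # Sizes use decimal megabytes (1 MB = 1e6 bytes).
        report[name] = path.stat().st_size / 1e6 if path.is_file() else None
    return report


if __name__ == "__main__":
    for name, size_mb in check_decoder_weights().items():
        status = "missing" if size_mb is None else f"{size_mb:.0f} MB"
        print(f"{name}: {status}")
```

A `None` entry means the file has not been downloaded yet, or was placed in a different directory.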
## Usage

```bash
git clone https://github.com/<YOUR_GH_USER>/kimi-audio-release
cd kimi-audio-release
./scripts/download_weights.sh
# the two files will be placed under glm_4_voice_decoder/
```
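
If running the shell script is not convenient, the two checkpoints could in principle be fetched directly with `huggingface_hub`. The sketch below is an assumption-laden alternative, not a description of what `download_weights.sh` does: it assumes both files are stored at the root of the `47z/glm-4-voice-decoder-emo-ft` repository and that `huggingface_hub` is installed. The function name `fetch_decoder_weights` and its `download` parameter are hypothetical; the parameter exists only so a different downloader can be swapped in.

```python
from pathlib import Path

# Repository ID and file names are taken from this model card; that the
# files live at the repository root is an assumption.
REPO_ID = "47z/glm-4-voice-decoder-emo-ft"
WEIGHT_FILES = ("epoch500_emoft.pt", "hift.pt")


def fetch_decoder_weights(target_dir="glm_4_voice_decoder", download=None):
    """Download both checkpoints into target_dir and return their paths."""
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download as download
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return [
        download(repo_id=REPO_ID, filename=name, local_dir=target_dir)
        for name in WEIGHT_FILES
    ]


if __name__ == "__main__":
    for path in fetch_decoder_weights():
        print(path)
```

Run from the root of the cloned `kimi-audio-release` checkout, this would place the files under `glm_4_voice_decoder/`, matching the location mentioned in the comment of the shell snippet above.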