---
license: other
license_name: glm-4-voice
license_link: https://github.com/THUDM/GLM-4-Voice/blob/main/MODEL_LICENSE
tags:
- speech-to-speech
- audio
- emotion
- kimi-audio
- glm-4-voice
---
# glm-4-voice-decoder-emo-ft
> Built with glm-4.

Fine-tuned [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice) decoder
weights for **emotion-preserving Chinese ↔ English speech-to-speech
translation**, used together with the
[Kimi-Audio Emotion-Aware S2ST](https://github.com/<YOUR_GH_USER>/kimi-audio-release)
training / inference pipeline.
## Files
| File | Size | Role |
|---|---|---|
| `epoch500_emoft.pt` | ~425 MB | Fine-tuned flow checkpoint (emotion-preserving) |
| `hift.pt` | ~79 MB | HiFT vocoder checkpoint |
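
Once the weights are downloaded, it can help to confirm that both files from the table are actually present before running inference. The following is a minimal sketch, not part of the repository: `check_weights` is a hypothetical helper, and it assumes the files sit in the `glm_4_voice_decoder/` directory named in the Usage section. It only checks that each file exists and is non-empty; it does not verify sizes or checksums.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repo): report which of the
# two expected checkpoint files are missing or empty.
# $1 is the weights directory; the default is an assumption taken from
# the Usage section of this README.
check_weights() {
  local dir="${1:-glm_4_voice_decoder}"
  local missing=0
  local f
  for f in epoch500_emoft.pt hift.pt; do
    if [ -s "$dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      missing=1
    fi
  done
  # Exit status 0 only when both files are present and non-empty.
  return "$missing"
}
```

Calling `check_weights glm_4_voice_decoder` after the download step prints one line per file and returns a non-zero status if either file is absent, so it can gate a later inference command with `&&`.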
## Usage
```bash
git clone https://github.com/<YOUR_GH_USER>/kimi-audio-release
cd kimi-audio-release
./scripts/download_weights.sh
# the two files will be placed under glm_4_voice_decoder/
```