47z / glm-4-voice-decoder-emo-ft

	---
	license: other
	license_name: glm-4-voice
	license_link: https://github.com/THUDM/GLM-4-Voice/blob/main/MODEL_LICENSE
	tags:
	- speech-to-speech
	- audio
	- emotion
	- kimi-audio
	- glm-4-voice
	---

	# glm-4-voice-decoder-emo-ft

	> Built with glm-4.

	Fine-tuned [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice) decoder
	weights for **emotion-preserving Chinese ↔ English speech-to-speech
	translation**, used together with the
	[Kimi-Audio Emotion-Aware S2ST](https://github.com/<YOUR_GH_USER>/kimi-audio-release)
	training / inference pipeline.

	## Files

	| File | Size | Role |
	|---|---|---|
	| `epoch500_emoft.pt` | ~425 MB | Fine-tuned flow checkpoint (emotion-preserving) |
	| `hift.pt` | ~79 MB | HiFT vocoder checkpoint |
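
After downloading, it can be useful to confirm that both checkpoints landed where the pipeline expects them. The following is a minimal sketch, not part of the release: the helper name `missing_weights` is hypothetical, the file names are taken from the table above, and the default directory `glm_4_voice_decoder/` is assumed from the Usage section.

```python
from pathlib import Path

# File names come from the table above. The helper itself is illustrative
# and is not shipped with the weights or the pipeline repository.
EXPECTED_FILES = ("epoch500_emoft.pt", "hift.pt")


def missing_weights(weights_dir="glm_4_voice_decoder"):
    """Return the expected checkpoint files that are absent or empty."""
    root = Path(weights_dir)
    missing = []
    for name in EXPECTED_FILES:
        path = root / name
        # Treat a zero-byte file as missing: it usually means an
        # interrupted download rather than a usable checkpoint.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_weights()
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("Both checkpoint files are present.")
```

The check only looks at presence and non-zero size; it does not validate checkpoint contents or compare against the approximate sizes listed above.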

	## Usage

	```bash
	git clone https://github.com/<YOUR_GH_USER>/kimi-audio-release
	cd kimi-audio-release
	./scripts/download_weights.sh
	# the two files will be placed under glm_4_voice_decoder/
	```