TurnGuide Checkpoint: turnguide_loss_2_1

This repository hosts the TurnGuide fine-tuned checkpoint used by the inference code in the TurnGuide GitHub repository.

This checkpoint was trained with a text:speech token loss ratio of 2:1.

TurnGuide is introduced in:

TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving

🎉 TurnGuide has been accepted to Interspeech 2026 Long Paper Track!

What This Checkpoint Is

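turnguide_loss_2_1 is a GLM-4-Voice-based checkpoint for TurnGuide inference. It is one of two released TurnGuide checkpoints:

  • qqjz/turnguide_loss_2_1 (this repository): text:speech token loss ratio of 2:1
  • qqjz/turnguide_loss_3_1: the 3:1 checkpoint

Both checkpoints work with the same TurnGuide inference script; only the --model-path argument changes.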

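This checkpoint is intended to be used with:

  • the TurnGuide inference code (https://github.com/dreamtheater123/TurnGuide)
  • the GLM-4-Voice tokenizer (zai-org/glm-4-voice-tokenizer)
  • the GLM-4-Voice decoder (zai-org/glm-4-voice-decoder)

The decoder is not included in this checkpoint repository and must be downloaded separately.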

Installation

Clone the TurnGuide code repository:

git clone https://github.com/dreamtheater123/TurnGuide.git
cd TurnGuide

Create the tested environment:

conda env create -f environment.yml
conda activate turnguide

The tested core environment uses:

Python 3.10.16
PyTorch 2.5.0
CUDA 12.1
torchaudio 2.5.0
transformers 4.44.1
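
If the environment was not created from environment.yml, a quick check against the versions listed above can catch mismatches early. The following is a minimal sketch, not part of the TurnGuide repository; the helper names installed_version and compare are hypothetical, and it assumes the packages are installed under their usual PyPI names (torch, torchaudio, transformers).

```python
from importlib import metadata

# Versions listed as the tested core environment in this README.
TESTED = {"torch": "2.5.0", "torchaudio": "2.5.0", "transformers": "4.44.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare(tested, lookup=installed_version):
    """Map each package name to (tested, installed, matches)."""
    report = {}
    for name, want in tested.items():
        have = lookup(name)
        # Ignore local build tags such as "+cu121" when comparing.
        base = have.split("+")[0] if have else None
        report[name] = (want, have, base == want)
    return report


for name, (want, have, ok) in compare(TESTED).items():
    status = "OK" if ok else "differs"
    print(f"{name}: tested {want}, installed {have or 'missing'} ({status})")
```

A mismatch does not necessarily mean inference will fail; it only indicates that the setup differs from the tested one.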

Download the GLM-4-Voice decoder:

git clone https://huggingface.co/zai-org/glm-4-voice-decoder
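
As an alternative to git clone, the decoder can probably also be fetched from Python. This is a sketch that assumes the huggingface_hub package is installed; download_decoder is a hypothetical helper name, and the target directory simply mirrors the --flow-path value used in the inference command in this README.

```python
DECODER_REPO = "zai-org/glm-4-voice-decoder"


def download_decoder(local_dir="./glm-4-voice-decoder"):
    """Download the decoder repository into local_dir and return its path."""
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=DECODER_REPO, local_dir=local_dir)
```

Calling download_decoder() from the TurnGuide repository root should leave the files in ./glm-4-voice-decoder.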

Inference

Run TurnGuide inference from the TurnGuide repository:

python turnguide_inference.py \
  --input-audio path/to/input.wav \
  --model-path qqjz/turnguide_loss_2_1 \
  --tokenizer-path zai-org/glm-4-voice-tokenizer \
  --flow-path ./glm-4-voice-decoder \
  --output-dir ./turnguide_demo_output

To use the 3:1 checkpoint instead, replace --model-path qqjz/turnguide_loss_2_1 with --model-path qqjz/turnguide_loss_3_1.
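
Switching between the two checkpoints can be scripted. The sketch below only assembles the command line shown above for a chosen checkpoint; build_command and the CHECKPOINTS mapping are hypothetical helpers, not part of the TurnGuide code, and all flags are copied from the command in this README.

```python
import shlex

# Checkpoint repository IDs named in this README, keyed by loss ratio.
CHECKPOINTS = {
    "2:1": "qqjz/turnguide_loss_2_1",
    "3:1": "qqjz/turnguide_loss_3_1",
}


def build_command(input_audio, ratio="2:1",
                  flow_path="./glm-4-voice-decoder",
                  output_dir="./turnguide_demo_output"):
    """Return the inference command as an argument list for the chosen checkpoint."""
    return [
        "python", "turnguide_inference.py",
        "--input-audio", input_audio,
        "--model-path", CHECKPOINTS[ratio],
        "--tokenizer-path", "zai-org/glm-4-voice-tokenizer",
        "--flow-path", flow_path,
        "--output-dir", output_dir,
    ]


cmd = build_command("path/to/input.wav", ratio="3:1")
print(shlex.join(cmd))
```

To execute the command rather than print it, pass the list to subprocess.run(cmd, check=True) from the TurnGuide repository root.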

The script writes:

  • assistant.wav: generated assistant-channel speech
  • stereo_user_left_assistant_right.wav: stereo audio with user speech on the left channel and assistant speech on the right channel
  • a JSON file containing interleaved decoded text information
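
To listen to or analyze the two speakers separately, the stereo file can be split back into mono tracks. The sketch below uses only the Python standard library and assumes the file is integer PCM WAV (the wave module cannot read floating-point WAV files); split_stereo and write_mono are hypothetical helper names, and the actual sample format written by the script is not stated here.

```python
import wave


def split_stereo(path):
    """Return (rate, width, left_bytes, right_bytes) for a 2-channel PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        frames = wav.readframes(wav.getnframes())
    step = 2 * width  # one frame = left sample followed by right sample
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))
    return rate, width, left, right


def write_mono(path, rate, width, data):
    """Write raw PCM sample bytes as a single-channel WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(data)
```

For example, rate, width, user, assistant = split_stereo("turnguide_demo_output/stereo_user_left_assistant_right.wav") would yield the user channel (left) and the assistant channel (right), each of which can then be saved with write_mono.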

Notes

  • This checkpoint uses custom GLM-4-Voice code and should be loaded with trust_remote_code=True.
  • The checkpoint is designed for research use with the TurnGuide inference pipeline.
  • Model weights from GLM-4-Voice and related assets are governed by their respective licenses. Please follow the license terms of the original GLM-4-Voice models and decoder.
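
For readers who want to load the weights outside the inference script, the first note above might translate into something like the following. This is a sketch under assumptions: that AutoModel is a suitable entry point for this checkpoint, that bfloat16 is an appropriate dtype, and that a CUDA device is available. load_checkpoint is a hypothetical helper; the TurnGuide inference script remains the supported way to run the model.

```python
MODEL_ID = "qqjz/turnguide_loss_2_1"


def load_checkpoint(model_id=MODEL_ID, device="cuda"):
    """Load the checkpoint with its custom model code enabled."""
    # Imported lazily so defining this helper does not require torch or transformers.
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,      # required: the repository ships custom model code
        torch_dtype=torch.bfloat16,  # assumption, not confirmed by the notes above
    )
    return model.to(device).eval()
```

Because trust_remote_code=True executes Python code from the model repository, it is worth reviewing that code before loading.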

Citation

@article{turnguide2026,
  title={TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving},
  author={Cui, Wenqian and Zhu, Lei and Li, Xiao-Hui and Guo, Zhihan and Bai, Haoli and Hou, Lu and King, Irwin},
  journal={arXiv preprint arXiv:2508.07375},
  year={2026}
}

Model details: Safetensors format, 10B parameters, BF16 tensors.