TurnGuide Checkpoint: turnguide_loss_2_1
This repository hosts the TurnGuide fine-tuned checkpoint used by the inference code in the TurnGuide GitHub repository.
This checkpoint was trained with a text:speech token loss ratio of 2:1.
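TurnGuide is introduced in the paper "TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving" (arXiv:2508.07375).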
🎉 TurnGuide has been accepted to Interspeech 2026 Long Paper Track!
What This Checkpoint Is
turnguide_loss_2_1 is a GLM-4-Voice-based checkpoint for TurnGuide inference. It is one of two released TurnGuide checkpoints:
- qqjz/turnguide_loss_2_1: text:speech token loss ratio = 2:1
- qqjz/turnguide_loss_3_1: text:speech token loss ratio = 3:1
Both checkpoints can be used with the same TurnGuide inference script by changing --model-path.
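As a rough illustration of what a text:speech token loss ratio could mean, the sketch below computes a weighted mean of per-token losses in which text tokens count 2 (or 3) times as much as speech tokens. This card does not describe the training code, so the weighting scheme, the normalization by the sum of weights, the function name weighted_token_loss, and the numbers are all assumptions for illustration only.

```python
def weighted_token_loss(token_losses, is_text, text_weight=2.0, speech_weight=1.0):
    """Weighted mean of per-token losses (hypothetical helper).

    token_losses: per-token cross-entropy values
    is_text:      True for text tokens, False for speech tokens
    """
    weights = [text_weight if t else speech_weight for t in is_text]
    total = sum(w * loss for w, loss in zip(weights, token_losses))
    # Normalizing by the weight sum is an assumption, not the documented method.
    return total / sum(weights)


# Made-up per-token losses: two text tokens followed by two speech tokens.
losses = [1.0, 1.0, 4.0, 4.0]
is_text = [True, True, False, False]

loss_2_1 = weighted_token_loss(losses, is_text, text_weight=2.0)
loss_3_1 = weighted_token_loss(losses, is_text, text_weight=3.0)
print(loss_2_1, loss_3_1)
```

Under this reading, a larger text weight pulls the combined loss toward the text-token loss, which is the only difference between the 2:1 and 3:1 settings in the sketch.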
This checkpoint is intended to be used with:
- TurnGuide code: dreamtheater123/TurnGuide
- GLM-4-Voice speech tokenizer: zai-org/glm-4-voice-tokenizer
- GLM-4-Voice decoder: zai-org/glm-4-voice-decoder
The decoder is not included in this checkpoint repository and should be downloaded separately.
Installation
Clone the TurnGuide code repository:
git clone https://github.com/dreamtheater123/TurnGuide.git
cd TurnGuide
Create the tested environment:
conda env create -f environment.yml
conda activate turnguide
The tested core environment uses:
Python 3.10.16
PyTorch 2.5.0
CUDA 12.1
torchaudio 2.5.0
transformers 4.44.1
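To confirm that an activated environment matches the tested versions, a small check along these lines may help. It is a minimal sketch: the helper names installed_versions and mismatches are hypothetical, and the comparison ignores local build suffixes such as +cu121 on the assumption that they do not matter here.

```python
import sys
from importlib import metadata

# Versions listed in the tested core environment.
EXPECTED = {"torch": "2.5.0", "torchaudio": "2.5.0", "transformers": "4.44.1"}


def installed_versions(packages):
    """Return {package: version string, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, found):
    """Return {package: (expected, found)} for every version that differs."""
    out = {}
    for name, want in expected.items():
        got = found.get(name)
        base = got.split("+")[0] if got else None  # drop suffixes like +cu121
        if base != want:
            out[name] = (want, got)
    return out


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(tested: 3.10.16)")
    for name, (want, got) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{name}: expected {want}, found {got}")
```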
Download the GLM-4-Voice decoder:
git clone https://huggingface.co/zai-org/glm-4-voice-decoder
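If git with LFS support is not available, the decoder can probably also be fetched with the huggingface_hub library. The sketch below assumes huggingface_hub is installed in the environment; the function name download_decoder and the default target directory are illustrative choices, with the directory picked to match the --flow-path used in the inference command.

```python
DECODER_REPO = "zai-org/glm-4-voice-decoder"


def download_decoder(local_dir="./glm-4-voice-decoder"):
    """Download the decoder repository and return the local path."""
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=DECODER_REPO, local_dir=local_dir)


if __name__ == "__main__":
    print("Decoder downloaded to", download_decoder())
```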
Inference
Run TurnGuide inference from the TurnGuide repository:
python turnguide_inference.py \
--input-audio path/to/input.wav \
--model-path qqjz/turnguide_loss_2_1 \
--tokenizer-path zai-org/glm-4-voice-tokenizer \
--flow-path ./glm-4-voice-decoder \
--output-dir ./turnguide_demo_output
To use the 3:1 checkpoint instead, replace --model-path qqjz/turnguide_loss_2_1 with --model-path qqjz/turnguide_loss_3_1.
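When running both checkpoints on the same input, the command can be assembled from Python. This is a minimal sketch: build_command, run_inference, and the CHECKPOINTS mapping are hypothetical helpers, while the script name and flags are the ones shown in the command above. It assumes the current directory is the TurnGuide repository.

```python
import subprocess

# Released checkpoints, keyed by text:speech token loss ratio.
CHECKPOINTS = {
    "2:1": "qqjz/turnguide_loss_2_1",
    "3:1": "qqjz/turnguide_loss_3_1",
}


def build_command(input_audio, ratio="2:1",
                  flow_path="./glm-4-voice-decoder",
                  output_dir="./turnguide_demo_output"):
    """Return the inference command as an argument list."""
    return [
        "python", "turnguide_inference.py",
        "--input-audio", input_audio,
        "--model-path", CHECKPOINTS[ratio],
        "--tokenizer-path", "zai-org/glm-4-voice-tokenizer",
        "--flow-path", flow_path,
        "--output-dir", output_dir,
    ]


def run_inference(input_audio, ratio="2:1", **kwargs):
    """Run the inference script; raises CalledProcessError on failure."""
    subprocess.run(build_command(input_audio, ratio, **kwargs), check=True)
```

For example, run_inference("path/to/input.wav", ratio="3:1", output_dir="./out_3_1") would run the 3:1 checkpoint and keep its outputs separate from a 2:1 run.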
The script writes:
- assistant.wav: generated assistant-channel speech
- stereo_user_left_assistant_right.wav: stereo audio with user speech on the left channel and assistant speech on the right channel
- a JSON file containing interleaved decoded text information
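To inspect the two channels of the stereo output separately, the file can be split with the Python standard library. The sketch below assumes the script writes 16-bit PCM WAV, which this card does not state; split_stereo is a hypothetical helper and raises an error if that assumption does not hold.

```python
import array
import sys
import wave


def split_stereo(path):
    """Return (sample_rate, left, right) for a 16-bit stereo PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved: left (user), right (assistant), left, right, ...
    return rate, samples[0::2], samples[1::2]


if __name__ == "__main__":
    rate, user, assistant = split_stereo(
        "./turnguide_demo_output/stereo_user_left_assistant_right.wav"
    )
    print(f"{rate} Hz, {len(user) / rate:.2f} s per channel")
```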
Notes
- This checkpoint uses custom GLM-4-Voice code and should be loaded with trust_remote_code=True.
- The checkpoint is designed for research use with the TurnGuide inference pipeline.
- Model weights from GLM-4-Voice and related assets are governed by their respective licenses. Please follow the license terms of the original GLM-4-Voice models and decoder.
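For loading the checkpoint outside the inference script, the trust_remote_code note above translates into something like the following. This is a sketch using the generic transformers AutoModel entry point; the TurnGuide script may load the model differently. The helper name load_checkpoint is hypothetical, and torch_dtype="auto" is an assumption chosen to match the keyword accepted by the tested transformers 4.44.1.

```python
MODEL_ID = "qqjz/turnguide_loss_2_1"


def load_checkpoint(model_id=MODEL_ID):
    """Load the checkpoint with its custom GLM-4-Voice modeling code."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,  # the repository ships custom model code
        torch_dtype="auto",      # assumption: use the dtype stored in the checkpoint
    )
```

Because trust_remote_code=True executes Python code from the model repository, it is worth reviewing that code before loading.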
Citation
@article{turnguide2026,
title={TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving},
author={Cui, Wenqian and Zhu, Lei and Li, Xiao-Hui and Guo, Zhihan and Bai, Haoli and Hou, Lu and King, Irwin},
journal={arXiv preprint arXiv:2508.07375},
year={2026}
}
Model tree for qqjz/turnguide_loss_2_1
Base model
zai-org/glm-4-voice-9b