🎙️🤖 Goodspace Voice Agent: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Powered by advanced speech-language models and streaming synthesis technology

Goodspace Voice Agent is a series of speech-language models built on the Qwen2.5-0.5B/1.5B/3B/7B/14B/32B-Instruct models. It generates text and speech responses simultaneously, enabling high-quality, low-latency speech interaction. A streaming autoregressive speech decoder synthesizes the spoken output, which gives natural-sounding speech and smooth conversation flow.

🔥 News

Goodspace Voice Agent - Advanced real-time voice interaction system now available!

Install

Clone this repository.

git clone https://github.com/goodspace/voice-agent
cd voice-agent

Install packages.

conda create -n goodspace-voice python=3.10
conda activate goodspace-voice
pip install -e .

Quick Start

Download the Whisper-large-v3 model.

import whisper
model = whisper.load_model("large-v3", download_root="models/speech_encoder/")

Download the flow-matching model and vocoder of CosyVoice 2.

huggingface-cli download --resume-download goodspace/cosy2_decoder --local-dir models/cosy2_decoder

If you’re experiencing unstable connections to Hugging Face from within China, you can try setting the following in your command line:
export HF_ENDPOINT=https://hf-mirror.com

Download the Goodspace Voice Agent models from Hugging Face. GoodspaceVoice-0.5B/1.5B/3B/7B/14B support English only, while GoodspaceVoice-0.5B/1.5B/3B/7B/14B/32B-Bilingual support both English and Chinese.

model_name=GoodspaceVoice-7B-Bilingual
huggingface-cli download --resume-download goodspace/$model_name --local-dir models/$model_name
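
The same download can also be scripted from Python. The following is a minimal sketch, not part of this repository: `model_name`, `repo_and_dir`, and `download` are hypothetical helpers, the size lists are copied from the sentence above, and the sketch assumes the `huggingface_hub` package is installed and that repositories follow the `goodspace/<model_name>` pattern used in the command above.

```python
from pathlib import Path

# Sizes as listed in this README; adjust if the released set differs.
ENGLISH_SIZES = ("0.5B", "1.5B", "3B", "7B", "14B")
BILINGUAL_SIZES = ("0.5B", "1.5B", "3B", "7B", "14B", "32B")


def model_name(size: str, bilingual: bool = False) -> str:
    """Build a name such as 'GoodspaceVoice-7B-Bilingual' (hypothetical helper)."""
    allowed = BILINGUAL_SIZES if bilingual else ENGLISH_SIZES
    if size not in allowed:
        raise ValueError(f"unsupported size {size!r}; expected one of {allowed}")
    suffix = "-Bilingual" if bilingual else ""
    return f"GoodspaceVoice-{size}{suffix}"


def repo_and_dir(name: str, root: str = "models") -> tuple[str, Path]:
    """Return the assumed Hub repo id and the local target directory."""
    return f"goodspace/{name}", Path(root) / name


def download(name: str, root: str = "models") -> Path:
    """Download a model snapshot into <root>/<name> and return that path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, local_dir = repo_and_dir(name, root)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download(model_name("7B", bilingual=True))` would then place the files under `models/GoodspaceVoice-7B-Bilingual`, the same location the shell command uses.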

Gradio Demo

Launch a controller.

python -m goodspace_voice.serve.controller --host 0.0.0.0 --port 10000

Launch a gradio web server.

python -m goodspace_voice.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --vocoder-dir models/cosy2_decoder

Launch a model worker.

python -m goodspace_voice.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path models/$model_name --model-name $model_name

Visit http://localhost:8000/ and interact with GoodspaceVoice!
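
If the page does not load, it can help to check which of the three services is actually listening. The sketch below is an illustration using only the Python standard library, not a tool shipped with this repository: `wait_for_port` and `report` are hypothetical helpers, and the ports are the ones used in the commands above.

```python
import socket
import time

# Ports taken from the commands above.
SERVICES = {"controller": 10000, "model worker": 40000, "gradio web server": 8000}


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the host is unreachable); retry shortly.
            time.sleep(interval)
    return False


def report(host: str = "localhost", timeout: float = 5.0) -> dict[str, bool]:
    """Check every service once and return a name -> reachable mapping."""
    return {name: wait_for_port(host, port, timeout) for name, port in SERVICES.items()}
```

A successful TCP connection only shows that a process is listening on the port. It does not confirm that the model worker has finished loading the model or has registered with the controller.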

Local Inference

output_dir=examples/$model_name
mkdir -p $output_dir

python goodspace_voice/inference/run_goodspace_voice.py \
    --model_path models/$model_name \
    --question_file examples/questions.json \
    --answer_file $output_dir/answers.jsonl \
    --temperature 0 \
    --s2s

python goodspace_voice/inference/run_cosy2_decoder.py \
    --input-path $output_dir/answers.jsonl \
    --output-dir $output_dir/wav \
    --lang en
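
After the second command finishes, a quick sanity check is to list the generated audio files and their durations. The sketch below is a hedged example using only the Python standard library. It assumes the decoder writes PCM-encoded `.wav` files directly into the `--output-dir` directory, which this README does not state explicitly, and `wav_durations` is a hypothetical helper.

```python
import wave
from pathlib import Path


def wav_durations(wav_dir: str) -> dict[str, float]:
    """Map each .wav file name in wav_dir to its duration in seconds."""
    durations = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # The wave module reads PCM WAV only; other encodings raise wave.Error.
        with wave.open(str(path), "rb") as wf:
            durations[path.name] = wf.getnframes() / wf.getframerate()
    return durations
```

For example, `wav_durations("examples/GoodspaceVoice-7B-Bilingual/wav")` returns one entry per synthesized response. An empty result means no `.wav` files were found in that directory.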

LICENSE

The Goodspace Voice Agent is released under the Apache-2.0 License.

Commercial Use

For commercial use inquiries or licensing information, please contact the Goodspace team.

Acknowledgements

CosyVoice 2: We use the pretrained speech tokenizer, flow-matching model and vocoder of CosyVoice 2.
SLAM-LLM: We borrow some code for the speech encoder and speech adaptor.
LLaMA-Omni2: This work is based on the research described in the LLaMA-Omni2 paper.

Support

If you have any questions or issues, please feel free to submit an issue on our GitHub repository.

Contributing

We welcome contributions! Please see our contributing guidelines for more information.
