liumaolin committed on Jun 9, 2025

Commit

a5d5551

1 Parent(s): 3cbe74d

Update README file.

Files changed (1)

README.md +61 -7

@@ -138,9 +138,9 @@ WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" pip install -r requirements.txt
 brew install ffmpeg
 ```
-5. **Manually install additional dependencies**
 ```bash
-# Install kokoro-onnx and a pinned version of numpy
 pip install kokoro-onnx
 # Reinstall the pinned version of numpy
@@ -173,7 +173,8 @@ python src/VoiceDialogue/main.py --help
 ```
 **First-run notes**:
-- Once the "服务启动成功" (service started successfully) message appears, you can start speaking.
 #### 2. API service mode
@@ -209,8 +210,8 @@ python src/VoiceDialogue/main.py --mode api --port 9000 --reload
 | `--reload`| | None | `False` | (API mode) Enable hot reload |
 **Supported speaker roles** (loaded dynamically):
-- Chinese speakers: `罗翔`, `马保国`, `沈逸`, `杨幂`, `周杰伦`, `马云`
-- English speakers: `Heart`, `Bella`, `Nicole`
 ### Advanced configuration
@@ -268,8 +269,13 @@ VoiceDialogue/
 ├── third_party/                 # Third-party libraries
 │   └── moyoyo_tts/             # GPT-SoVITs TTS engine
 ├── tests/                       # Test files
-├── docs/                        # Documentation directory (empty)
 ├── assets/                      # Resource files
 ├── requirements.txt             # Python dependencies
 └── README.md                    # Project documentation
 ```
@@ -296,6 +302,17 @@ VoiceDialogue/
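 | **AudioStreamPlayer** | Audio stream playback | Real-time audio output playback |
 | **FastAPI App** | API service | Exposes HTTP endpoints that wrap the core services |
 ## 🛠️ Troubleshooting
 ### 1. Model download fails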
@@ -322,15 +339,52 @@ pip install -U huggingface_hub
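 - **Problem**: Package version conflicts or import errors.
 - **Solution**: Installing in a virtual environment is strongly recommended. If problems occur, try rebuilding the virtual environment.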
 ```bash
 conda deactivate
 conda env remove -n voicedialogue
-# ... re-run the installation steps ...
 ```
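 ### 5. Speaker role does not exist
 - **Problem**: The specified speaker is not in the supported list.
 - **Solution**: Run `python src/VoiceDialogue/main.py --help` to list all available speaker roles.
 ## 📄 License
 This project is open source under the MIT License.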

 brew install ffmpeg
 ```
+5. **Install additional dependencies**
 ```bash
+# Install kokoro-onnx
 pip install kokoro-onnx
 # Reinstall the pinned version of numpy
 ```
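 **First-run notes**:
+- Once the "服务启动成功" (service started successfully) message appears, you can start speaking
+- The system automatically detects voice activity, then recognizes the speech and replies
 #### 2. API service mode
 | `--reload`| | None | `False` | (API mode) Enable hot reload |
 **Supported speaker roles** (loaded dynamically):
+- **Chinese speakers**: `罗翔`, `马保国`, `沈逸`, `杨幂`, `周杰伦`, `马云`
+- **English speakers**: `Heart`, `Bella`, `Nicole`
 ### Advanced configuration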
 ├── third_party/                 # Third-party libraries
 │   └── moyoyo_tts/             # GPT-SoVITs TTS engine
 ├── tests/                       # Test files
 ├── assets/                      # Resource files
+│   ├── models/                  # Model file storage
+│   ├── audio/                   # Audio resources
+│   ├── libraries/               # Library files
+│   └── www/                     # Web resources
+├── main.py                      # Project entry point (simplified)
+├── pyproject.toml               # Project configuration file
 ├── requirements.txt             # Python dependencies
 └── README.md                    # Project documentation
 ```
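 | **AudioStreamPlayer** | Audio stream playback | Real-time audio output playback |
 | **FastAPI App** | API service | Exposes HTTP endpoints that wrap the core services |
+### Multithreaded architecture
+The system uses a multithreaded design in which components communicate through queues:
+- **Audio capture thread**: Continuously captures audio data
+- **Voice monitoring thread**: Detects user voice activity
+- **ASR thread**: Handles speech recognition
+- **LLM thread**: Handles text generation
+- **TTS thread**: Handles speech synthesis
+- **Audio playback thread**: Plays the audio output
 ## 🛠️ Troubleshooting
 ### 1. Model download fails
 - **Problem**: Package version conflicts or import errors.
 - **Solution**: Installing in a virtual environment is strongly recommended. If problems occur, try rebuilding the virtual environment.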
 ```bash
+# Using conda
 conda deactivate
 conda env remove -n voicedialogue
+# Using uv
+rm -rf .venv
+uv venv
 ```
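 ### 5. Speaker role does not exist
 - **Problem**: The specified speaker is not in the supported list.
 - **Solution**: Run `python src/VoiceDialogue/main.py --help` to list all available speaker roles.
+### 6. FFmpeg-related errors
+- **Problem**: Audio processing fails or codec errors occur.
+- **Solution**: Make sure FFmpeg is installed correctly: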
+```bash
+# Check the FFmpeg installation
+ffmpeg -version
+# Reinstall FFmpeg
+# macOS
+brew reinstall ffmpeg
+```
+### 7. Python version compatibility
+- **Problem**: Compatibility issues caused by a Python version that is too old.
+- **Solution**: Make sure you are using Python 3.11 or later:
+```bash
+python --version
+# If the version is too old, upgrade or use a virtual environment
+```
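+## 📊 Performance optimization tips
+### Hardware optimization
+- **Memory**: 32GB RAM is recommended for best performance
+- **Storage**: An SSD significantly speeds up model loading
+- **CPU**: A multi-core processor helps with multithreaded processing
+### Software optimization
+- **Model selection**: Choose a model size that fits your hardware
+- **Batch tuning**: Adjust the LLM's `n_batch` parameter
+- **Audio buffering**: Adjust the audio buffer size to match your latency requirements
 ## 📄 License
 This project is open source under the MIT License.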
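
The multithreaded architecture section added in this commit describes worker threads that communicate through queues. The sketch below illustrates that general pattern with Python's standard `queue` and `threading` modules. It is not the project's actual code: `stage`, `run_pipeline`, and the `fake_asr`, `fake_llm`, and `fake_tts` functions are hypothetical stand-ins, and only three of the six threads are modeled.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def stage(worker, inbox, outbox):
    """Apply `worker` to each item from `inbox` and forward the result to `outbox`."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            break
        outbox.put(worker(item))


# Hypothetical placeholders for the real ASR, LLM, and TTS components.
def fake_asr(audio_chunk):
    return f"text({audio_chunk})"


def fake_llm(text):
    return f"reply({text})"


def fake_tts(reply):
    return f"audio({reply})"


def run_pipeline(audio_chunks):
    """Push chunks through ASR -> LLM -> TTS, one thread per stage."""
    workers = [fake_asr, fake_llm, fake_tts]
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()

    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["chunk-1", "chunk-2"]))
```

Because each stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered, and a slow stage (for example, text generation) does not block audio capture upstream; it only lets its input queue grow.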