Instructions to use MoYoYoTech/VoiceDialogue with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MoYoYoTech/VoiceDialogue with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="MoYoYoTech/VoiceDialogue")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("MoYoYoTech/VoiceDialogue", dtype="auto")

llama-cpp-python

How to use MoYoYoTech/VoiceDialogue with llama-cpp-python:

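# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="MoYoYoTech/VoiceDialogue",
    filename="assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf",
)

# create_chat_completion expects a list of role/content messages, not a bare string
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "The answer to the universe is 42"}
    ]
)
print(response["choices"][0]["message"]["content"])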

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use MoYoYoTech/VoiceDialogue with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K
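Every `llama-server` route above exposes an OpenAI-compatible HTTP API. The following is a minimal standard-library client sketch. It makes two assumptions: that the server listens on its default address `http://localhost:8080` (the same base URL the Pi configuration on this page uses), and that it serves the standard `/v1/chat/completions` route. The `model` value is illustrative only. A server started with a single `-hf` model will most likely ignore it.

```python
import json
import urllib.request

# Assumption: llama-server's default listen address; matches the Pi config on this page.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        # Illustrative name; a single-model llama-server is likely to ignore it.
        "model": "VoiceDialogue",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Who are you?"))` sends one message and prints the reply.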

Use Docker

docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K

LM Studio
Jan
Ollama
How to use MoYoYoTech/VoiceDialogue with Ollama:
```
ollama run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Unsloth Studio

How to use MoYoYoTech/VoiceDialogue with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Pi

How to use MoYoYoTech/VoiceDialogue with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "VoiceDialogue"
        }
      ]
    }
  }
}
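You can edit `~/.pi/agent/models.json` by hand. If the file already lists other providers, however, a small script can add the `llama-cpp` entry without overwriting them. The script below is a sketch. The path and provider fields are copied from the snippet above. Preserving other providers and top-level keys alongside it is an assumption about Pi's config format.

```python
import copy
import json
from pathlib import Path
from typing import Optional

# Provider fields copied from the Pi snippet above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "VoiceDialogue"}],
}


def merge_provider(config: dict, name: str = "llama-cpp", provider: dict = LLAMA_CPP_PROVIDER) -> dict:
    """Return a copy of `config` with `provider` registered under `name`.

    Assumption: Pi accepts other providers and top-level keys next to this one,
    so they are preserved rather than overwritten.
    """
    merged = copy.deepcopy(config)
    merged.setdefault("providers", {})[name] = copy.deepcopy(provider)
    return merged


def update_config_file(path: Optional[Path] = None) -> None:
    """Add the llama-cpp provider to models.json, creating the file if needed."""
    path = path or Path.home() / ".pi" / "agent" / "models.json"
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n", encoding="utf-8")
```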

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use MoYoYoTech/VoiceDialogue with Docker Model Runner:
```
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Lemonade

How to use MoYoYoTech/VoiceDialogue with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MoYoYoTech/VoiceDialogue:Q6_K

Run and chat with the model

lemonade run user.VoiceDialogue-Q6_K

List all available models

lemonade list

liumaolin committed on Jul 3, 2025

Commit 2ecfa8f (1 parent: d846f85)

Add echo cancellation and VAD toggle support in service factories and routes

- Extend `create_audio_capture` and `create_speech_monitor` methods to include `enable_echo_cancellation` and `enable_vad` options.
- Update service definitions and health checks to pass configuration dynamically.
- Add `SystemStartRequest` schema to support echo cancellation toggle during system startup.
- Adjust `_start_system_background` logic to initialize services based on the new toggles.
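The last bullet can be illustrated with a small, self-contained sketch. `FakeServiceManager` and `start_voice_pipeline` are hypothetical stand-ins; the real service manager takes a `ServiceDefinition`, not a name and a factory. Three things are taken directly from the diff below:

- the start order (speech monitor first, then audio capture);
- the skip-if-already-running checks;
- the rule `enable_vad = not enable_echo_cancellation`.

```python
from typing import Any, Callable, Dict, List


class FakeServiceManager:
    """Hypothetical in-memory stand-in; the real manager takes ServiceDefinition objects."""

    def __init__(self) -> None:
        self.services: Dict[str, Any] = {}
        self.start_order: List[str] = []

    def is_service_running(self, name: str) -> bool:
        return name in self.services

    def start_service(self, name: str, factory: Callable[[], Any]) -> bool:
        self.services[name] = factory()
        self.start_order.append(name)
        return True


def start_voice_pipeline(manager: FakeServiceManager, enable_echo_cancellation: bool = True) -> None:
    """Mirror the order and toggles of _start_system_background in this commit."""
    if not manager.is_service_running("speech_monitor"):
        # VAD is switched on only when echo cancellation is switched off.
        enable_vad = not enable_echo_cancellation
        if not manager.start_service("speech_monitor", lambda: {"enable_vad": enable_vad}):
            raise RuntimeError("speech_monitor failed to start")
    if not manager.is_service_running("audio_capture"):
        if not manager.start_service(
            "audio_capture",
            lambda: {"enable_echo_cancellation": enable_echo_cancellation},
        ):
            raise RuntimeError("audio_capture failed to start")
```

The sketch makes one consequence visible. Each service is skipped when it is already running, so calling `/start` again with a different toggle leaves running services on their original settings until they are stopped.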

Files changed (3)

src/voice_dialogue/api/core/service_factories.py +23 -11
src/voice_dialogue/api/routes/system_routes.py +42 -5
src/voice_dialogue/api/schemas/system_schemas.py +5 -0

src/voice_dialogue/api/core/service_factories.py CHANGED

@@ -13,18 +13,20 @@ class ServiceFactories:
     """Service factory class that encapsulates the creation logic for all services"""
     @staticmethod
-    def create_audio_capture() -> AudioCapture:
+    def create_audio_capture(enable_echo_cancellation: bool = True) -> AudioCapture:
         """Create the audio capture service"""
         return AudioCapture(
-            audio_frames_queue=audio_frames_queue
+            audio_frames_queue=audio_frames_queue,
+            enable_echo_cancellation=enable_echo_cancellation
         )
     @staticmethod
-    def create_speech_monitor() -> SpeechStateMonitor:
+    def create_speech_monitor(enable_vad: bool = False) -> SpeechStateMonitor:
         """Create the speech monitor service"""
         return SpeechStateMonitor(
             audio_frame_queue=audio_frames_queue,
             user_voice_queue=user_voice_queue,
+            enable_vad=enable_vad
         )
     @staticmethod
@@ -90,12 +92,12 @@ def get_core_voice_service_definitions(system_language: str, tts_config: BaseTTS
         # ),
         # Speech state monitoring service
-        ServiceDefinition(
-            name="speech_monitor",
-            factory=ServiceFactories.create_speech_monitor,
-            dependencies=[],
-            health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
-        ),
+        # ServiceDefinition(
+        #     name="speech_monitor",
+        #     factory=ServiceFactories.create_speech_monitor,
+        #     dependencies=[],
+        #     health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
+        # ),
         # ASR speech recognition service
         ServiceDefinition(
@@ -129,11 +131,21 @@ def get_core_voice_service_definitions(system_language: str, tts_config: BaseTTS
     ]
-def get_audio_capture_service_definition() -> ServiceDefinition:
+def get_audio_capture_service_definition(enable_echo_cancellation: bool = True) -> ServiceDefinition:
     """Get the audio capture service definition"""
     return ServiceDefinition(
         name="audio_capture",
-        factory=ServiceFactories.create_audio_capture,
+        factory=lambda: ServiceFactories.create_audio_capture(enable_echo_cancellation),
+        dependencies=[],
+        health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
+    )
+def get_speech_monitor_service_definition(enable_vad: bool = False) -> ServiceDefinition:
+    """Get the speech monitor service definition"""
+    return ServiceDefinition(
+        name="speech_monitor",
+        factory=lambda: ServiceFactories.create_speech_monitor(enable_vad),
         dependencies=[],
         health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
     )
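The factory change replaces a bare method reference with a zero-argument `lambda`. That suggests the service manager invokes `factory()` without arguments, so the closure is how each definition carries its toggle. The sketch below reproduces that pattern. `ServiceDefinition` and `AudioCapture` are hypothetical stand-ins, since neither class body appears in this commit.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ServiceDefinition:
    """Hypothetical minimal stand-in for the project's ServiceDefinition."""
    name: str
    factory: Callable[[], Any]
    dependencies: List[str]
    health_check: Callable[[Any], bool]


class AudioCapture:
    """Hypothetical stand-in that only records its configuration."""

    def __init__(self, enable_echo_cancellation: bool = True) -> None:
        self.enable_echo_cancellation = enable_echo_cancellation
        self.is_ready = True


def get_audio_capture_service_definition(enable_echo_cancellation: bool = True) -> ServiceDefinition:
    return ServiceDefinition(
        name="audio_capture",
        # The manager is assumed to call factory() with no arguments, so the
        # toggle travels inside this closure instead of as a parameter.
        factory=lambda: AudioCapture(enable_echo_cancellation=enable_echo_cancellation),
        dependencies=[],
        # Same truthiness as `hasattr(service, 'is_ready') and service.is_ready`.
        health_check=lambda service: bool(getattr(service, "is_ready", False)),
    )
```

`functools.partial(ServiceFactories.create_audio_capture, enable_echo_cancellation)` would be an equivalent choice. Either way, the value is fixed when the definition is built, not when the service starts.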

src/voice_dialogue/api/routes/system_routes.py CHANGED

@@ -5,9 +5,9 @@ from fastapi import APIRouter, HTTPException, BackgroundTasks, Request
 from voice_dialogue.core.constants import session_manager
 from voice_dialogue.utils.logger import logger
-from ..core.service_factories import get_audio_capture_service_definition
+from ..core.service_factories import get_audio_capture_service_definition, get_speech_monitor_service_definition
 from ..schemas.system_schemas import (
-    SystemStatusResponse, SystemResponse
+    SystemStatusResponse, SystemResponse, SystemStartRequest
 )
 router = APIRouter()
@@ -62,6 +62,7 @@ async def get_system_status(request: Request):
 @router.post("/start", response_model=SystemResponse, summary="Start system")
 async def start_system(
+        request: SystemStartRequest,
         fastapi_request: Request,
         background_tasks: BackgroundTasks
 ):
@@ -82,7 +83,8 @@ async def start_system(
         # Start the system in the background
         background_tasks.add_task(
             _start_system_background,
-            fastapi_request
+            fastapi_request,
+            request.enable_echo_cancellation
         )
         return SystemResponse(
@@ -135,6 +137,28 @@ async def stop_system(request: Request):
                     except Exception as e:
                         logger.error(f"Error while stopping the audio capture service: {e}", exc_info=True)
+            # Stop the speech monitor service
+            if service_manager.is_service_running("speech_monitor"):
+                speech_monitor_service = service_manager.get_service("speech_monitor")
+                if speech_monitor_service:
+                    try:
+                        speech_monitor_service.exit()
+                        logger.info("Speech monitor service stopped")
+                        # Wait for the service to stop
+                        timeout = 5
+                        start_time = time.time()
+                        while speech_monitor_service.is_alive() and (time.time() - start_time) < timeout:
+                            await asyncio.sleep(0.1)
+                        # Remove it from the service manager
+                        if "speech_monitor" in service_manager.services:
+                            del service_manager.services["speech_monitor"]
+                    except Exception as e:
+                        logger.error(f"Error while stopping the speech monitor service: {e}", exc_info=True)
             # Stop the audio_player service
             if service_manager.is_service_running("audio_player"):
                 audio_player_service = service_manager.get_service("audio_player")
@@ -190,7 +214,7 @@ async def restart_system(
         raise HTTPException(status_code=500, detail=f"System restart failed: {str(e)}")
-async def _start_system_background(request: Request):
+async def _start_system_background(request: Request, enable_echo_cancellation: bool = True):
     """
     Actual logic for starting the system in the background: create and start the audio_capture service
     """
@@ -229,12 +253,25 @@ async def _start_system_background(request: Request):
         else:
             logger.warning("Audio player service not found; the system will continue starting but may be unable to play audio")
+        if service_manager.is_service_running("speech_monitor"):
+            logger.info("Speech monitor service is already running")
+        else:
+            # Create the speech monitor service definition
+            enable_vad = not enable_echo_cancellation
+            speech_monitor_def = get_speech_monitor_service_definition(enable_vad)
+            # Start the speech monitor service
+            success = service_manager.start_service(speech_monitor_def)
+            if not success:
+                raise RuntimeError("Failed to start the speech monitor service")
+            logger.info("Speech monitor service started successfully")
         # Check whether the audio_capture service already exists
         if service_manager.is_service_running("audio_capture"):
             logger.info("Audio capture service is already running")
         else:
             # Create the audio_capture service definition
-            audio_capture_def = get_audio_capture_service_definition()
+            audio_capture_def = get_audio_capture_service_definition(enable_echo_cancellation)
             # Start the audio_capture service
             success = service_manager.start_service(audio_capture_def)
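The new shutdown block for `speech_monitor` repeats the exit, poll, and timeout pattern already used for `audio_capture`. A helper like the hypothetical `stop_and_wait` below could factor that out. As the diff implies, it assumes services expose `exit()` and `is_alive()`; the latter suggests `threading.Thread` subclasses. The helper uses `time.monotonic()` instead of `time.time()`, so a wall-clock adjustment cannot stretch or cut short the 5-second wait.

```python
import asyncio
import time
from typing import Protocol


class StoppableService(Protocol):
    """Shape inferred from the diff: services expose exit() and is_alive()."""

    def exit(self) -> None: ...

    def is_alive(self) -> bool: ...


async def stop_and_wait(service: StoppableService, timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Ask a service to exit, then poll until it stops or the timeout elapses.

    Returns True if the service stopped within the timeout.
    """
    service.exit()
    # Monotonic clock: unaffected by system time changes, unlike time.time().
    deadline = time.monotonic() + timeout
    while service.is_alive() and time.monotonic() < deadline:
        await asyncio.sleep(poll)
    return not service.is_alive()
```

Inside the existing `try` block, `await stop_and_wait(speech_monitor_service)` would replace the `exit()` call and the manual `while` loop. The same call works for `audio_capture`.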

src/voice_dialogue/api/schemas/system_schemas.py CHANGED

@@ -15,6 +15,11 @@ class SystemStatusResponse(BaseModel):
     services_details: Optional[Dict[str, Any]] = Field(None, description="Detailed service status information")
+class SystemStartRequest(BaseModel):
+    """System start request"""
+    enable_echo_cancellation: bool = Field(default=True, description="Whether to enable echo cancellation")
 class SystemResponse(BaseModel):
     """System operation response"""
     success: bool = Field(..., description="Whether the operation succeeded")
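With `SystemStartRequest` in place, clients choose the audio mode when they call `/start`. The route parameter `request: SystemStartRequest` has no default, so FastAPI will most likely reject a POST with no body (422). A client that relied on the old body-less call should therefore send at least `{}` to accept the default of `True`. The sketch below builds that request with the standard library. The base URL is hypothetical: the host, port, and the prefix under which `router` is mounted are not shown in this commit.

```python
import json
import urllib.request

# Hypothetical base URL: host, port, and router prefix are not part of this commit.
API_BASE = "http://localhost:8000/system"


def build_start_request(enable_echo_cancellation: bool = True, base: str = API_BASE) -> urllib.request.Request:
    """Build a POST /start request whose JSON body matches SystemStartRequest."""
    body = {"enable_echo_cancellation": enable_echo_cancellation}
    return urllib.request.Request(
        url=f"{base}/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_system(enable_echo_cancellation: bool = True, base: str = API_BASE) -> dict:
    """Send the request and return the decoded SystemResponse JSON."""
    with urllib.request.urlopen(build_start_request(enable_echo_cancellation, base), timeout=30) as resp:
        return json.load(resp)
```

Passing `enable_echo_cancellation=False` turns echo cancellation off. Per `_start_system_background`, that also starts the speech monitor with VAD enabled.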