Instructions for using MoYoYoTech/VoiceDialogue with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MoYoYoTech/VoiceDialogue with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="MoYoYoTech/VoiceDialogue")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("MoYoYoTech/VoiceDialogue", dtype="auto")

llama-cpp-python

How to use MoYoYoTech/VoiceDialogue with llama-cpp-python:

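# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Downloads the GGUF file from the Hub (requires huggingface-hub)
llm = Llama.from_pretrained(
    repo_id="MoYoYoTech/VoiceDialogue",
    filename="assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf",
)

# messages is a list of role/content dicts, not a plain string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "The answer to the universe is 42"}
    ]
)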
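`create_chat_completion` returns an OpenAI-style chat completion dictionary, not a plain string. Here is a minimal sketch for pulling the assistant's text out of it. It assumes the usual `choices[0]["message"]["content"]` layout. The helper name `extract_reply` is hypothetical, and the sample dictionary is hand-written, not real model output:

```python
from typing import Any, Dict


def extract_reply(completion: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict.

    Hypothetical helper: assumes the choices[0]["message"]["content"] layout.
    """
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    message = choices[0].get("message") or {}
    # content can be None (e.g. for tool calls); normalize it to an empty string
    return message.get("content") or ""


# Hand-written sample shaped like a chat completion response (not real output).
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
    ]
}
print(extract_reply(sample))
```

With the snippet above, you would wrap the call as `extract_reply(llm.create_chat_completion(messages=...))`.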

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use MoYoYoTech/VoiceDialogue with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use Docker

docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
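The `llama-server` commands above start an OpenAI-compatible HTTP server, which listens on `http://localhost:8080` by default. This is the same address the Pi and Hermes configurations below point at.

Here is a minimal standard-library client sketch. It assumes the server is running locally on that default port and serves `/v1/chat/completions`. The function names are hypothetical. `build_chat_request` is kept separate so the payload can be inspected without a running server:

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed default address of a locally running llama-server.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str,
                       model: str = "MoYoYoTech/VoiceDialogue:Q6_K",
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    messages: List[Dict[str, str]] = [{"role": "user", "content": prompt}]
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload: Dict[str, Any] = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With the server running, `chat("Who are you?")` should return the model's reply.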

LM Studio
Jan
Ollama
How to use MoYoYoTech/VoiceDialogue with Ollama:
```
ollama run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Unsloth Studio

How to use MoYoYoTech/VoiceDialogue with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Pi

How to use MoYoYoTech/VoiceDialogue with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MoYoYoTech/VoiceDialogue:Q6_K"
        }
      ]
    }
  }
}
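If `~/.pi/agent/models.json` already defines other providers, the `llama-cpp` entry has to be merged in rather than pasted over the file.

Here is a minimal sketch of that merge. It assumes only the JSON layout shown in the snippet above. The function name `add_llama_cpp_provider` is hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict

# Provider entry copied from the models.json snippet above.
LLAMA_CPP_PROVIDER: Dict[str, Any] = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "MoYoYoTech/VoiceDialogue:Q6_K"}],
}


def add_llama_cpp_provider(config_path: Path) -> Dict[str, Any]:
    """Merge the llama-cpp provider into config_path, keeping other providers."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    providers = config.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {})
    # Overwrite the connection settings, then append the model id only if missing.
    for key in ("baseUrl", "api", "apiKey"):
        provider[key] = LLAMA_CPP_PROVIDER[key]
    models = provider.setdefault("models", [])
    for model in LLAMA_CPP_PROVIDER["models"]:
        if model not in models:
            models.append(dict(model))
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it as `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")`. Running it twice still leaves a single model entry.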

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MoYoYoTech/VoiceDialogue with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MoYoYoTech/VoiceDialogue:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use MoYoYoTech/VoiceDialogue with Docker Model Runner:
```
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Lemonade

How to use MoYoYoTech/VoiceDialogue with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MoYoYoTech/VoiceDialogue:Q6_K

Run and chat with the model

lemonade run user.VoiceDialogue-Q6_K

List all available models

lemonade list

liumaolin committed on Jul 7, 2025

Commit 2ebe57f (1 parent: d74bcbf)

Add mixin classes to enhance voice task processing

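- Add the `TaskStatusMixin`, `HistoryMixin`, and `PerformanceLogMixin` classes, which provide voice task status checks, chat history updates, and performance logging.
- Integrate the new mixins into the `TTSAudioGenerator` and `AudioStreamPlayer` classes to streamline task handling.
- Add `utils.py` with helpers for audio-frame normalization and audio-duration calculation.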
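The commit moves the shared checks out of `TTSAudioGenerator` and `AudioStreamPlayer` into stateless mixin classes. Each thread class lists these mixins alongside `BaseThread`.

The self-contained sketch below illustrates only that composition pattern, using hypothetical stand-ins:
- `SharedState` and its fields (`interrupt_task_id`, `current_session_id`, `dropped_answer_ids`) replace the project's `voice_state_manager`, `session_manager`, and `dropped_audio_cache`.
- The `Task` dataclass replaces `VoiceTask`.
- `threading.Thread` replaces `BaseThread`.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Task:
    """Hypothetical stand-in for VoiceTask."""
    id: str
    answer_id: str
    session_id: str


@dataclass
class SharedState:
    """Hypothetical stand-in for the project's global state managers."""
    interrupt_task_id: Optional[str] = None
    current_session_id: str = "s1"
    dropped_answer_ids: Set[str] = field(default_factory=set)


STATE = SharedState()


class TaskStatusMixin:
    """Stateless checks shared by every worker that lists this mixin."""

    def is_task_interrupted(self, task: Task) -> bool:
        return bool(STATE.interrupt_task_id) and task.id != STATE.interrupt_task_id

    def is_task_valid(self, task: Task) -> bool:
        if self.is_task_interrupted(task):
            return False
        if task.session_id != STATE.current_session_id:
            return False
        return task.answer_id not in STATE.dropped_answer_ids


class Worker(threading.Thread, TaskStatusMixin):
    """Mirrors the shape of `class TTSAudioGenerator(BaseThread, TaskStatusMixin)`."""

    def __init__(self) -> None:
        super().__init__(daemon=True)


worker = Worker()
print(worker.is_task_valid(Task(id="t1", answer_id="a1", session_id="s1")))
```

The mixins define no `__init__` and hold no state. As a result, the order in which they follow the base class only matters if two of them define a method with the same name.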

Files changed (7)

src/voice_dialogue/services/audio/generator.py +35 -59
src/voice_dialogue/services/audio/player.py +62 -101
src/voice_dialogue/services/mixins.py +91 -0
src/voice_dialogue/services/speech/monitor.py +5 -4
src/voice_dialogue/services/speech/recognizer.py +1 -1
src/voice_dialogue/services/text/generator.py +1 -1
src/voice_dialogue/services/utils.py +37 -0

src/voice_dialogue/services/audio/generator.py CHANGED

@@ -1,17 +1,17 @@
-import re
 import time
 from multiprocessing import Queue
 from queue import Empty
 from voice_dialogue.core.base import BaseThread
-from voice_dialogue.core.constants import dropped_audio_cache, user_still_speaking_event, voice_state_manager, \
-    session_manager, is_debug_mode
+from voice_dialogue.core.constants import voice_state_manager, is_debug_mode
 from voice_dialogue.models.voice_task import VoiceTask
+from voice_dialogue.services.mixins import TaskStatusMixin
+from voice_dialogue.services.utils import has_no_words
 from voice_dialogue.utils.logger import logger
 from .generators import tts_manager, BaseTTSConfig
-class TTSAudioGenerator(BaseThread):
+class TTSAudioGenerator(BaseThread, TaskStatusMixin):
     """
     TTS audio generator - responsible for converting text into audio
@@ -62,69 +62,45 @@ class TTSAudioGenerator(BaseThread):
         while not self.is_exited:
             try:
-                voice_task: VoiceTask = self.text_input_queue.get(block=False, timeout=1)
-            except Empty:
-                continue
-            if not voice_task.answer_sentence:
-                continue
-            answer_id = voice_task.answer_id
-            if user_still_speaking_event.is_set():
-                voice_state_manager.drop_audio_task(voice_task.id)
-                dropped_audio_cache[answer_id] = answer_id
-                user_still_speaking_event.clear()
-                continue
-            if answer_id in dropped_audio_cache:
-                continue
-            if self.is_task_interrupted(voice_task):
-                continue
-            if voice_task.session_id != session_manager.current_id:
-                continue
-            if self.has_no_words(voice_task.answer_sentence):
-                logger.info(f"Skipping punctuation-only text: '{voice_task.answer_sentence}'")
+                voice_task: VoiceTask = self.text_input_queue.get(block=True, timeout=1)
+                if not voice_task:
+                    continue
+                self._process_task(voice_task)
+            except Empty:
                 continue
-            if is_debug_mode():
-                logger.info(f"TTS audio generation: {voice_task.answer_sentence}")
-            voice_task.tts_start_time = time.time()
-            try:
-                tts_generated_sentence_audio = self.tts_instance.synthesize(voice_task.answer_sentence)
-            except Exception as e:
-                logger.error(f"TTS audio generation failed: {e}")
-                voice_state_manager.reset_task_id()
-                continue
-            voice_task.tts_generated_sentence_audio = tts_generated_sentence_audio
-            voice_task.tts_end_time = time.time()
-            # print(f'Generated audio: {voice_task.answer_sentence}')
-            self.audio_output_queue.put(voice_task)
-    def is_task_interrupted(self, voice_task: VoiceTask) -> bool:
-        """
-        Check whether the voice task has been interrupted
-        Args:
-            voice_task: the voice task currently being processed
-        Returns:
-            bool: True if the task was interrupted, otherwise False
-        """
-        return (voice_state_manager.interrupt_task_id and
-                voice_task.id != voice_state_manager.interrupt_task_id)
-    def has_no_words(self, text: str) -> bool:
-        """
-        Check whether the text contains no words (letters, digits, or Chinese characters).
-        Returns True if the text contains only punctuation, whitespace, or other symbols.
-        """
-        # Search for any letter, digit, or Chinese character
-        if re.search(r'[\u4e00-\u9fa5a-zA-Z0-9]', text):
-            return False
-        return True
+            except Exception as e:
+                logger.error(f"TTSAudioGenerator main loop error: {e}")
+                time.sleep(0.1)
+    def _process_task(self, voice_task: VoiceTask):
+        """Process a single text-to-speech task"""
+        if not voice_task.answer_sentence:
+            return
+        if self.handle_user_speaking_interruption(voice_task):
+            return
+        if not self.is_task_valid(voice_task):
+            return
+        if has_no_words(voice_task.answer_sentence):
+            logger.info(f"Skipping punctuation-only text: '{voice_task.answer_sentence}'")
+            return
+        if is_debug_mode():
+            logger.info(f"TTS audio generation: {voice_task.answer_sentence}")
+        voice_task.tts_start_time = time.time()
+        try:
+            tts_generated_sentence_audio = self.tts_instance.synthesize(voice_task.answer_sentence)
+        except Exception as e:
+            logger.error(f"TTS audio generation failed: {e}")
+            voice_state_manager.reset_task_id()
+            return
+        voice_task.tts_generated_sentence_audio = tts_generated_sentence_audio
+        voice_task.tts_end_time = time.time()
+        self.audio_output_queue.put(voice_task)
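After the refactor, `_process_task` returns early at the first failed check. The punctuation filter is now the module-level `has_no_words` from `services/utils.py`.

Its character class only admits:
- ASCII letters and digits
- CJK ideographs in the range U+4E00 to U+9FA5

So a sentence made entirely of, say, Japanese kana or accented Latin letters is also treated as having no words and is skipped. The standalone copy below uses the same regex, precompiled, so that behavior is easy to check:

```python
import re

# Same character class as has_no_words in services/utils.py:
# CJK ideographs U+4E00..U+9FA5, ASCII letters, and ASCII digits.
WORD_CHARS = re.compile(r'[\u4e00-\u9fa5a-zA-Z0-9]')


def has_no_words(text: str) -> bool:
    """True when text has no ASCII letter, ASCII digit, or CJK ideograph."""
    if not text:
        return True
    return WORD_CHARS.search(text) is None


for sample in ["，。！", "...", "你好", "OK!", "42", "é", "ありがとう", ""]:
    print(repr(sample), has_no_words(sample))
```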

src/voice_dialogue/services/audio/player.py CHANGED

@@ -1,21 +1,22 @@
 import tempfile
-from collections import OrderedDict
+import time
 from multiprocessing import Queue
 from queue import Empty
+from typing import Optional
 import soundfile as sf
 from playsound import playsound
 from voice_dialogue.core.base import BaseThread
 from voice_dialogue.core.constants import (
-    user_still_speaking_event, voice_state_manager, dropped_audio_cache, chat_history_cache,
-    silence_over_threshold_event, session_manager, is_debug_mode
+    voice_state_manager, silence_over_threshold_event, is_debug_mode
 )
 from voice_dialogue.models.voice_task import VoiceTask, AnswerDisplayMessage
+from voice_dialogue.services.mixins import TaskStatusMixin, HistoryMixin, PerformanceLogMixin
 from voice_dialogue.utils.logger import logger
-class AudioStreamPlayer(BaseThread):
+class AudioStreamPlayer(BaseThread, TaskStatusMixin, HistoryMixin, PerformanceLogMixin):
     """Audio stream player - plays the generated audio and manages playback state"""
     def __init__(
@@ -27,116 +28,76 @@ class AudioStreamPlayer(BaseThread):
         self.audio_playing_queue: Queue = audio_playing_queue
         self.websocket_message_queue: Queue = websocket_message_queue
-    def run(self):
-        self.is_ready = True
+    def _get_task_from_queue(self) -> Optional[VoiceTask]:
+        """Get a task from the audio playback queue."""
+        # Use a blocking get; when the queue is empty, the Empty exception is handled in the run loop
+        return self.audio_playing_queue.get(block=True, timeout=1)
+    def _process_task(self, voice_task: VoiceTask):
+        """Process a single audio playback task."""
+        # This inner loop waits for an external event (the user falling silent) while checking whether the task was interrupted
         while not self.is_exited:
-            try:
-                voice_task: VoiceTask = self.audio_playing_queue.get(block=False, timeout=1)
-            except Empty:
+            if self.handle_user_speaking_interruption(voice_task):
+                return  # Task interrupted; stop processing
+            if not self.is_task_valid(voice_task):
+                return  # Task invalid; stop processing
+            # Wait for the signal that the user has fully gone silent
+            if not silence_over_threshold_event.is_set():
+                time.sleep(0.05)  # Brief wait to avoid spinning the CPU
                 continue
-            while True:
-                task_id = voice_task.id
-                answer_id = voice_task.answer_id
-                if user_still_speaking_event.is_set():
-                    logger.info('User is still speaking')
-                    voice_state_manager.drop_audio_task(task_id)
-                    dropped_audio_cache[answer_id] = answer_id
-                    user_still_speaking_event.clear()
-                    break
-                if self.is_task_interrupted(voice_task):
-                    break
-                if voice_task.session_id != session_manager.current_id:
-                    break
-                if answer_id in dropped_audio_cache:
-                    # print('Drop answer audio')
-                    break
-                if not silence_over_threshold_event.is_set():
-                    continue
-                if self.websocket_message_queue:
-                    self.websocket_message_queue.put_nowait(
-                        AnswerDisplayMessage(
-                            session_id=voice_task.session_id,
-                            task_id=task_id,
-                            answer_index=voice_task.answer_index,
-                            answer=voice_task.answer_sentence,
-                        )
+            # --- Playback logic starts here ---
+            if self.websocket_message_queue:
+                self.websocket_message_queue.put_nowait(
+                    AnswerDisplayMessage(
+                        session_id=voice_task.session_id,
+                        task_id=voice_task.id,
+                        answer_index=voice_task.answer_index,
+                        answer=voice_task.answer_sentence,
                     )
-                if is_debug_mode():
-                    self._log_task_info(voice_task)
-                self.update_chat_history(voice_task)
-                voice_state_manager.set_audio_playing(task_id)
-                voice_state_manager.reset_task_id()
-                if not self.is_stopped:
-                    audio_data, sample_rate = voice_task.tts_generated_sentence_audio
-                    self.playing_audio(audio_data, sample_rate)
-                if self.audio_playing_queue.empty():
-                    logger.info(f'Answer playback finished')
-                break
-    def is_task_interrupted(self, voice_task: VoiceTask) -> bool:
+                )
+            if is_debug_mode():
+                self.log_task_performance(voice_task, "audio playback")
+            self.update_chat_history(voice_task)
+            voice_state_manager.set_audio_playing(voice_task.id)
+            voice_state_manager.reset_task_id()
+            if not self.is_stopped:
+                audio_data, sample_rate = voice_task.tts_generated_sentence_audio
+                self._play_audio(audio_data, sample_rate)
+            # Task finished; break out of the inner loop
+            break
+    def _play_audio(self, audio_data, sample_rate=16000):
+        with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
+            sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
+            playsound(soundfile.name, block=True)
+    def run(self):
+        """
+        Main run loop.
+        Continuously takes tasks from the queue and hands them to _process_task.
         """
-        Check whether the voice task has been interrupted
-        Args:
-            voice_task: the voice task currently being processed
-        Returns:
-            bool: True if the task was interrupted, otherwise False
-        """
-        return (voice_state_manager.interrupt_task_id and
-                voice_task.id != voice_state_manager.interrupt_task_id)
-    def _log_task_info(self, voice_task):
-        import librosa
-        asr_duration = voice_task.whisper_end_time - voice_task.whisper_start_time
-        llm_duration = voice_task.llm_end_time - voice_task.llm_start_time
-        tts_duration = voice_task.tts_end_time - voice_task.tts_start_time
-        audio_data, sample_rate = voice_task.tts_generated_sentence_audio
-        audio_duration = librosa.get_duration(y=audio_data, sr=sample_rate)
-        logger.info(
-            f"\n"
-            f"┌───────────────────────── Task Info ───────────────────────┐\n"
-            f"│ Task ID: {voice_task.id}\n"
-            f"├───────────────────────── Performance ─────────────────────┤\n"
-            f"│ ASR time: {asr_duration:.2f}s\n"
-            f"│ LLM time: {llm_duration:.2f}s\n"
-            f"│ TTS time: {tts_duration:.2f}s\n"
-            f"│ Audio length: {audio_duration:.2f}s\n"
-            f"├───────────────────────── Generated Content ───────────────┤\n"
-            f"│-> {voice_task.answer_sentence}\n"
-            f"└──────────────────────────────────────────────────────────┘"
-        )
-    def update_chat_history(self, voice_task):
-        chat_history = chat_history_cache.get(voice_task.session_id, OrderedDict())
-        task_answer_id = voice_task.answer_id
-        user_question = f'{task_answer_id}:human'
-        chat_history[user_question] = voice_task.transcribed_text
-        ai_answer = f'{task_answer_id}:ai'
-        cached_ai_answer = chat_history.get(ai_answer, [])
-        cached_ai_answer.append(voice_task.answer_sentence)
-        chat_history[ai_answer] = cached_ai_answer
-        chat_history_cache[voice_task.session_id] = chat_history
-    def playing_audio(self, audio_data, sample_rate=16000):
-        with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
-            # print(f'================soundfile : {soundfile.name}')
-            sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
-            # print(soundfile.name)
-            playsound(soundfile.name, block=True)
+        if not hasattr(self, 'is_ready'):
+            logger.warning(f"{self.__class__.__name__} is missing the 'is_ready' attribute.")
+        self.is_ready = True
+        while not self.is_exited:
+            try:
+                task = self._get_task_from_queue()
+                if task:
+                    self._process_task(task)
+            except Empty:
+                # No new item arrived within 1 second; this is normal, keep looping
+                continue
+            except Exception as e:
+                logger.error(f"Error in AudioStreamPlayer: {e}")
+                time.sleep(0.1)  # Sleep briefly after an unexpected error
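The new `_process_task` in `AudioStreamPlayer` polls `silence_over_threshold_event.is_set()`, sleeps 50 ms between checks, and re-validates the task on every pass.

The sketch below shows the same wait-while-checking-for-cancellation loop using `threading.Event.wait(timeout)` instead of `is_set()` plus `time.sleep()`. This is a swapped-in technique: it wakes as soon as the event is set rather than at the next poll. The function `wait_for_silence` and its `should_abort` callback are hypothetical; `should_abort` stands in for the `handle_user_speaking_interruption` and `is_task_valid` checks:

```python
import threading
import time
from typing import Callable


def wait_for_silence(silence_event: threading.Event,
                     should_abort: Callable[[], bool],
                     poll_interval: float = 0.05,
                     max_wait: float = float("inf")) -> bool:
    """Wait until silence_event is set.

    Returns True once the event is set, or False if should_abort() reports
    that the task is no longer wanted, or max_wait seconds have elapsed.
    should_abort is re-checked before every wait, mirroring how the player
    re-validates the task on each pass of its inner loop.
    """
    start = time.monotonic()
    while not should_abort():
        # Event.wait returns True as soon as the event is set,
        # or False once poll_interval elapses without it being set.
        if silence_event.wait(timeout=poll_interval):
            return True
        if time.monotonic() - start >= max_wait:
            return False
    return False
```

Compared with `time.sleep(0.05)`, `Event.wait` keeps the same 50 ms upper bound on how late an abort check can run, but it returns immediately once silence is signalled.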

src/voice_dialogue/services/mixins.py ADDED

@@ -0,0 +1,91 @@

+from collections import OrderedDict
+from voice_dialogue.core.constants import (
+    voice_state_manager, session_manager, dropped_audio_cache,
+    user_still_speaking_event, chat_history_cache
+)
+from voice_dialogue.models.voice_task import VoiceTask
+from voice_dialogue.utils.logger import logger
+class TaskStatusMixin:
+    """Common helpers for voice task status checks and interruption handling"""
+    def is_task_interrupted(self, voice_task: VoiceTask) -> bool:
+        """Check whether the voice task has been interrupted by another task"""
+        return (voice_state_manager.interrupt_task_id and
+                voice_task.id != voice_state_manager.interrupt_task_id)
+    def is_task_valid(self, voice_task: VoiceTask) -> bool:
+        """Check whether the voice task is still valid (matching session, not dropped, etc.)"""
+        if self.is_task_interrupted(voice_task):
+            return False
+        if voice_task.session_id != session_manager.current_id:
+            return False
+        if voice_task.answer_id in dropped_audio_cache:
+            return False
+        return True
+    def handle_user_speaking_interruption(self, voice_task: VoiceTask) -> bool:
+        """Handle an interruption caused by the user continuing to speak"""
+        if user_still_speaking_event.is_set():
+            logger.info(f'User still speaking, dropping task {voice_task.id}')
+            voice_state_manager.drop_audio_task(voice_task.id)
+            dropped_audio_cache[voice_task.answer_id] = voice_task.answer_id
+            user_still_speaking_event.clear()
+            return True
+        return False
+class HistoryMixin:
+    """Provides chat history updates"""
+    def update_chat_history(self, voice_task: VoiceTask) -> None:
+        """Update the session's chat history"""
+        chat_history = chat_history_cache.get(voice_task.session_id, OrderedDict())
+        task_answer_id = voice_task.answer_id
+        user_question_key = f'{task_answer_id}:human'
+        if user_question_key not in chat_history:
+            chat_history[user_question_key] = voice_task.transcribed_text
+        ai_answer_key = f'{task_answer_id}:ai'
+        cached_ai_answer = chat_history.get(ai_answer_key, [])
+        cached_ai_answer.append(voice_task.answer_sentence)
+        chat_history[ai_answer_key] = cached_ai_answer
+        chat_history_cache[voice_task.session_id] = chat_history
+class PerformanceLogMixin:
+    """Provides task performance logging"""
+    def log_task_performance(self, voice_task: VoiceTask, task_name: str = "task"):
+        """Log the ASR, LLM, and TTS stage durations and the audio length"""
+        try:
+            from voice_dialogue.services.utils import calculate_audio_duration
+            asr_duration = getattr(voice_task, 'whisper_end_time', 0) - getattr(voice_task, 'whisper_start_time', 0)
+            llm_duration = getattr(voice_task, 'llm_end_time', 0) - getattr(voice_task, 'llm_start_time', 0)
+            tts_duration = getattr(voice_task, 'tts_end_time', 0) - getattr(voice_task, 'tts_start_time', 0)
+            audio_duration = 0
+            if hasattr(voice_task, 'tts_generated_sentence_audio') and voice_task.tts_generated_sentence_audio:
+                audio_data, sample_rate = voice_task.tts_generated_sentence_audio
+                audio_duration = calculate_audio_duration(audio_data, sample_rate)
+            logger.info(
+                f"\n"
+                f"┌───────────────────────── Task Info ───────────────────────┐\n"
+                f"│ Task ID: {voice_task.id}\n"
+                f"├───────────────────────── Performance ─────────────────────┤\n"
+                f"│ ASR time: {asr_duration:.2f}s\n"
+                f"│ LLM time: {llm_duration:.2f}s\n"
+                f"│ TTS time: {tts_duration:.2f}s\n"
+                f"│ Audio length: {audio_duration:.2f}s\n"
+                f"├───────────────────────── Generated Content ───────────────┤\n"
+                f"│-> {voice_task.answer_sentence}\n"
+                f"└──────────────────────────────────────────────────────────┘"
+            )
+        except Exception as e:
+            logger.error(f"Error while logging task performance info: {e}")
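`HistoryMixin.update_chat_history` stores each session's history as an `OrderedDict` with two keys per answer:
- `'{answer_id}:human'`: the transcribed question, written once.
- `'{answer_id}:ai'`: a list that grows by one sentence per played TTS chunk.

Below is a standalone sketch of that bookkeeping. It uses a plain dict in place of `chat_history_cache` and a hypothetical `Task` dataclass in place of `VoiceTask`. The `to_messages` helper, which concatenates the stored sentences back into role/content messages, is an illustrative addition and not part of the commit:

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical stand-in for VoiceTask with only the fields used here."""
    session_id: str
    answer_id: str
    transcribed_text: str
    answer_sentence: str


# Plain dict standing in for the project's chat_history_cache.
chat_history_cache: Dict[str, OrderedDict] = {}


def update_chat_history(task: Task) -> None:
    """Same key scheme as HistoryMixin.update_chat_history."""
    history = chat_history_cache.get(task.session_id, OrderedDict())
    question_key = f"{task.answer_id}:human"
    if question_key not in history:
        history[question_key] = task.transcribed_text
    answer_key = f"{task.answer_id}:ai"
    sentences = history.get(answer_key, [])
    sentences.append(task.answer_sentence)
    history[answer_key] = sentences
    chat_history_cache[task.session_id] = history


def to_messages(session_id: str) -> List[Dict[str, str]]:
    """Illustrative only: one user message and one joined assistant message per turn."""
    messages: List[Dict[str, str]] = []
    for key, value in chat_history_cache.get(session_id, OrderedDict()).items():
        if key.endswith(":human"):
            messages.append({"role": "user", "content": value})
        else:
            # Sentences were stored one per TTS chunk; concatenate them back.
            messages.append({"role": "assistant", "content": "".join(value)})
    return messages


# Two TTS sentences belonging to the same answer.
update_chat_history(Task("s1", "a1", "How's the weather today?", "It is sunny today, "))
update_chat_history(Task("s1", "a1", "How's the weather today?", "a good day to go out."))
print(to_messages("s1"))
```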

src/voice_dialogue/services/speech/monitor.py CHANGED

@@ -20,6 +20,7 @@ from voice_dialogue.core.constants import (
 from voice_dialogue.core.enums import AudioState
 from voice_dialogue.models.voice_task import VoiceTask
 from voice_dialogue.services.audio.vad import SileroVAD
+from voice_dialogue.services.utils import normalize_audio_frame, calculate_audio_duration
 from voice_dialogue.utils.logger import logger
@@ -110,7 +111,7 @@ class SpeechStateMonitor(BaseThread):
     def _normalize_audio_frame(self, data: bytes) -> np.ndarray:
         """Convert int16 audio bytes into a numpy float array in the range [-1.0, 1.0]."""
-        return np.frombuffer(data, dtype=np.int16).astype(np.float32) / np.iinfo(np.int16).max
+        return normalize_audio_frame(data)
     def _detect_speech(self, audio_frame: np.ndarray) -> bool:
         return self._vad_instance.is_voice_active(audio_frame, self.sample_rate)
@@ -119,11 +120,11 @@ class SpeechStateMonitor(BaseThread):
         """Get an audio frame from the queue"""
         try:
             if self._enable_vad:
-                data = self.audio_frame_queue.get(block=False, timeout=self.config.QUEUE_TIMEOUT)
+                data = self.audio_frame_queue.get(block=True, timeout=self.config.QUEUE_TIMEOUT)
                 audio_frame = self._normalize_audio_frame(data)
                 is_voice_active = self._detect_speech(audio_frame)
             else:
-                data, is_voice_active = self.audio_frame_queue.get(block=False, timeout=self.config.QUEUE_TIMEOUT)
+                data, is_voice_active = self.audio_frame_queue.get(block=True, timeout=self.config.QUEUE_TIMEOUT)
                 audio_frame = self._normalize_audio_frame(data)
             return audio_frame, is_voice_active
         except Empty:
@@ -131,7 +132,7 @@ class SpeechStateMonitor(BaseThread):
     def _calculate_frame_duration_ms(self, audio_frame):
         """Calculate the audio frame duration (milliseconds)"""
-        return librosa.get_duration(y=audio_frame, sr=self.sample_rate) * 1000
+        return calculate_audio_duration(audio_data=audio_frame, sample_rate=self.sample_rate) * 1000
     def _process_active_voice_frame(self, audio_frame: np.ndarray):
         """

src/voice_dialogue/services/speech/recognizer.py CHANGED

@@ -33,7 +33,7 @@ class ASRWorker(BaseThread):
         while not self.is_exited:
             try:
-                voice_task: VoiceTask = self.user_voice_queue.get(block=False, timeout=1)
+                voice_task: VoiceTask = self.user_voice_queue.get(block=True, timeout=1)
             except Empty:
                 continue

src/voice_dialogue/services/text/generator.py CHANGED

@@ -214,7 +214,7 @@ class LLMResponseGenerator(BaseThread):
         """Main run loop"""
         while not self.is_exited:
             try:
-                voice_task: VoiceTask = self.user_question_queue.get(block=False, timeout=1)
+                voice_task: VoiceTask = self.user_question_queue.get(block=True, timeout=1)
                 self._process_voice_task(voice_task)
             except Empty:
                 continue
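Several hunks in this commit change `get(block=False, timeout=...)` to `get(block=True, timeout=...)`. According to the Python `queue` documentation, the timeout is ignored when `block` is false: the call raises `Empty` immediately. The old loops therefore spun through `except Empty: continue` without ever pausing. (`multiprocessing.Queue.get` documents the same `block`/`timeout` semantics.)

The standard-library sketch below measures that difference on an empty `queue.Queue`. The helper name `time_empty_get` is hypothetical:

```python
import queue
import time


def time_empty_get(block: bool, timeout: float) -> float:
    """Return how long get() on an empty queue takes before raising Empty."""
    q: queue.Queue = queue.Queue()
    start = time.monotonic()
    try:
        q.get(block=block, timeout=timeout)
    except queue.Empty:
        pass
    return time.monotonic() - start


# block=False ignores the timeout and raises Empty right away.
print(f"block=False: {time_empty_get(False, 1.0):.3f}s")
# block=True waits up to the timeout before raising Empty.
print(f"block=True:  {time_empty_get(True, 0.2):.3f}s")
```

With `block=True, timeout=1`, an idle worker wakes roughly once per second to re-check `is_exited` instead of busy-looping.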

src/voice_dialogue/services/utils.py ADDED

@@ -0,0 +1,37 @@

+import re
+import librosa
+import numpy as np
+from voice_dialogue.utils.logger import logger
+def has_no_words(text: str) -> bool:
+    """
+    Check whether the text contains no words (letters, digits, or Chinese characters).
+    Returns True if the text contains only punctuation, whitespace, or other symbols.
+    """
+    if not text:
+        return True
+    # Search for any letter, digit, or Chinese character
+    if re.search(r'[\u4e00-\u9fa5a-zA-Z0-9]', text):
+        return False
+    return True
+def normalize_audio_frame(data: bytes) -> 'np.ndarray':
+    """
+    Convert int16 audio bytes into a numpy float array in the range [-1.0, 1.0].
+    """
+    return np.frombuffer(data, dtype=np.int16).astype(np.float32) / np.iinfo(np.int16).max
+def calculate_audio_duration(audio_data: 'np.ndarray', sample_rate: int = 16000) -> float:
+    """
+    Calculate the audio duration in seconds.
+    """
+    try:
+        return librosa.get_duration(y=audio_data, sr=sample_rate)
+    except Exception as e:
+        logger.error(f"Error while calculating audio duration: {e}")
+        return 0.0
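Two calculations in `utils.py` reduce to simple arithmetic:
- `normalize_audio_frame` divides int16 samples by `np.iinfo(np.int16).max`, which is 32767.
- For a one-dimensional signal, `librosa.get_duration(y=..., sr=...)` is the sample count divided by the sample rate.

The stdlib-only sketch below reproduces both without NumPy or librosa, so the arithmetic can be checked directly. It assumes native-endian 16-bit PCM, as the original does, and the function names are hypothetical.

One consequence of dividing by 32767 is that the most negative sample, -32768, maps to slightly below -1.0. The docstring's [-1.0, 1.0] range is therefore approximate.

```python
from array import array
from typing import List

INT16_MAX = 32767  # np.iinfo(np.int16).max


def normalize_pcm16(data: bytes) -> List[float]:
    """Convert native-endian int16 PCM bytes to floats by dividing by 32767."""
    samples = array("h")
    samples.frombytes(data)
    return [s / INT16_MAX for s in samples]


def duration_seconds(num_samples: int, sample_rate: int = 16000) -> float:
    """Duration of a mono signal: sample count divided by sample rate."""
    return num_samples / sample_rate


frame = array("h", [0, 16384, 32767, -32768]).tobytes()
print(normalize_pcm16(frame))
# 480 samples at 16 kHz, expressed in milliseconds as _calculate_frame_duration_ms does.
print(duration_seconds(480, 16000) * 1000)
```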