daihui.zhang committed on Apr 17, 2025

Commit

f5bdb50

1 Parent(s): d84bca3

fix vad buf

Files changed (4)

main.py +3 -0
transcribe/pipelines/pipe_vad.py +54 -1
transcribe/translatepipes.py +3 -0
transcribe/whisper_llm_serve.py +17 -13

main.py CHANGED

@@ -57,6 +57,8 @@ async def root():
 async def translate(websocket: WebSocket):
     query_parameters_dict = websocket.query_params
     from_lang, to_lang = query_parameters_dict.get('from'), query_parameters_dict.get('to')
+    pipe.reset()
     client = WhisperTranscriptionService(
         websocket,
         pipe,
@@ -64,6 +66,7 @@ async def translate(websocket: WebSocket):
         client_uid=f"{uuid1()}",
     )
     if from_lang and to_lang:
         client.set_language(from_lang, to_lang)
         logger.info(f"Source lange: {from_lang}  -> Dst lange: {to_lang}")

transcribe/pipelines/pipe_vad.py CHANGED

@@ -56,8 +56,18 @@ class VadPipe(BasePipe):
     model = None
     sample_rate = 16000
     window_size_samples = 512
+    chunk_size = 512
+    def __init__(self, in_queue=None, out_queue=None) -> None:
+        super().__init__(in_queue, out_queue)
+        self._offset = 0 # offset, in frames, of the audio processed so far
+        self._status = 'END'
+    def reset(self):
+        self._offset = 0
+        self._status = 'END'
     @classmethod
     def init(cls):
         if cls.model is None:
@@ -81,9 +91,52 @@ class VadPipe(BasePipe):
     # def reduce_noise(self, data):
     #     return nr.reduce_noise(y=data, sr=self.sample_rate)
+    def _process_speech_chunk(self, source_audio:np.ndarray):
+        speech_dict = self.vac(source_audio, return_seconds=False)
+        if speech_dict:
+            start_frame, end_frame = speech_dict.get("start"), speech_dict.get("end")
+            if start_frame:
+                relative_start_frame = max(0, (start_frame - self._offset))
+            if end_frame:
+                relative_end_frame = min((end_frame+1 - self._offset),len(source_audio))
+            return relative_start_frame, relative_end_frame
     def process(self, in_data: MetaItem) -> MetaItem:
+        if self._offset == 0:
+            self.vac.reset_states()
+        source_audio = np.frombuffer(in_data.source_audio, dtype=np.float32)
+        speech_data  = self._process_iter_chunk(source_audio)
+        self._offset += len(source_audio)
+        if speech_data: # a speech boundary (start or end) was detected
+            rel_start_frame, rel_end_frame = speech_data
+            if rel_start_frame and not rel_end_frame:
+                self._status = "START" # speech starts
+                target_audio = source_audio[rel_start_frame:]
+            elif not rel_start_frame and rel_end_frame:
+                self._status = "END" # speech ends
+                target_audio = source_audio[:rel_end_frame]
+            elif rel_start_frame and rel_end_frame:
+                self._status = 'END'
+                target_audio = source_audio[rel_start_frame:rel_end_frame]
+            else:
+                self._status = 'END'
+                target_audio = np.array([],dtype=np.float32)
+        else:
+            if self._status == 'START':
+                target_audio = source_audio
+            else: # end
+                target_audio = np.array([],dtype=np.float32)
+        in_data.audio = target_audio.tobytes()
+        in_data.source_audio = b''
+        return in_data
+    def process_all(self, in_data: MetaItem) -> MetaItem:
         source_audio = in_data.source_audio
         source_audio = np.frombuffer(source_audio, dtype=np.float32)
         # source_audio = self.reduce_noise(source_audio)
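
The chunk handling added to VadPipe.process can be read as a two-state machine (START while inside speech, END otherwise) driven by the start/end events the VAD reports in absolute sample indices. What follows is a minimal sketch under stated assumptions, not the repository's code: `ChunkTrimmer`, `to_relative`, and `trim` are hypothetical names, plain Python lists stand in for NumPy arrays, and the VAD events are passed in as arguments instead of coming from `self.vac`. Two deliberate deviations from the diff: both relative indices are always bound (the diff's `_process_speech_chunk` leaves one unbound when only a start or only an end is reported), and events are compared against `None` rather than tested for truthiness, so a boundary at relative index 0 still counts.

```python
from typing import List, Optional, Sequence, Tuple


class ChunkTrimmer:
    """Hypothetical stand-in for the per-chunk trimming in VadPipe.process."""

    def __init__(self) -> None:
        self.offset = 0      # absolute index of the first sample of the next chunk
        self.status = "END"  # "START" while inside speech, "END" otherwise

    def reset(self) -> None:
        """Forget all stream state, as VadPipe.reset does for a new session."""
        self.offset = 0
        self.status = "END"

    def to_relative(
        self, start: Optional[int], end: Optional[int], chunk_len: int
    ) -> Tuple[Optional[int], Optional[int]]:
        """Map absolute sample indices onto indices inside the current chunk."""
        rel_start = None if start is None else max(0, start - self.offset)
        rel_end = None if end is None else min(end + 1 - self.offset, chunk_len)
        return rel_start, rel_end

    def trim(
        self,
        chunk: Sequence[float],
        start: Optional[int] = None,
        end: Optional[int] = None,
    ) -> List[float]:
        """Return only the speech part of `chunk`, updating offset and status."""
        rel_start, rel_end = self.to_relative(start, end, len(chunk))
        self.offset += len(chunk)
        if rel_start is not None and rel_end is None:
            self.status = "START"            # speech begins in this chunk
            return list(chunk[rel_start:])
        if rel_start is None and rel_end is not None:
            self.status = "END"              # speech ends in this chunk
            return list(chunk[:rel_end])
        if rel_start is not None and rel_end is not None:
            self.status = "END"              # speech begins and ends in this chunk
            return list(chunk[rel_start:rel_end])
        # No boundary reported: pass the chunk through only while inside speech.
        return list(chunk) if self.status == "START" else []


if __name__ == "__main__":
    trimmer = ChunkTrimmer()
    # Sample values equal their absolute indices, to make the slicing visible.
    print(trimmer.trim(list(range(0, 8))))             # []
    print(trimmer.trim(list(range(8, 16)), start=10))  # [10, 11, 12, 13, 14, 15]
    print(trimmer.trim(list(range(16, 24))))           # whole chunk, still in speech
    print(trimmer.trim(list(range(24, 32)), end=27))   # [24, 25, 26, 27]
```

Note that the diff's process calls `self._process_iter_chunk`, while the method it adds is named `_process_speech_chunk`; whether `_process_iter_chunk` exists elsewhere in the file cannot be told from this diff.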

transcribe/translatepipes.py CHANGED

@@ -19,6 +19,9 @@ class TranslatePipes:
         self._translate_7b_pipe = self._launch_process(Translate7BPipe())
         # vad
         self._vad_pipe = self._launch_process(VadPipe())
+    def reset(self):
+        self._vad_pipe.reset()
     def _launch_process(self, process_obj):
         process_obj.daemon = True

transcribe/whisper_llm_serve.py CHANGED

@@ -54,6 +54,9 @@ class WhisperTranscriptionService(ServeClientBase):
         self.translate_thread = self._start_thread(self._transcription_processing_loop)
         self.frame_processing_thread = self._start_thread(self._frame_processing_loop)
+        #
+        self._vad_processed_offset = 0
         # for test
         self._transcrible_time_cost = 0.
         self._translate_time_cost = 0.
@@ -106,8 +109,11 @@ class WhisperTranscriptionService(ServeClientBase):
         while not self._frame_processing_thread_stop.is_set():
             try:
                 frame_np = self._frame_queue.get(timeout=0.1)
+                frame_np = self._apply_voice_activity_detection(frame_np)
                 if frame_np is None:
                     logger.error("Received None frame, stopping thread")
+                # apply vad speech check:
                 with self.lock:
                     if self.frames_np is None:
                         self.frames_np = frame_np.copy()
@@ -116,18 +122,16 @@ class WhisperTranscriptionService(ServeClientBase):
             except queue.Empty:
                 pass
-    def _apply_voice_activity_detection(self) -> None:
+    def _apply_voice_activity_detection(self, frame_np:np.array) -> None:
         """Apply voice activity detection to optimize the audio buffer"""
-        with self.lock:
-            if self.frames_np is not None:
-                # self._c+= 1
-                frame = self.frames_np.copy()
-                processed_audio = self._translate_pipe.voice_detect(frame.tobytes())
-                self.frames_np = np.frombuffer(processed_audio.audio, dtype=np.float32).copy()
-                return self.frames_np.copy()
-                # if len(frame) > self.sample_rate:
-                #     save_to_wave(f"{self._c}-org.wav", frame)
-                #     save_to_wave(f"{self._c}-vad.wav", self.frames_np)
+        # self._c+= 1
+        processed_audio = self._translate_pipe.voice_detect(frame_np.tobytes())
+        speech_audio =  np.frombuffer(processed_audio.audio, dtype=np.float32)
+        # if speech_audio:
+        # if len(frame) > self.sample_rate:
+        #     save_to_wave(f"{self._c}-org.wav", frame)
+        #     save_to_wave(f"{self._c}-vad.wav", self.frames_np)
+        return speech_audio
     def _update_audio_buffer(self, offset: int) -> None:
         """Remove the processed portion from the audio buffer"""
@@ -145,8 +149,8 @@ class WhisperTranscriptionService(ServeClientBase):
     def _get_audio_for_processing(self) -> Optional[np.ndarray]:
         """Prepare an audio chunk for processing"""
         # Apply VAD processing
-        frame_np = self._apply_voice_activity_detection()
-        # frame_np = self.frames_np.copy()
+        # frame_np = self._apply_voice_activity_detection()
+        frame_np = self.frames_np.copy()
         # No audio frames
         if frame_np is None:
             return None
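
This file moves VAD from the consumer side (re-running it over the whole accumulated buffer) to the producer side (running it once per incoming frame before the frame is appended). A minimal sketch of that arrangement follows, under stated assumptions: `FrameBuffer`, `push`, and `snapshot` are hypothetical names, lists stand in for NumPy arrays, and `vad` is any callable that returns the speech part of a frame. Unlike the diff, the sketch checks for `None` before touching the frame and before copying the buffer, since calling `.tobytes()` or `.copy()` on `None` would raise.

```python
import threading
from typing import Callable, List, Optional


class FrameBuffer:
    """Hypothetical buffer that filters each frame through VAD before storing it."""

    def __init__(self, vad: Callable[[List[float]], List[float]]) -> None:
        self._vad = vad                      # assumed: returns only speech samples
        self._lock = threading.Lock()
        self._frames: Optional[List[float]] = None

    def push(self, frame: Optional[List[float]]) -> None:
        """Run VAD on one incoming frame and append whatever speech it contains."""
        if frame is None:                    # guard before using the frame
            return
        speech = self._vad(frame)
        if not speech:                       # silence: nothing to buffer
            return
        with self._lock:
            if self._frames is None:
                self._frames = list(speech)
            else:
                self._frames.extend(speech)

    def snapshot(self) -> Optional[List[float]]:
        """Return a copy of the buffered speech, or None if nothing is buffered."""
        with self._lock:
            return None if self._frames is None else list(self._frames)


if __name__ == "__main__":
    # A crude amplitude gate stands in for the real VAD, for illustration only.
    buf = FrameBuffer(vad=lambda f: [x for x in f if abs(x) > 0.1])
    buf.push([0.0, 0.5])
    buf.push([0.9, 0.0])
    print(buf.snapshot())  # [0.5, 0.9]
```

Filtering per frame keeps the VAD cost proportional to the new audio rather than to the buffer length, which appears to be the motivation for the change, though the commit message ("fix vad buf") does not say so explicitly.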