Update model card (README.md)

---
license: apache-2.0
language:
- en
- zh
- ja
- ko
- es
- fr
- de
- it
- pt
- ru
library_name: transformers
pipeline_tag: audio-to-audio
tags:
- translation
- voice-cloning
- lip-sync
- multimodal
- real-time
- qwen3-omni
- cosyvoice
- wav2lip
- hanzo-ai
- zen-lm
---

# Zen Translator

Real-time multimodal translation with voice cloning and lip synchronization.

## Overview

Zen Translator combines three state-of-the-art models into a sub-second end-to-end pipeline:

| Component | Model | Parameters | Latency |
|-----------|-------|------------|---------|
| Translation | [Qwen3-Omni-30B-A3B](https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct) | 30B (3B active MoE) | ~500ms |
| Voice Cloning | [CosyVoice 2.0](https://github.com/FunAudioLLM/CosyVoice) | 0.5B | ~150ms |
| Lip Sync | [Wav2Lip](https://github.com/Rudrabha/Wav2Lip) | ~100M | ~200ms |
| **Total** | - | - | **<1 second** |
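
As a quick sanity check on the latency budget, the nominal per-stage figures quoted in the table sum well under one second (a minimal sketch; these are the quoted nominal values, not measurements):

```python
# Nominal per-stage latencies from the table above, in milliseconds.
stage_ms = {
    "translation": 500,    # Qwen3-Omni
    "voice_cloning": 150,  # CosyVoice 2.0
    "lip_sync": 200,       # Wav2Lip
}

total_ms = sum(stage_ms.values())
print(f"end-to-end: {total_ms} ms")  # end-to-end: 850 ms
```

The remaining ~150ms of headroom covers I/O and muxing overhead between stages.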

## Features

- **18 input languages** including Chinese dialects (Cantonese, Shanghainese, etc.)
- **10 output languages** with high-quality voice synthesis
- **3-second voice cloning** - Preserve speaker characteristics with minimal reference audio
- **Real-time streaming** - WebSocket API with <500ms first packet latency
- **Lip synchronization** - Natural video dubbing for translated content
- **News anchor training** - Domain-specific finetuning for broadcast translation

## Quick Start

```bash
# Clone repository
git clone https://github.com/zenlm/zen-translator.git
cd zen-translator

# Install with uv
make install

# Download models (~62GB full, ~16GB quantized)
make download
# OR
make download-quantized

# Start server
make serve
```

## Usage

### Python API

```python
from zen_translator import TranslationPipeline, TranslatorConfig

config = TranslatorConfig(target_language="es")
pipeline = TranslationPipeline(config)
await pipeline.load()

# Register speaker voice (3+ seconds of audio)
await pipeline.register_speaker("john_doe", "reference.wav")

# Translate video with voice cloning and lip sync
result = await pipeline.translate_video(
    video="news.mp4",
    target_lang="es",
    speaker_id="john_doe",
    output_path="news_es.mp4"
)
```
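
The snippet above uses top-level `await`, which works in a notebook or async REPL; in a plain script, wrap the calls in a coroutine and drive it with `asyncio.run`. A minimal sketch of the pattern (the `translate()` coroutine here is a stand-in, not part of the zen_translator API):

```python
import asyncio

async def translate() -> str:
    # Stand-in for the awaitable pipeline calls above
    # (load, register_speaker, translate_video).
    await asyncio.sleep(0)  # placeholder for the actual async work
    return "news_es.mp4"

result = asyncio.run(translate())
print(result)  # news_es.mp4
```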

### CLI

```bash
# Translate a video
zen-translate video.mp4 -o translated.mp4 -t spanish

# Register a speaker
zen-translate register-speaker john_doe reference.wav

# Start the API server
zen-serve --host 0.0.0.0 --port 8000
```

### REST API

```bash
# Translate audio
curl -X POST http://localhost:8000/translate/audio \
  -F "audio=@input.wav" \
  -F "target_lang=es"

# Translate video with lip sync
curl -X POST http://localhost:8000/translate/video \
  -F "video=@input.mp4" \
  -F "target_lang=zh"
```
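
From Python, the same endpoints take a multipart form with a media file field plus `target_lang`. A small hypothetical helper (not part of zen-translator) that builds the form fields matching the curl calls above:

```python
from pathlib import Path

def build_translate_form(media_path: str, target_lang: str) -> dict:
    """Build form fields mirroring the curl calls above:
    an 'audio' or 'video' file field plus 'target_lang'."""
    audio_exts = {".wav", ".mp3", ".flac"}
    field = "audio" if Path(media_path).suffix in audio_exts else "video"
    return {field: media_path, "target_lang": target_lang}

print(build_translate_form("input.wav", "es"))
# {'audio': 'input.wav', 'target_lang': 'es'}
```

Pass the file entry to your HTTP client's multipart upload (e.g. the `files=` argument in `requests.post`).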

### WebSocket (Real-time)

```javascript
const ws = new WebSocket('ws://localhost:8000/ws/translate');
ws.send(JSON.stringify({ target_lang: 'es', speaker_id: 'my_voice' }));
ws.onmessage = (event) => {
  // handle translated audio/text messages
};
```

## Language Support

### Input Languages (18 + 6 dialects)

| Language | Code |
|----------|------|
| English | en |
| Chinese | zh |
| Japanese | ja |
| Korean | ko |
| Spanish | es |
| French | fr |
| German | de |
| Italian | it |
| Portuguese | pt |
| Russian | ru |
| Arabic | ar |
| Hindi | hi |
| Thai | th |
| Vietnamese | vi |
| Indonesian | id |
| Malay | ms |
| Turkish | tr |
| Polish | pl |
| **Dialects** | |
| Cantonese | yue |
| Shanghainese | wuu |
| Xiang | hsn |
| Min Nan | nan |
| Hakka | hak |
| Min Dong | cdo |

### Output Languages (10)

English, Chinese, Japanese, Korean, Spanish, French, German, Italian, Portuguese, Russian
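
The ten output languages map to the codes shown in the input table (and to the `language:` list in the card's front matter). A minimal illustrative lookup, useful for validating a `target_lang` before calling the API (the helper itself is hypothetical, not part of zen-translator):

```python
# Output languages and their codes, per the table above and the front matter.
OUTPUT_LANGUAGES = {
    "en": "English", "zh": "Chinese", "ja": "Japanese", "ko": "Korean",
    "es": "Spanish", "fr": "French", "de": "German", "it": "Italian",
    "pt": "Portuguese", "ru": "Russian",
}

def is_supported_output(code: str) -> bool:
    """True if the code is one of the ten supported output languages."""
    return code.lower() in OUTPUT_LANGUAGES

print(is_supported_output("ES"))   # True
print(is_supported_output("yue"))  # False: dialects are input-only
```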

## Model Requirements

| Model | VRAM | Disk |
|-------|------|------|
| Qwen3-Omni | 16GB | 60GB |
| CosyVoice 2.0 | 2GB | 1GB |
| Wav2Lip | 2GB | 500MB |
| **Total** | **~20GB** | **~62GB** |

For smaller deployments, use 4-bit quantized Qwen3-Omni (~15GB disk).
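
The totals in the table follow directly from the per-model rows (a quick arithmetic check, using the figures quoted above):

```python
# Per-model requirements from the table above, in GB.
requirements = {
    "Qwen3-Omni":    {"vram": 16, "disk": 60.0},
    "CosyVoice 2.0": {"vram": 2,  "disk": 1.0},
    "Wav2Lip":       {"vram": 2,  "disk": 0.5},
}

total_vram = sum(m["vram"] for m in requirements.values())
total_disk = sum(m["disk"] for m in requirements.values())
print(total_vram, total_disk)  # 20 61.5 -- quoted as ~20GB VRAM, ~62GB disk
```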

## Training

### News Anchor Adaptation

```bash
# Build dataset from news channels (CNN, BBC, NHK, DW)
make dataset-build

# Train news anchor adaptation
make train-anchor
swift sft --config outputs/anchor/train_config.yaml
```

## Citation

```bibtex
@software{zen_translator,
  author = {Hanzo AI and Zen LM},
  title = {Zen Translator: Real-time Multimodal Translation with Voice Cloning},
  year = {2025},
  url = {https://github.com/zenlm/zen-translator}
}
```

## Links

- **GitHub**: https://github.com/zenlm/zen-translator
- **Zen LM**: https://zenlm.org
- **Hanzo AI**: https://hanzo.ai

## License

Apache 2.0