zenlm/zen3-audio-fast

@@ -1,96 +1,56 @@
 ---
-library_name: transformers
-pipeline_tag: text-to-speech
-language:
-  - en
-  - zh
-  - multilingual
 license: apache-2.0
 tags:
-  - text-to-speech
-  - tts
-  - speech-synthesis
   - zen
-  - zen3
   - zenlm
   - hanzo
 ---
 # Zen3 Audio Fast
-**Zen LM by Hanzo AI** — Ultra-fast streaming text-to-speech synthesis engine.
-## Specs
-| Property | Value |
-|----------|-------|
-| Parameters | ~1.8B (flow: 420M, hift: 82M, llm: 1.25B) |
-| Architecture | Zen Audio Streaming Architecture |
-| Generation | Zen3 |
-| Task | Text-to-Speech |
-| Sample Rate | 24 kHz |
-| Languages | English, Chinese, Multilingual |
-| Latency | Ultra-low (streaming) |
-## Model Files
-This repository contains three PyTorch checkpoint components:
-| File | Role | Size |
-|------|------|------|
-| `llm.pt` | Language model backbone | ~1.25B params |
-| `flow.pt` | Acoustic flow matching model | ~420M params |
-| `hift.pt` | High-fidelity vocoder | ~82M params |
-## API Access (Recommended)
-The easiest way to use Zen3 Audio Fast is through the Hanzo AI API:
-```python
-from openai import OpenAI
-client = OpenAI(
-    base_url='https://api.hanzo.ai/v1',
-    api_key='your-api-key',
-)
-response = client.audio.speech.create(
-    model='zen3-audio-fast',
-    input='Hello, welcome to Hanzo AI!',
-    voice='alloy',
-)
-response.stream_to_file('output.mp3')
-```
-## Local Usage
 ```python
 import torch
-from pathlib import Path
-# Load model components
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
-llm = torch.load('llm.pt', map_location=device, weights_only=False)
-flow = torch.load('flow.pt', map_location=device, weights_only=False)
-hift = torch.load('hift.pt', map_location=device, weights_only=False)
 ```
-See [github.com/zenlm/zen-audio](https://github.com/zenlm/zen-audio) for the full inference pipeline and configuration reference (`model_config.yaml`).
-## Configuration
-The `model_config.yaml` file in this repository contains the full model configuration including:
-- Sample rate and audio processing parameters
-- Model architecture hyperparameters
-- Tokenizer and embedding settings
-## Related Models
-| Model | Description |
-|-------|-------------|
-| [zenlm/zen3-audio](https://huggingface.co/zenlm/zen3-audio) | Full-quality audio model |
-| [zenlm/zen-translator](https://huggingface.co/zenlm/zen-translator) | Speech translation variant |
 ## License

 ---
+language: en
 license: apache-2.0
 tags:
+  - audio-to-audio
   - zen
   - zenlm
   - hanzo
+  - zen3
+  - speech
+  - audio
+  - tts
+  - fast
+pipeline_tag: audio-to-audio
+library_name: transformers
 ---
 # Zen3 Audio Fast
+Fast variant of Zen3 Audio optimized for low-latency speech synthesis.
+## Overview
+Built on **Zen MoDE (Mixture of Distilled Experts)** architecture with 500M parameters.
+Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).
+## Quick Start
 ```python
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
 import torch
+model_id = "zenlm/zen3-audio-fast"
+processor = AutoProcessor.from_pretrained(model_id)
+model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+# Load audio
+import librosa
+audio, sr = librosa.load("audio.wav", sr=16000)
+inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs)
+print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
 ```
+## Model Details
+| Attribute | Value |
+|-----------|-------|
+| Parameters | 500M |
+| Architecture | Zen MoDE |
+| Context | 30s audio |
+| License | Apache 2.0 |
 ## License