zeekay committed · Commit 63e1209 · verified · 1 Parent(s): 01e6417

Update model card: add zen/zenlm tags, fix branding

Files changed (1)
  1. README.md +31 -71
README.md CHANGED
@@ -1,96 +1,56 @@
---
- library_name: transformers
- pipeline_tag: text-to-speech
- language:
- - en
- - zh
- - multilingual
license: apache-2.0
tags:
- - text-to-speech
- - tts
- - speech-synthesis
- zen
- - zen3
- zenlm
- hanzo
---

# Zen3 Audio Fast

- **Zen LM by Hanzo AI**: ultra-fast streaming text-to-speech synthesis engine.
-
- ## Specs
-
- | Property | Value |
- |----------|-------|
- | Parameters | ~1.8B (flow: 420M, hift: 82M, llm: 1.25B) |
- | Architecture | Zen Audio Streaming Architecture |
- | Generation | Zen3 |
- | Task | Text-to-Speech |
- | Sample Rate | 24 kHz |
- | Languages | English, Chinese, Multilingual |
- | Latency | Ultra-low (streaming) |
-
- ## Model Files
-
- This repository contains three PyTorch checkpoint components:
-
- | File | Role | Size |
- |------|------|------|
- | `llm.pt` | Language model backbone | ~1.25B params |
- | `flow.pt` | Acoustic flow matching model | ~420M params |
- | `hift.pt` | High-fidelity vocoder | ~82M params |
-
- ## API Access (Recommended)
-
- The easiest way to use Zen3 Audio Fast is through the Hanzo AI API:
-
- ```python
- from openai import OpenAI
-
- client = OpenAI(
-     base_url='https://api.hanzo.ai/v1',
-     api_key='your-api-key',
- )
-
- response = client.audio.speech.create(
-     model='zen3-audio-fast',
-     input='Hello, welcome to Hanzo AI!',
-     voice='alloy',
- )
- response.stream_to_file('output.mp3')
- ```
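The endpoint above can also be consumed incrementally, which is the natural fit for a streaming engine. A minimal sketch using the OpenAI SDK's streaming-response helper; it assumes the Hanzo endpoint mirrors the OpenAI audio API, as the example above already does:

```python
# Streaming variant of the call above: audio bytes are written to disk
# as they arrive instead of after the full clip is synthesized.
# Assumes the Hanzo endpoint mirrors the OpenAI audio/speech API.
from openai import OpenAI

client = OpenAI(base_url='https://api.hanzo.ai/v1', api_key='your-api-key')

with client.audio.speech.with_streaming_response.create(
    model='zen3-audio-fast',
    voice='alloy',
    input='Hello, welcome to Hanzo AI!',
) as response:
    response.stream_to_file('output.mp3')
```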
-
- ## Local Usage

```python
import torch
- from pathlib import Path

- # Load model components
- device = 'cuda' if torch.cuda.is_available() else 'cpu'

- llm = torch.load('llm.pt', map_location=device, weights_only=False)
- flow = torch.load('flow.pt', map_location=device, weights_only=False)
- hift = torch.load('hift.pt', map_location=device, weights_only=False)
```

- See [github.com/zenlm/zen-audio](https://github.com/zenlm/zen-audio) for the full inference pipeline and configuration reference (`model_config.yaml`).
-
- ## Configuration
-
- The `model_config.yaml` file in this repository contains the full model configuration, including:
- - Sample rate and audio processing parameters
- - Model architecture hyperparameters
- - Tokenizer and embedding settings
-
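For illustration, a minimal sketch of reading such a config in Python; the key names below are assumptions, not the actual `model_config.yaml` schema:

```python
# Illustrative only: the key names here are assumptions, not the actual
# schema of model_config.yaml; check the file in this repository.
import yaml  # PyYAML

with open('model_config.yaml') as f:
    config = yaml.safe_load(f)

print(config.get('sample_rate'))  # audio processing parameters
print(config.get('llm'))          # architecture hyperparameters
print(config.get('tokenizer'))    # tokenizer / embedding settings
```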
- ## Related Models
-
- | Model | Description |
- |-------|-------------|
- | [zenlm/zen3-audio](https://huggingface.co/zenlm/zen3-audio) | Full-quality audio model |
- | [zenlm/zen-translator](https://huggingface.co/zenlm/zen-translator) | Speech translation variant |

## License

---
+ language: en
license: apache-2.0
tags:
+ - audio-to-audio
- zen
- zenlm
- hanzo
+ - zen3
+ - speech
+ - audio
+ - tts
+ - fast
+ pipeline_tag: audio-to-audio
+ library_name: transformers
---

# Zen3 Audio Fast

+ Fast variant of Zen3 Audio optimized for low-latency speech synthesis.
 
+ ## Overview

+ Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture with 500M parameters.

+ Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).

+ ## Quick Start

```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch

+ model_id = "zenlm/zen3-audio-fast"
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

+ # Load audio
+ import librosa
+ audio, sr = librosa.load("audio.wav", sr=16000)
+ inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs)
+ print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```
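The Quick Start assumes `librosa` (with a `soundfile` backend) for audio decoding and `accelerate` for `device_map="auto"`; neither is pinned by this card. Once loaded, the 500M figure from the Overview can be sanity-checked directly, as in this sketch reusing the `model` object above:

```python
# Sanity-check the parameter count claimed in the Overview, using the
# `model` object created in the Quick Start above.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```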

+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | Parameters | 500M |
+ | Architecture | Zen MoDE |
+ | Context | 30s audio |
+ | License | Apache 2.0 |
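Inputs longer than the 30-second context can be windowed before generation. A minimal sketch reusing `processor` and `model` from the Quick Start; the 30 s limit comes from the table above, while the chunking strategy itself is an assumption rather than part of this card:

```python
# Split long audio into 30-second windows to respect the context limit,
# reusing `processor` and `model` from the Quick Start. The chunking
# strategy is illustrative, not an official recommendation.
import librosa

audio, sr = librosa.load("long_audio.wav", sr=16000)
window = 30 * sr  # 30 s context, per the table above

pieces = []
for start in range(0, len(audio), window):
    segment = audio[start:start + window]
    inputs = processor(segment, sampling_rate=sr, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs)
    pieces.append(processor.batch_decode(outputs, skip_special_tokens=True)[0])

print(" ".join(pieces))
```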
 
## License