---
language: en
license: apache-2.0
tags:
- audio-to-audio
- zen
- zenlm
- hanzo
- zen3
- speech
- audio
- tts
pipeline_tag: audio-to-audio
library_name: transformers
---
# Zen3 Audio

Zen3 Audio is an audio processing model for speech synthesis and audio understanding.
## Overview

Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture with 1.5B parameters.

Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).
## Quick Start

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
import librosa

model_id = "zenlm/zen3-audio"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Load audio and resample to 16 kHz, the sampling rate the processor expects
audio, sr = librosa.load("audio.wav", sr=16000)

inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```
## Model Details

| Attribute    | Value      |
|--------------|------------|
| Parameters   | 1.5B       |
| Architecture | Zen MoDE   |
| Context      | 30s audio  |
| License      | Apache 2.0 |
## License

Apache 2.0