---
license: mit
tags:
- text-to-speech
- audio
- speech
language:
- en
pipeline_tag: text-to-speech
model-index:
- name: VibeVoice-1.5B
  results: []
---
# VibeVoice-1.5B
VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.
## Repository
Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)
## Requirements
* Python 3.8+
* PyTorch (with CUDA support recommended)
* [Transformers](https://github.com/huggingface/transformers)
* FFmpeg (for audio processing)
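
The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the repository: `check_requirements` is a hypothetical helper name, and it only tests that the packages are importable and that an `ffmpeg` binary is on `PATH`. It does not verify that CUDA works.

```python
import importlib.util
import shutil
import sys


def check_requirements(min_python=(3, 8)):
    """Return a dict mapping each requirement name to True (found) or False."""
    return {
        # Interpreter version must be at least min_python.
        "python": sys.version_info >= min_python,
        # find_spec locates a package without importing it.
        "torch": importlib.util.find_spec("torch") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
        # shutil.which returns None when the binary is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```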
## Installation
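Clone the repository and install dependencies. The commands below are written for a Google Colab notebook cell: the `!` and `%` prefixes are notebook syntax, and the paths assume Colab's `/content` directory. In a regular shell, drop the prefixes and adjust the paths: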
```bash
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b
# Change directory
%cd /content/vibevoice-1.5b
# Install in editable mode
!pip install -e .
# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
```
## Usage
Run inference using the provided demo script:
```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
--model_path /content/vibevoice-1.5b \
--txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
--speaker_names Alice
```
### Arguments
* `--model_path`: Path to the model directory (local or Hugging Face repo name).
* `--txt_path`: Path to a text file containing the input text.
* `--speaker_names`: One or more speaker names to use for synthesis, separated by spaces.
### Example with multiple speakers
```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
--model_path /content/vibevoice-1.5b \
--txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
--speaker_names Alice Frank
```
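
The same invocation can also be assembled from Python, which is convenient when looping over several text files. This is a sketch under assumptions: `build_inference_command` is a hypothetical helper, not something the repository ships, and it uses only the three flags documented above.

```python
import sys


def build_inference_command(script, model_path, txt_path, speaker_names):
    """Assemble the argument list for the demo inference script.

    speaker_names is a list such as ["Alice"] or ["Alice", "Frank"].
    """
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, str(script),           # run with the current interpreter
        "--model_path", str(model_path),
        "--txt_path", str(txt_path),
        "--speaker_names", *speaker_names,     # one argument per speaker
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` launches the script. Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.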
## Google Colab Notebook
A ready-to-use Google Colab notebook is available for quick experimentation:
[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)
## Output
* Generated audio files are saved to the output directory specified in the script.
* Default output format: `.wav`
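
To sanity-check the generated files, the standard-library `wave` module can report their sample rate and duration. This is a minimal sketch: `wav_info` and `summarize_outputs` are hypothetical helper names, the output directory is passed in by the caller because its name depends on the script, and `wave` reads only PCM-encoded WAV files, so it may reject files written in another encoding.

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / rate


def summarize_outputs(output_dir):
    """Map each .wav file name in output_dir to its wav_info tuple."""
    return {p.name: wav_info(p) for p in sorted(Path(output_dir).glob("*.wav"))}
```

A duration of zero or a file that fails to open usually means the synthesis run did not finish cleanly.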
## License
The model card metadata declares the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.