---
license: mit
tags:
- text-to-speech
- audio
- speech
language:
- en
pipeline_tag: text-to-speech
model-index:
- name: VibeVoice-1.5B
  results: []
---


# VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.

## Repository

Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)

## Requirements

* Python 3.8+
* PyTorch (with CUDA support recommended)
* [Transformers](https://github.com/huggingface/transformers)
* FFmpeg (for audio processing)
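
Before installing, it can help to confirm that the requirements above are present. The following is a minimal sketch, not part of this repository: the helper name `check_environment` is hypothetical, and it only checks that the packages and the `ffmpeg` executable can be found, not that their versions are compatible.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # find_spec returns None when a top-level package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If `torch` is found, `torch.cuda.is_available()` reports whether a CUDA device can be used.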

## Installation

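Clone the repository and install dependencies. The commands below are written for a Google Colab or Jupyter notebook cell (hence the `!` and `%` prefixes and the `/content` paths); in a regular shell, drop the prefixes and adjust the paths to your checkout location: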

```bash
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
```

## Usage

Run inference using the provided demo script:

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
  --speaker_names Alice
```

### Arguments

* `--model_path`: Path to the model directory (local or Hugging Face repo name).
* `--txt_path`: Path to a text file containing the input text.
* `--speaker_names`: Names of the speakers to be used for synthesis (multiple speakers supported).
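
To call the demo script from Python rather than from a notebook cell, the arguments above can be assembled into a command list. This is a sketch under stated assumptions: `build_command` and `run_inference` are hypothetical helpers, and only the three flags documented above are used.

```python
import subprocess
import sys


def build_command(script, model_path, txt_path, speaker_names):
    """Build the argument list for the demo script (hypothetical helper)."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, script,
        "--model_path", model_path,
        "--txt_path", txt_path,
        # Each speaker name is passed as a separate argument after the flag.
        "--speaker_names", *speaker_names,
    ]


def run_inference(script, model_path, txt_path, speaker_names):
    """Run the demo script and raise CalledProcessError if it exits non-zero."""
    cmd = build_command(script, model_path, txt_path, speaker_names)
    subprocess.run(cmd, check=True)
```

For example, `build_command("demo/inference_from_file.py", "/content/vibevoice-1.5b", "demo/text_examples/1p_abs.txt", ["Alice"])` yields the same invocation as the single-speaker command shown earlier, with paths relative to the repository root.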

### Example with multiple speakers

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
  --speaker_names Alice Frank
```

## Google Colab Notebook

A ready-to-use Google Colab notebook is available for quick experimentation:

[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)

## Output

* Generated audio files will be saved in the output directory specified in the script.
* Default output format: `.wav`
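
A quick way to sanity-check a generated file is to read its header with Python's standard `wave` module. This is a minimal sketch: the helper name `wav_info` is hypothetical, the path is whatever file the script produced, and the `wave` module only reads PCM-encoded WAV files, so it is an assumption that the output uses that encoding.

```python
import wave


def wav_info(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

A duration of zero, or an error raised by `wave.open`, suggests the file is empty or not PCM WAV.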

## License

Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.