# VibeVoice 7B - 4-bit Quantized

Optimized for RTX 3060/4060 and similar 12GB VRAM GPUs.

## Specifications
- Quantization: 4-bit (nf4)
- Model size: 6.2 GB
- VRAM usage: ~8 GB
- Quality: Very good (minimal degradation)
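
A back-of-envelope calculation shows why nf4 fits in 12 GB. The figures below are illustrative assumptions (7B weights, a common 64-weight quantization block size with one fp16 scale per block), not measurements of this checkpoint; the 6.2 GB on-disk size also includes components kept at higher precision.

```python
# Rough VRAM estimate for the 4-bit (nf4) weights of a 7B-parameter model.
# Assumptions: 7e9 weights at 0.5 bytes each, plus one fp16 scale factor
# per 64-weight quantization block (a typical nf4 block size).
params = 7e9
weight_bytes = params * 0.5          # 4 bits = 0.5 bytes per weight
scale_bytes = (params / 64) * 2      # fp16 scale per 64-weight block
total_gb = (weight_bytes + scale_bytes) / 1024**3
print(f"~{total_gb:.1f} GB for the quantized weights alone")
```

Activations, the KV cache, and any non-quantized layers account for the rest of the ~8 GB observed at runtime.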

## Usage

```python
import torch

from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "Dannidee/VibeVoice7b-low-vram/4bit",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
processor = VibeVoiceProcessor.from_pretrained("Dannidee/VibeVoice7b-low-vram/4bit")

# Generate speech from a two-speaker script, with one reference
# voice sample per speaker
text = "Speaker 1: Hello! Speaker 2: Hi there!"
inputs = processor(
    text=[text],
    voice_samples=[["voice1.wav", "voice2.wav"]],
    padding=True,
    return_tensors="pt",
)

outputs = model.generate(**inputs)
processor.save_audio(outputs.speech_outputs[0], "output.wav")
```