# VibeVoice 7B - 4-bit Quantized

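Optimized for consumer GPUs such as the RTX 3060 (12 GB) and RTX 4060 (8 GB). Because inference uses about 8 GB of VRAM, 8 GB cards have little or no headroom, and 12 GB cards are the more comfortable fit.
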
## Specifications

- Quantization: 4-bit (NF4)
- Model size on disk: 6.2 GB
- VRAM usage during inference: ~8 GB
- Quality: very good, with minimal degradation compared with the full-precision model

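Before loading, it can help to check whether the current GPU has the ~8 GB of free VRAM listed above. The following is a minimal sketch: `fits_in_vram` and `check_current_gpu` are hypothetical helper names, and treating "8 GB" as 8 GiB (8 × 1024³ bytes) is an assumption. It relies on `torch.cuda.mem_get_info()`, which returns the free and total bytes of the current CUDA device, and returns `None` when PyTorch or CUDA is not available.

```python
import importlib.util

# ~8 GB is the VRAM figure from the Specifications; reading it as GiB is an assumption.
REQUIRED_VRAM_GIB = 8.0
GIB = 1024**3


def fits_in_vram(free_bytes: int, required_gib: float = REQUIRED_VRAM_GIB) -> bool:
    """Return True if free_bytes is at least required_gib GiB."""
    return free_bytes >= required_gib * GIB


def check_current_gpu(required_gib: float = REQUIRED_VRAM_GIB):
    """Return True/False for the current CUDA device, or None if torch/CUDA is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    # mem_get_info() reports (free_bytes, total_bytes) for the current device
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return fits_in_vram(free_bytes, required_gib)


if __name__ == "__main__":
    result = check_current_gpu()
    if result is None:
        print("No CUDA device detected.")
    else:
        print("Enough free VRAM." if result else "Less than ~8 GB free; loading may fail.")
```

Checking *free* rather than total memory matters on desktop cards, where the display and other processes already hold part of the VRAM.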
## Usage

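The path `Dannidee/VibeVoice7b-low-vram/4bit` combines a Hub repo id with a `4bit` subfolder, while Hub repo ids themselves take the form `namespace/repo`. If loading from that path fails, one option is to download only the subfolder first with `huggingface_hub.snapshot_download` and load from the local directory. This is a sketch that assumes the repository stores the 4-bit checkpoint in a folder named `4bit`, which is inferred from the path rather than confirmed. `split_repo_path` and `download_variant` are hypothetical helper names.

```python
import os


def split_repo_path(path: str) -> tuple[str, str]:
    """Split 'namespace/repo/sub/dir' into ('namespace/repo', 'sub/dir')."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"expected 'namespace/repo[/subfolder]', got {path!r}")
    return "/".join(parts[:2]), "/".join(parts[2:])


def download_variant(path: str) -> str:
    """Download only the subfolder of a Hub repo and return its local directory."""
    # Imported lazily so split_repo_path works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    repo_id, subfolder = split_repo_path(path)
    # Restrict the download to files under the subfolder (if one was given)
    patterns = [f"{subfolder}/*"] if subfolder else None
    local_root = snapshot_download(repo_id=repo_id, allow_patterns=patterns)
    return os.path.join(local_root, subfolder) if subfolder else local_root
```

For example, `model_path = download_variant("Dannidee/VibeVoice7b-low-vram/4bit")` returns a local directory that can be passed to both `from_pretrained` calls.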
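```python
import torch

from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Hub repo plus its `4bit` subfolder. Hub repo ids take the form `namespace/repo`,
# so if this path does not resolve, download the `4bit` folder and pass its local path.
model_path = "Dannidee/VibeVoice7b-low-vram/4bit"

# Load the 4-bit checkpoint directly onto the GPU, computing in bfloat16
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
processor = VibeVoiceProcessor.from_pretrained(model_path)

# Generate speech: one "Speaker N:" turn per line, one voice sample per speaker
text = "Speaker 1: Hello!\nSpeaker 2: Hi there!"
inputs = processor(
    text=[text],
    voice_samples=[["voice1.wav", "voice2.wav"]],
    padding=True,
    return_tensors="pt",
)
# The processor returns CPU tensors; move them to the model's device
for key, value in inputs.items():
    if torch.is_tensor(value):
        inputs[key] = value.to(model.device)

# generate() takes the processor's tokenizer as an argument
outputs = model.generate(**inputs, tokenizer=processor.tokenizer)
processor.save_audio(outputs.speech_outputs[0], "output.wav")
```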
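For longer dialogues, the script can be assembled from a list of turns rather than written by hand. This minimal sketch assumes the processor expects one `Speaker N: text` turn per line and one voice sample per distinct speaker, as in the two-speaker example; `build_script` and `distinct_speakers` are hypothetical helper names.

```python
def build_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, text) pairs into one 'Speaker N: text' line per turn."""
    lines = []
    for speaker, utterance in turns:
        # Collapse internal whitespace and newlines so each turn stays on one line
        clean = " ".join(utterance.split())
        if not clean:
            continue  # skip empty turns
        lines.append(f"Speaker {speaker}: {clean}")
    return "\n".join(lines)


def distinct_speakers(turns: list[tuple[int, str]]) -> list[int]:
    """Sorted speaker numbers; supply one voice sample per entry."""
    return sorted({speaker for speaker, _ in turns})


turns = [(1, "Hello!"), (2, "Hi there!"), (1, "How are you\ntoday?")]
script = build_script(turns)
```

Here `script` can be passed as `text=[script]`, with `voice_samples` listing one file per entry of `distinct_speakers(turns)`, in the same order.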