# VibeVoice 7B - 4-bit Quantized

Optimized for RTX 3060/4060 and similar 12 GB VRAM GPUs.

## Specifications

- Quantization: 4-bit (nf4)
- Model size: 6.2 GB
- VRAM usage: ~8 GB
- Quality: very good (minimal degradation)

## Usage

```python
import torch

from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load the 4-bit quantized model and its processor
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "Dannidee/VibeVoice7b-low-vram/4bit",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
processor = VibeVoiceProcessor.from_pretrained("Dannidee/VibeVoice7b-low-vram/4bit")

# Prepare a multi-speaker script plus one reference voice sample per speaker
text = "Speaker 1: Hello! Speaker 2: Hi there!"
inputs = processor(
    text=[text],
    voice_samples=[["voice1.wav", "voice2.wav"]],
    padding=True,
    return_tensors="pt",
)

# Generate speech and write it to a WAV file
outputs = model.generate(**inputs)
processor.save_audio(outputs.speech_outputs[0], "output.wav")
```
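As a rough sanity check on the numbers above, the weight footprint of a quantized model can be estimated as parameters × bits ÷ 8. This is a minimal illustrative sketch (the helper function is hypothetical, not part of VibeVoice); the actual checkpoint is larger than the raw 4-bit estimate because some components are typically kept in higher precision.

```python
def quantized_weight_size_gb(n_params: float, bits_per_param: float) -> float:
    """Estimate weight storage in GB: params * bits / 8 bytes, with 1 GB = 1e9 bytes.

    Illustrative arithmetic only -- real checkpoints carry extra overhead
    (embeddings, audio components, metadata) beyond the quantized weights.
    """
    return n_params * bits_per_param / 8 / 1e9

# 7B parameters at 4 bits: about 3.5 GB for the quantized weights alone;
# the published 6.2 GB checkpoint includes the non-quantized parts as well.
print(round(quantized_weight_size_gb(7e9, 4), 2))
```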