VibeVoice F16 Model

This model has been converted to float16 (f16) precision for reduced memory usage.

Conversion Details

  • Original model: microsoft/VibeVoice-1.5B
  • Mixed precision: yes (the checkpoint contains both f16 and f32 tensors)
  • Memory savings: ~45.7%
  • Original size: 10.07 GB
  • Converted size: 5.47 GB
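The mixed-precision conversion described above can be sketched as follows. This is an illustrative example, not the exact script used to produce this checkpoint; the rule of keeping normalization weights in f32 is an assumption about which tensors a converter would typically protect.

```python
import torch

def convert_state_dict_to_f16(state_dict, keep_f32_keywords=("norm",)):
    """Cast float32 tensors to float16, keeping numerically sensitive
    parameters (here, anything with "norm" in its name) in float32."""
    converted = {}
    for name, tensor in state_dict.items():
        if tensor.dtype == torch.float32 and not any(k in name for k in keep_f32_keywords):
            converted[name] = tensor.to(torch.float16)
        else:
            converted[name] = tensor
    return converted

# Demo on a dummy state dict
sd = {
    "layers.0.weight": torch.zeros(4, 4, dtype=torch.float32),
    "layers.0.norm.weight": torch.ones(4, dtype=torch.float32),
}
out = convert_state_dict_to_f16(sd)
print(out["layers.0.weight"].dtype)       # torch.float16
print(out["layers.0.norm.weight"].dtype)  # torch.float32
```

Each f32 tensor cast this way drops from 4 bytes to 2 bytes per parameter, which is why the overall savings land near (but below) 50% when some tensors stay in f32.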

Usage

import torch
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with f16 precision
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "./VibeVoice-1.5B-f16",
    torch_dtype=torch.float16,
    device_map="cpu"  # or "cuda" for GPU
)

processor = VibeVoiceProcessor.from_pretrained("./VibeVoice-1.5B-f16")

# Alternatively, run the demo scripts with the --use_f16 flag (shell command):
python demo/inference_from_file.py --model_path ./VibeVoice-1.5B-f16 --use_f16 --device cpu

Notes

  • F16 precision may result in minor quality differences compared to f32
  • Some operations automatically upcast to f32 for numerical stability
  • Optimized for CPU inference, but also works on CUDA GPUs
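The upcast-to-f32 note above can be illustrated with a small, self-contained example (not VibeVoice code): f16 has only 10 mantissa bits, so at magnitude 2048 its representable values are 2 apart, and adding 1.0 is silently lost unless the operation is performed in f32.

```python
import torch

big = torch.tensor(2048.0, dtype=torch.float16)
one = torch.tensor(1.0, dtype=torch.float16)

lost = big + one              # f16 spacing at 2048 is 2, so the +1 rounds away
kept = big.float() + one.float()  # upcast to f32, where the addition is exact

print(float(lost))  # 2048.0
print(float(kept))  # 2049.0
```

This is why accumulations such as sums, norms, and softmax denominators are commonly computed in f32 even when weights and activations are stored in f16.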