# Ovi FusionModel - FP8 Quantized

This is the Ovi FusionModel quantized with FP8 (e4m3_e4m3_dynamic_per_tensor) for faster inference.

## Quantization Details

- **Video Model Blocks**: 30 blocks quantized
- **Audio Model Blocks**: 30 blocks quantized
- **Attention/FFN layers**: e4m3_e4m3_dynamic_per_tensor
- **Other layers**: e4m3_weightonly

## Usage

```python
import os
import sys

import torch
from omegaconf import OmegaConf
from huggingface_hub import hf_hub_download

# Make the Ovi repository importable and set the working directory
OVI_PATH = "./workspace/Ovi"
sys.path.insert(0, OVI_PATH)
os.chdir(OVI_PATH)

from ovi.ovi_fusion_engine import OviFusionEngine

# Download the quantized weights
model_path = hf_hub_download(
    repo_id="wavespeed/Ovi-e4m3_e4m3_dynamic_per_tensor",
    filename="model.pth",
)

config = OmegaConf.load("config.yaml")
engine = OviFusionEngine(config=config, device="cuda", target_dtype=torch.bfloat16)

# Load the quantized weights; the model is already quantized and ready for inference
engine.model.load_state_dict(torch.load(model_path))
```

## Model Card

- **Developed by**: Alibaba/Character.AI
- **Model type**: Video + Audio generation (FusionModel)
- **Quantization**: FP8 (e4m3_e4m3_dynamic_per_tensor)
- **License**: See the original Ovi repository

## Original Model

Based on [Ovi](https://github.com/character-ai/Ovi)