# Ovi FusionModel - FP8 Quantized

This is the Ovi FusionModel quantized with FP8 (e4m3_e4m3_dynamic_per_tensor) for faster inference.

## Quantization Details

- **Video Model Blocks**: 30 blocks quantized
- **Audio Model Blocks**: 30 blocks quantized
- **Attention/FFN layers**: e4m3_e4m3_dynamic_per_tensor
- **Other layers**: e4m3_weightonly

## Usage

```python
import os
import sys

import torch
from omegaconf import OmegaConf
from huggingface_hub import hf_hub_download

# Make the Ovi repository importable and set the working directory
OVI_PATH = "./workspace/Ovi"
sys.path.insert(0, OVI_PATH)
os.chdir(OVI_PATH)

from ovi.ovi_fusion_engine import OviFusionEngine

# Download the quantized weights
model_path = hf_hub_download(
    repo_id="wavespeed/Ovi-e4m3_e4m3_dynamic_per_tensor",
    filename="model.pth",
)

config = OmegaConf.load("config.yaml")
engine = OviFusionEngine(config=config, device="cuda", target_dtype=torch.bfloat16)

# Load the quantized weights; the model is already quantized and ready for inference
engine.model.load_state_dict(torch.load(model_path))
```

## Model Card

- **Developed by**: Alibaba/Character.AI
- **Model type**: Video + Audio generation (FusionModel)
- **Quantization**: FP8 (e4m3_e4m3_dynamic_per_tensor)
- **License**: See the original Ovi repository

## Original Model

Based on [Ovi](https://github.com/character-ai/Ovi)