# Ovi FusionModel - FP8 Quantized

This is the Ovi FusionModel quantized to FP8 (e4m3_e4m3_dynamic_per_tensor) for faster inference.
## Quantization Details

- **Video Model Blocks**: 30 blocks quantized
- **Audio Model Blocks**: 30 blocks quantized
- **Attention/FFN layers**: e4m3_e4m3_dynamic_per_tensor
- **Other layers**: e4m3_weightonly
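To make the scheme names above concrete: "e4m3" is an 8-bit float format with 4 exponent and 3 mantissa bits (max finite value 448), "dynamic" means the scale is recomputed from each tensor at runtime rather than calibrated offline, and "per-tensor" means a single scalar scale for the whole tensor. The sketch below is a conceptual illustration of that round-trip, not the Ovi quantizer itself; real kernels cast to a hardware FP8 dtype instead of rounding in Python.

```python
import math

E4M3_MAX = 448.0  # largest finite float8-e4m3 value

def round_to_e4m3(v: float) -> float:
    """Round a value already inside [-448, 448] to the nearest e4m3 grid
    point (subnormals ignored for simplicity -- a conceptual sketch)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    m = abs(v)
    e = math.floor(math.log2(m))  # binade containing the value
    step = 2.0 ** (e - 3)         # 3 mantissa bits -> 8 steps per binade
    return sign * min(round(m / step) * step, E4M3_MAX)

def quantize_dynamic_per_tensor(xs):
    """One scale for the whole tensor, recomputed on every call ("dynamic")."""
    amax = max(abs(x) for x in xs)
    scale = amax / E4M3_MAX if amax else 1.0
    q = [round_to_e4m3(x / scale) for x in xs]  # values stored in FP8 range
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, scale = quantize_dynamic_per_tensor([0.5, -3.2, 120.0])
print(dequantize(q, scale))
```

With 3 mantissa bits the round-trip error is bounded by about 1/16 of each value's magnitude, which is why FP8 is typically restricted to the attention/FFN matmuls while more sensitive layers stay weight-only quantized.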
## Usage

```python
import sys
import os

import torch
from omegaconf import OmegaConf
from huggingface_hub import hf_hub_download

# Make the Ovi repo importable (adjust OVI_PATH to your checkout)
OVI_PATH = "./workspace/Ovi"
sys.path.insert(0, OVI_PATH)
os.chdir(OVI_PATH)

from ovi.ovi_fusion_engine import OviFusionEngine

# Download the quantized weights from the Hub
model_path = hf_hub_download(
    repo_id="wavespeed/Ovi-e4m3_e4m3_dynamic_per_tensor",
    filename="model.pth",
)

config = OmegaConf.load("config.yaml")
engine = OviFusionEngine(config=config, device="cuda", target_dtype=torch.bfloat16)

# Load the quantized weights; they are already in FP8, so no further
# quantization step is needed before inference.
engine.model.load_state_dict(torch.load(model_path, map_location="cpu"))
```
## Model Card

- **Developed by**: Alibaba/Character.AI
- **Model type**: Video + Audio generation (FusionModel)
- **Quantization**: FP8 (e4m3_e4m3_dynamic_per_tensor)
- **License**: Check the original Ovi repository

## Original Model

Based on [Ovi](https://github.com/character-ai/Ovi)