	---
	license: apache-2.0
	tags:
	- text-to-video
	- image-to-video
	- custom
	- inference-endpoints
	library_name: diffusers
	---

	# MeiGen-MultiTalk Endpoint Handler

	This repository contains a custom handler for deploying MeiGen-AI's MultiTalk model on Hugging Face Inference Endpoints.

	## Model Description

	MeiGen-MultiTalk generates audio-driven, multi-person conversational videos. This handler wraps the original model so that it can be served from Hugging Face Inference Endpoints.

	## Features

	- Text-to-video generation
	- Image-to-video generation
	- Multi-person conversation synthesis
	- Support for various resolutions (480p, 720p)
	- Optimized for A100 GPUs

	## Usage with Inference Endpoints

	### Recommended Configuration

	- Hardware: GPU · A100 · 1x GPU (80 GB)
	- Autoscaling:
	  - Min replicas: 0
	  - Max replicas: 1
	  - Scale to zero after: 300 seconds
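
With min replicas set to 0, the first request after an idle period arrives while a replica is still starting. The following is a minimal sketch of a client-side retry, assuming the endpoint answers with HTTP 502 or 503 during start-up (an assumption to verify against your own endpoint). `post_with_retry`, its parameters, and the injected `send` callable are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable, Iterable


def post_with_retry(
    send: Callable[[], Any],
    retry_statuses: Iterable[int] = (502, 503),  # assumed cold-start status codes
    max_attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until the response status is not in retry_statuses.

    Returns the last response, even if it is still a retryable status
    after max_attempts tries.
    """
    retryable = set(retry_statuses)
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in retryable:
            return response
        if attempt < max_attempts:
            # Wait before the next try; no sleep after the final attempt.
            sleep(delay_seconds)
    return response
```

A call might look like `post_with_retry(lambda: requests.post(API_URL, headers=headers, json=data))`. Passing the request as a callable keeps the helper independent of the HTTP library.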

	### API Example

	```python
	import requests

	API_URL = "https://YOUR-ENDPOINT-URL.endpoints.huggingface.cloud"
	headers = {
	    "Authorization": "Bearer YOUR_HF_TOKEN",
	    "Content-Type": "application/json",
	}

	# Text-to-video generation
	data = {
	    "inputs": {
	        "prompt": "A person giving a presentation",
	    },
	    "parameters": {
	        "num_frames": 16,
	        "height": 480,
	        "width": 640,
	        "num_inference_steps": 25,
	        "guidance_scale": 7.5,
	    },
	}

	response = requests.post(API_URL, headers=headers, json=data)
	response.raise_for_status()  # fail loudly on HTTP errors
	result = response.json()
	```
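
This README does not document the response schema. As a minimal sketch for saving the result, assume the JSON response carries the video as a Base64 string under a key named `video`. Both the key name and the `.mp4` file extension are assumptions, so inspect `result.keys()` on your own endpoint first. `save_video` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_video(result: dict, out_path: str, key: str = "video") -> int:
    """Decode a Base64 payload from the response dict and write it to disk.

    Returns the number of bytes written.
    """
    if key not in result:
        # Surface the keys actually present, for example an "error" field.
        raise KeyError(f"{key!r} not in response; got keys: {sorted(result)}")
    payload = base64.b64decode(result[key])
    Path(out_path).write_bytes(payload)
    return len(payload)
```

For example, `save_video(result, "output.mp4")` would write the decoded bytes next to the script.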

	## Technical Details

	The handler includes:
	- Automatic model loading from MeiGen-AI/MeiGen-MultiTalk
	- Memory optimization for GPU inference
	- Support for both diffusion pipeline and transformer modes
	- Error handling and logging
	- Base64 encoding for image/video I/O
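
Inference Endpoints loads a custom handler from a `handler.py` file that defines an `EndpointHandler` class with `__init__(self, path)` and `__call__(self, data)`. The sketch below shows that shape only and is not this repository's handler. The `generate` callable, the `video` and `error` response keys, and the stubbed model call are assumptions for illustration, chosen so the example runs without a GPU.

```python
import base64
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def _stub_generate(prompt: str, **params: Any) -> bytes:
    """Placeholder for the real model call; returns fake video bytes."""
    return f"video for: {prompt}".encode("utf-8")


class EndpointHandler:
    def __init__(self, path: str = "", generate: Optional[Callable[..., bytes]] = None):
        # A real handler would load the model weights from `path` here.
        self.generate = generate or _stub_generate

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs") or {}
        params = data.get("parameters") or {}
        # Accept either {"prompt": "..."} or a bare string as inputs.
        prompt = inputs.get("prompt") if isinstance(inputs, dict) else inputs
        if not prompt:
            return {"error": "missing 'prompt' in inputs"}
        try:
            video_bytes = self.generate(prompt, **params)
        except Exception as exc:  # report failures as JSON instead of crashing
            logger.exception("generation failed")
            return {"error": str(exc)}
        # Base64 makes the binary video safe to embed in a JSON response.
        return {"video": base64.b64encode(video_bytes).decode("ascii")}
```

Returning an `error` field rather than raising keeps the response parseable by the client, at the cost of the client having to check for that key.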

	## License

	Apache 2.0 License

	## Credits

	Based on the original [MeiGen-AI/MeiGen-MultiTalk](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk) model.