---
pipeline_tag: image-to-video
library_name: diffusers
---
<h1 align='center'>HighSync: High-Quality Lip Synchronization via Latent Diffusion Models</h1>
HighSync is an end-to-end diffusion-based framework for high-fidelity lip synchronization that generates photorealistic talking-face videos aligned with arbitrary input audio. To the authors' knowledge, it is the first lip-sync model to operate natively at 512x512 resolution, which positions it as a viable option for professional production environments.
- **Paper:** [HighSync: High-Quality Lip Synchronization via Latent Diffusion Models](https://huggingface.co/papers/2605.16918)
- **GitHub:** [saeed5959/high_sync](https://github.com/saeed5959/high_sync)
## Abstract
We present HighSync, an end-to-end diffusion-based framework for high-fidelity lip synchronization that generates photorealistic talking-face videos aligned with arbitrary input audio. Existing approaches consistently struggle to reconcile image quality with synchronization accuracy, producing either visually degraded outputs or temporally inconsistent lip movements. HighSync addresses both challenges simultaneously and, to our knowledge, is the first lip sync model to operate natively at 512x512 resolution. Central to our approach is the identification and systematic elimination of a data leakage phenomenon that has silently undermined temporal modeling in prior work, preventing models from developing a genuine dependence on the audio signal.
## ⚒️ Installation
### Environment
Ubuntu 20.04 or 22.04
### Setup
```bash
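# Clone the repository and install the Python dependencies
git clone https://github.com/saeed5959/high_sync
cd high_sync
pip install -r requirements.txt
# Install ffmpeg (prefix with sudo if you are not running as root)
apt-get install ffmpeg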
```
### Download Pretrained Weights
```bash
git lfs install
git clone https://huggingface.co/saeed-5959/high_sync pretrained_weights
```
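If `git lfs` is not available, the same files can probably be fetched from Python. The sketch below is an alternative, not part of the original instructions: it assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and that the inference code only needs the repository files to be present in `pretrained_weights`. The helper name `download_weights` is hypothetical.

```python
REPO_ID = "saeed-5959/high_sync"   # model repository named in the clone command
LOCAL_DIR = "pretrained_weights"   # same target directory as the clone command


def download_weights(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download every file in the model repository into `local_dir`."""
    # Imported lazily so this module still loads where huggingface_hub
    # is not installed (assumption: it is installed before calling).
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_weights()` from the repository root should leave the weights in `pretrained_weights/`, without the `.git` directory that a clone would create.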
## 🚀 Usage
First, convert your source video to 25 FPS:
```bash
ffmpeg -i input.mp4 -r 25 out_25.mp4
```
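When several clips need converting, the same ffmpeg call can be wrapped in Python. This is a minimal sketch that reproduces the command above with `subprocess`. The function names are hypothetical, and it assumes `ffmpeg` is on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path

TARGET_FPS = 25  # frame rate the usage instructions ask for


def build_convert_cmd(src, dst, fps=TARGET_FPS):
    """Return the argument list equivalent to `ffmpeg -i src -r fps dst`."""
    return ["ffmpeg", "-i", str(src), "-r", str(fps), str(dst)]


def convert_to_target_fps(src, dst, fps=TARGET_FPS):
    """Re-encode `src` at `fps` frames per second and write it to `dst`."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    if not Path(src).is_file():
        raise FileNotFoundError(src)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_convert_cmd(src, dst, fps), check=True)
    return Path(dst)
```

For example, `convert_to_target_fps("input.mp4", "out_25.mp4")` runs the same conversion as the shell command.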
Then run the inference script, passing the converted 25 FPS video as `--source_video`:
```bash
python -m inference --source_video "video_path.mp4" --driving_audio "audio_path.wav" --output "save_path.mp4"
```
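To process many clips, the command above can be driven from a small script. The sketch below uses only the three flags shown in the command. The directory layout (`<name>.mp4` paired with `<name>.wav`) and all function names are assumptions made for illustration, and the script is assumed to run from the repository root so that `python -m inference` can find the module.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_cmd(source_video, driving_audio, output):
    """Return the argument list for one `python -m inference` call."""
    return [
        sys.executable, "-m", "inference",
        "--source_video", str(source_video),
        "--driving_audio", str(driving_audio),
        "--output", str(output),
    ]


def pair_jobs(video_dir, audio_dir, output_dir):
    """Yield (video, audio, output) for each <name>.mp4 with a matching <name>.wav."""
    video_dir, audio_dir, output_dir = Path(video_dir), Path(audio_dir), Path(output_dir)
    for video in sorted(video_dir.glob("*.mp4")):
        audio = audio_dir / (video.stem + ".wav")
        if audio.is_file():  # skip videos that have no driving audio
            yield video, audio, output_dir / video.name


def run_batch(video_dir, audio_dir, output_dir):
    """Run inference once per matched pair, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for video, audio, output in pair_jobs(video_dir, audio_dir, output_dir):
        subprocess.run(build_inference_cmd(video, audio, output), check=True)
```

Each source video should already be at 25 FPS before it is passed to `run_batch`.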
## Citation
```bibtex
@article{daghigh2024highsync,
title={HighSync: High-Quality Lip Synchronization via Latent Diffusion Models},
author={Saeed Firouzi Daghigh and Majid Iranpour Mobarekeh and Mostafa Alavi and Mehdi Bagheri},
journal={arXiv preprint arXiv:2605.16918},
year={2026}
}
```
## 🙏 Acknowledgements
This work is mainly based on [EchoMimic](https://github.com/antgroup/echomimic). We would also like to thank the contributors to the [AnimateDiff](https://github.com/guoyww/AnimateDiff), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), and [MuseTalk](https://github.com/TMElyralab/MuseTalk) repositories.