vantagewithai/Bernini-R-GGUF-ComfyUI


	---
	license: apache-2.0
	pipeline_tag: image-text-to-video
	base_model: ByteDance/Bernini-R
	---

	Quantized GGUF version of Bernini-R for ComfyUI.

	Original model link: [https://huggingface.co/ByteDance/Bernini-R](https://huggingface.co/ByteDance/Bernini-R)

	Watch us on YouTube: [@VantageWithAI](https://www.youtube.com/@vantagewithai)

	<div align="center">

	<img src="https://huggingface.co/ByteDance/Bernini-R/resolve/main/assets/bernini-icon.png" width="560" alt="Bernini"/>

	<h4 align="center">Latent Semantic Planning for Video Diffusion</h4>

	*Chenchen Liu<sup>\*</sup>, Junyi Chen<sup>\*</sup>, Lei Li<sup>\*</sup>, Lu Chi<sup>\*,§</sup>, Mingzhen Sun<sup>\*</sup>, Zhuoying Li<sup>\*</sup>, Yi Fu, Ruoyu Guo, Yiheng Wu, Ge Bai, Zehuan Yuan<sup>✉</sup>*

	<sup>\*</sup> Equal contribution  <sup>✉</sup> Corresponding author  <sup>§</sup> Project lead

	[![arXiv](https://img.shields.io/badge/arXiv-2605.22344-b31b1b.svg)](https://arxiv.org/abs/2605.22344)
	[![Project Page](https://img.shields.io/badge/Project-Page-blue.svg)](https://bernini-ai.github.io/)
	[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-yellow)](https://huggingface.co/ByteDance/Bernini)

	</div>

	## 🎉 News

	- [2026-06-01] We open-sourced the inference code and model weights of the Bernini Renderer (Bernini-R).
	- [2026-05-22] We released our paper [Bernini: Latent Semantic Planning for Video Diffusion](https://arxiv.org/abs/2605.22344).

	## ✨ Highlights

	Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.

	On video editing, Bernini ranks in the top tier alongside leading closed-source
	commercial models. The leaderboard below comes from our in-house arena
	platform, where human annotators vote blind on paired edits and the votes are
	aggregated into a Bradley-Terry score and a pairwise win-rate matrix.

	<img src="https://huggingface.co/ByteDance/Bernini-R/resolve/main/assets/arena.png" width="900" alt="Video editing arena: Bradley-Terry leaderboard and pairwise win-rate matrix"/>
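For readers unfamiliar with this kind of aggregation, here is a minimal sketch of how blind pairwise votes can be turned into Bradley-Terry strengths and a win-rate matrix. It is an illustration only, not the arena's actual implementation: the model names, the vote counts, and the choice of a simple iterative (minorization-maximization) fit are all assumptions made for the example.

```python
from collections import defaultdict


def count_wins(votes):
    """Map (winner, loser) -> number of votes in which winner beat loser."""
    wins = defaultdict(int)
    for winner, loser in votes:
        wins[(winner, loser)] += 1
    return wins


def win_rate_matrix(votes, models):
    """matrix[a][b] = share of a-vs-b votes won by a (None if never compared)."""
    wins = count_wins(votes)
    matrix = {}
    for a in models:
        matrix[a] = {}
        for b in models:
            total = wins[(a, b)] + wins[(b, a)]
            matrix[a][b] = wins[(a, b)] / total if a != b and total else None
    return matrix


def bradley_terry(votes, models, iters=200):
    """Fit Bradley-Terry strengths with the classic iterative update.

    Under the model, P(i beats j) = s_i / (s_i + s_j). Strengths are
    rescaled each round so that they average to 1. A model with no wins
    at all is driven to strength 0.
    """
    wins = count_wins(votes)
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Only opponents that i was actually compared against contribute.
            rivals = [j for j in models
                      if j != i and wins[(i, j)] + wins[(j, i)]]
            total_wins = sum(wins[(i, j)] for j in rivals)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                for j in rivals
            )
            updated[i] = total_wins / denom if denom else strength[i]
        scale = len(models) / sum(updated.values())
        strength = {m: s * scale for m, s in updated.items()}
    return strength


# Hypothetical votes: each tuple is (winner, loser) for one blind comparison.
models = ["model_a", "model_b", "model_c"]
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
)

strengths = bradley_terry(votes, models)
for name in sorted(models, key=strengths.get, reverse=True):
    print(f"{name}: {strengths[name]:.3f}")
```

The win-rate matrix reports raw head-to-head results, while the Bradley-Terry strengths combine all comparisons into a single ranking, so a model can be ranked against opponents it met only rarely.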

	## 📑 Citation

	If you use Bernini in your research, please cite:

	```bibtex
	@article{bernini,
	title = {Bernini: Latent Semantic Planning for Video Diffusion},
	author = {Chenchen Liu and Junyi Chen and Lei Li and Lu Chi and Mingzhen Sun and Zhuoying Li and Yi Fu and Ruoyu Guo and Yiheng Wu and Ge Bai and Zehuan Yuan},
	journal = {arXiv preprint arXiv:2605.22344},
	year = {2026}
	}
	```

	## 🙏 Acknowledgements

	Bernini builds on several outstanding open-source projects:

	- [Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B)
	- [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
	- [VeOmni](https://github.com/ByteDance-Seed/VeOmni)

	We thank the authors and communities of these projects for their contributions.

	## 📄 License

	Apache License 2.0. See [LICENSE](LICENSE).