	---
	license: apache-2.0
	base_model: Wan-AI/Wan2.2-T2V-A14B
	pipeline_tag: text-to-video
	library_name: diffusers
	tags:
	- comfyui
	- bernini-r
	- wan2.2
	- text-to-video
	- video-editing
	- reference-to-video
	- fp8
	---

	# Bernini-R fp8 (e4m3) — for ComfyUI-BerniniR

	fp8 (`float8_e4m3fn`, weight-only) build of [ByteDance/Bernini-R](https://huggingface.co/ByteDance/Bernini-R),
	which uses Wan2.2-T2V-A14B as its base. The bundle is self-contained (2 transformers + VAE + UMT5 +
	tokenizer + scheduler) and packaged for the [ComfyUI-BerniniR](https://github.com/neuregex/ComfyUI-BerniniR)
	custom node. The full pipeline runs in 24 GB of VRAM.

	The Linear weights are stored in `float8_e4m3fn` and upcast to bf16 on every forward pass;
	norms, embeddings, time and text embedders, and patch-embed stay in bf16/fp32. These weights are
	bit-identical to the node's on-the-fly fp8 quantization. This was validated end-to-end on an i2i edit:
	same seed, same GPU → 0 pixel difference. Loading this pre-quantized bundle is also faster
	(no bf16 → fp8 cast at load).
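To give a feel for what weight-only `float8_e4m3fn` storage does to a single value, here is a minimal pure-Python sketch that rounds a number to the nearest e4m3fn value (4 exponent bits, 3 mantissa bits, largest finite value 448). It is only a numeric illustration, not the node's code: the helper name `round_to_e4m3fn` is hypothetical, and saturating out-of-range inputs at 448 is an assumption of this sketch rather than a statement about how the node or PyTorch handles overflow.

```python
import math

E4M3_MAX = 448.0     # largest finite float8_e4m3fn value
E4M3_MIN_EXP = -6    # exponent of the smallest normal value (2**-6)
MANTISSA_BITS = 3    # explicit mantissa bits in e4m3


def round_to_e4m3fn(x: float) -> float:
    """Round x to the nearest e4m3fn value (ties to even).

    Hypothetical helper for illustration; saturates at +/-448 by assumption.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp returns a = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    # Below 2**-6 the format is subnormal and the spacing stays fixed at 2**-9.
    exp = max(math.frexp(a)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * round(a / step) * step


for value in (0.3, -0.3, 1.0, 0.001, 1000.0):
    print(value, "->", round_to_e4m3fn(value))
```

Every e4m3fn value fits exactly in bf16 (which has more exponent and mantissa bits), so the upcast on each forward pass adds no further rounding; the only precision loss is the one-time rounding of the stored weight.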

	## VRAM (measured: `torch.cuda.max_memory_allocated`, NVIDIA A10 24 GB, fp8 + sequential offload)

	| Task | Frames / resolution | Peak VRAM | Fits 24 GB |
	|---|---|---|---|
	| i2i / t2i (image edit / image) | 1 frame, 848×848 | ~16.7 GB | ✅ |
	| t2v / v2v / rv2v (video / video edit) | 81 frames (full length), 480p | ~18.8 GB | ✅ |

	The UMT5 text encoder is freed before the experts load, and offload keeps a single ~14 GB expert
	resident — that is what makes full-length 480p video fit in 24 GB.
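The budget behind that statement can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this README (24 GB card, ~14 GB per expert, two transformers, the measured peaks from the table); the constant and function names are hypothetical and chosen for illustration.

```python
GPU_VRAM_GB = 24.0
EXPERT_GB = 14.0    # approximate size of one fp8 expert, as stated above
NUM_EXPERTS = 2     # the bundle ships two transformers

# Measured peaks from the VRAM table above.
MEASURED_PEAK_GB = {
    "i2i / t2i, 1 frame, 848x848": 16.7,
    "t2v / v2v / rv2v, 81 frames, 480p": 18.8,
}


def resident_weights_gb(experts_resident: int) -> float:
    """Approximate VRAM taken by expert weights kept on the GPU at once."""
    return experts_resident * EXPERT_GB


def headroom_gb(peak_gb: float, total_gb: float = GPU_VRAM_GB) -> float:
    """VRAM left over at the measured peak, rounded to 0.1 GB."""
    return round(total_gb - peak_gb, 1)


print("both experts resident:", resident_weights_gb(NUM_EXPERTS), "GB")
print("one expert resident:  ", resident_weights_gb(1), "GB")
for task, peak in MEASURED_PEAK_GB.items():
    print(f"{task}: peak {peak} GB, headroom {headroom_gb(peak)} GB")
```

Keeping both experts on the GPU would need roughly 28 GB for weights alone, which is why sequential offload is required on a 24 GB card; with one expert resident, the measured peaks leave about 5 to 7 GB of headroom.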

	## Tasks

	`t2v` · `t2i` · `i2i` (image edit) · `v2v` (video edit) · `rv2v` (video edit + reference) ·
	`r2v` (reference-to-video). Edits preserve the source content/motion via Bernini's source-id RoPE
	(validated qualitatively).

	## Usage

	Install the [ComfyUI-BerniniR](https://github.com/neuregex/ComfyUI-BerniniR) node, then in
	`BerniniR · Load Model`:

	- set source = `neuregex/Bernini-R-fp8 (auto)` with `auto_download = True`, which downloads ~40 GB to
	`download_dir` on first run (with a free-space check and progress bar), or
	- run `hf download neuregex/Bernini-R-fp8 --local-dir models/bernini/Bernini-R-fp8` and set `source = local`.
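For a scripted setup, the manual route can be approximated in Python. This is a minimal sketch, not the node's implementation: `has_free_space` and `download` are hypothetical helpers, the 40 GB threshold is the approximate size quoted above (counted here as 10^9 bytes per GB), and the download step assumes the `huggingface_hub` package is installed.

```python
import shutil
from pathlib import Path

REPO_ID = "neuregex/Bernini-R-fp8"
REQUIRED_GB = 40  # approximate download size stated in this README


def has_free_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem that will hold `path` has enough free space."""
    target = Path(path).expanduser().resolve()
    # The target directory may not exist yet; walk up to the nearest existing parent.
    while not target.exists():
        target = target.parent
    free_gb = shutil.disk_usage(target).free / 1e9
    return free_gb >= required_gb


def download(local_dir: str = "models/bernini/Bernini-R-fp8") -> str:
    """Check free space, then fetch the whole repository into `local_dir`."""
    if not has_free_space(local_dir):
        raise RuntimeError(f"Need about {REQUIRED_GB} GB free for {local_dir}")
    # Imported lazily so the free-space check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

After calling `download()`, select `source = local` in `BerniniR · Load Model` as described above.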

	For the full bf16 weights instead, point the node at `ByteDance/Bernini-R-Diffusers`
	(`source = ... (full bf16)`), which needs more VRAM (A100-class) or on-the-fly fp8.

	## Credits & license

	- Algorithm & model: Bernini: Latent Semantic Planning for Video Diffusion, ByteDance
	([arXiv:2605.22344](https://arxiv.org/abs/2605.22344) · [code](https://github.com/bytedance/Bernini)) — Apache-2.0.
	- Base: [Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B).
	- fp8 build by neuregex. Apache-2.0.