
	---
	license: mit
	tags:
	- video-editing
	- video-generation
	- diffusion
	- wan
	- lora
	- agent
	pipeline_tag: video-to-video
	---

	# Aurora — Model Weights

	Pretrained weights for "Aurora: Unified Video Editing with a Tool-Using Agent"
	([arXiv:2605.18748](https://arxiv.org/abs/2605.18748)).
	Code: [github.com/yeates/Aurora](https://github.com/yeates/Aurora) ·
	Project page: [yeates.github.io/Aurora-Page](https://yeates.github.io/Aurora-Page)

	This repository bundles the two trained Aurora components, laid out to drop
	straight into the code repository's `models/` directory:

	```bash
	huggingface-cli download yeates/aurora-weights --local-dir models
	```
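
The same download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed. `COMPONENT_PATTERNS`, `download_kwargs`, and `download` are hypothetical helper names for illustration, not part of the Aurora code; the file patterns come from the paths listed under Contents.

```python
from pathlib import Path

REPO_ID = "yeates/aurora-weights"

# Patterns per component, based on the paths listed under Contents.
# The component names ("editor", "agent") are labels made up for this sketch.
COMPONENT_PATTERNS = {
    "editor": ["aurora_editor.safetensors"],
    "agent": ["aurora_agent_vlm/*"],
}


def download_kwargs(components=None, local_dir="models"):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {"repo_id": REPO_ID, "local_dir": str(Path(local_dir))}
    if components:
        patterns = []
        for name in components:
            patterns.extend(COMPONENT_PATTERNS[name])
        # Restrict the download to the selected files only.
        kwargs["allow_patterns"] = patterns
    return kwargs


def download(components=None, local_dir="models"):
    """Download all weights, or only the named components."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(components, local_dir))
```

Calling `download()` fetches everything into `models/`, while `download(["agent"])` would fetch only the LoRA adapter directory.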

	## Contents

	| Path | Component | Notes |
	|---|---|---|
	| `aurora_editor.safetensors` | Video editor | trained `dit` + `mllm.context_projector` + `ref_vae_condition` (~9.4 GB, bf16) |
	| `aurora_agent_vlm/` | Agent planner adapter | PEFT LoRA (`r=32`, `alpha=64`) on `Qwen/Qwen3-VL-8B-Instruct` |

	### Editor — `aurora_editor.safetensors`

	A partial checkpoint containing only the trained Aurora modules:

	- `dit.*` — the WAN2.2-TI2V-5B diffusion transformer (fine-tuned)
	- `mllm.context_projector.*` — projects frozen Qwen3.5-4B hidden states into DiT width
	- `ref_vae_condition.*` — multi-reference conditioning with per-reference index embedding

	The checkpoint is not usable on its own: it is loaded on top of the frozen
	backbones (WAN2.2-TI2V-5B + WAN2.2 VAE + Qwen3.5-4B). One checkpoint covers
	source-conditioned (s2v), video-to-video (v2v), and reference-conditioned (sv2v)
	editing.
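
To confirm that a downloaded file really is the partial checkpoint described above, the tensor names can be grouped by the three documented prefixes. This is a sketch, assuming the `safetensors` package (with PyTorch) is installed; `count_by_prefix` and `checkpoint_keys` are hypothetical helpers, not functions from the Aurora repository.

```python
from collections import Counter

# Top-level prefixes documented for aurora_editor.safetensors.
EXPECTED_PREFIXES = ("dit.", "mllm.context_projector.", "ref_vae_condition.")


def count_by_prefix(keys, prefixes=EXPECTED_PREFIXES):
    """Count tensor names per prefix; unmatched names go under '<other>'."""
    counts = Counter()
    for key in keys:
        for prefix in prefixes:
            if key.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            # No documented prefix matched this tensor name.
            counts["<other>"] += 1
    return dict(counts)


def checkpoint_keys(path):
    """List tensor names in a safetensors file."""
    # Imported lazily so count_by_prefix can be used without safetensors.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return list(f.keys())
```

For example, `count_by_prefix(checkpoint_keys("models/aurora_editor.safetensors"))` returns a per-prefix tally. If the file matches the description above, no names should land under `<other>`.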

	### Agent adapter — `aurora_agent_vlm/`

	PEFT LoRA adapter for the tool-using planner: base `Qwen/Qwen3-VL-8B-Instruct`,
	`r=32`, `lora_alpha=64`, on the attention + MLP projections. `adapter_config.json`
	records `base_model_name_or_path = Qwen/Qwen3-VL-8B-Instruct`.
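
The adapter settings can be checked directly from `adapter_config.json` before loading anything heavy. The sketch below reads the standard PEFT fields `r`, `lora_alpha`, and `base_model_name_or_path`; `read_adapter_summary` is a hypothetical helper, and the scaling value assumes standard LoRA scaling (`lora_alpha / r`), not the rank-stabilized variant.

```python
import json
from pathlib import Path


def read_adapter_summary(adapter_dir):
    """Summarize a PEFT LoRA adapter from its adapter_config.json."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    r = config["r"]
    alpha = config["lora_alpha"]
    return {
        "base_model": config["base_model_name_or_path"],
        "r": r,
        "lora_alpha": alpha,
        # Assumption: standard LoRA scaling applied to the low-rank update.
        "scaling": alpha / r,
    }
```

With the values stated above (`r=32`, `lora_alpha=64`), `read_adapter_summary("models/aurora_agent_vlm")` would report a scaling of 2.0 under that assumption.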

	## Usage

	After downloading into `models/` and installing the code repository:

	```python
	# Editor
	from evaluation.pipeline_loader import load_v2_pipeline
	pipe = load_v2_pipeline("models/aurora_editor.safetensors", device="cuda:0", ref_max_items=8)

	# Agent planner (LoRA merged at load)
	import aurora.agent
	agent = aurora.agent.AgentVLM("models/Qwen3-VL-8B-Instruct", "models/aurora_agent_vlm", device="cuda:0")
	```

	You also need the frozen backbones under `models/` (WAN2.2-TI2V-5B,
	`Wan2.2_VAE.pth`, Qwen3.5-4B, Qwen3-VL-8B-Instruct) — see the code repository's
	Model Zoo. The full inference recipe (3-pass CFG defaults, per-benchmark
	commands) is in the repository README.
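
A quick preflight check can catch a missing file before the loaders run. This is a sketch under stated assumptions: `missing_paths` and `REQUIRED` are hypothetical, and the list covers only names that appear in this README. It assumes `Wan2.2_VAE.pth` sits directly under `models/`, and it omits the WAN2.2-TI2V-5B and Qwen3.5-4B directories because their on-disk names are defined in the code repository's Model Zoo, not here.

```python
from pathlib import Path

# Names taken from this README. Assumption: each sits directly under the
# models/ root. Backbones whose directory names are not given are left out.
REQUIRED = [
    "aurora_editor.safetensors",
    "aurora_agent_vlm",
    "Wan2.2_VAE.pth",
    "Qwen3-VL-8B-Instruct",
]


def missing_paths(root="models", required=REQUIRED):
    """Return the required entries that do not exist under root."""
    root = Path(root)
    return [name for name in required if not (root / name).exists()]
```

An empty list from `missing_paths()` means every listed entry is present; extend `REQUIRED` with the remaining backbone paths once the layout from the Model Zoo is known.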

	## License

	MIT (Aurora weights). The WAN2.2-TI2V-5B / WAN2.2 VAE / Qwen3.5-4B /
	Qwen3-VL-8B-Instruct backbones carry their own respective licenses.

	## Citation

	```bibtex
	@article{yu2026aurora,
	title={Aurora: Unified Video Editing with a Tool-Using Agent},
	author={Yu, Yongsheng and Zeng, Ziyun and Xiao, Zhiyuan and Zhou, Zhenghong and Hua, Hang and Xiong, Wei and Luo, Jiebo},
	journal={arXiv preprint arXiv:2605.18748},
	year={2026}
	}
	```