---
license: mit
tags:
- video-editing
- video-generation
- diffusion
- wan
- lora
- agent
pipeline_tag: video-to-video
---

# Aurora: Model Weights

Pretrained weights for *"Aurora: Unified Video Editing with a Tool-Using Agent"* ([arXiv:2605.18748](https://arxiv.org/abs/2605.18748)).

Code: [github.com/yeates/Aurora](https://github.com/yeates/Aurora) · Project page: [yeates.github.io/Aurora-Page](https://yeates.github.io/Aurora-Page)

This repository bundles the two trained Aurora components, laid out to drop straight into the code repository's `models/` directory:

```bash
huggingface-cli download yeates/aurora-weights --local-dir models
```

## Contents

| Path | Component | Notes |
|---|---|---|
| `aurora_editor.safetensors` | Video editor | trained `dit` + `mllm.context_projector` + `ref_vae_condition` (~9.4 GB, bf16) |
| `aurora_agent_vlm/` | Agent planner adapter | PEFT LoRA (`r=32`, `alpha=64`) on `Qwen/Qwen3-VL-8B-Instruct` |

### Editor: `aurora_editor.safetensors`

A **partial checkpoint** containing only the trained Aurora modules:

- `dit.*`: the WAN2.2-TI2V-5B diffusion transformer (fine-tuned)
- `mllm.context_projector.*`: projects frozen Qwen3.5-4B hidden states into DiT width
- `ref_vae_condition.*`: multi-reference conditioning with per-reference index embedding

It is loaded **on top of the frozen backbones** (WAN2.2-TI2V-5B + WAN2.2 VAE + Qwen3.5-4B), not standalone. One checkpoint covers source-conditioned (s2v), video-to-video (v2v), and reference-conditioned (sv2v) editing.

### Agent adapter: `aurora_agent_vlm/`

PEFT LoRA adapter for the tool-using planner: base `Qwen/Qwen3-VL-8B-Instruct`, `r=32`, `lora_alpha=64`, applied to the attention and MLP projections. `adapter_config.json` records `base_model_name_or_path = Qwen/Qwen3-VL-8B-Instruct`.

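
To make the adapter's `r=32` and `lora_alpha=64` settings concrete, here is a minimal sketch of what merging a LoRA update into a base weight does. It assumes the standard LoRA formulation, where the low-rank update is scaled by `lora_alpha / r` (2.0 for these values); this card does not say whether a rank-stabilized variant is used, so treat the scaling as an assumption. The helpers `matmul` and `merge_lora` are hypothetical names for illustration and are not part of the Aurora code or of PEFT, and the matrices are toy values.

```python
# Illustrative only: tiny made-up matrices, pure Python.
# In practice the adapter is merged by the Aurora loader, not by this code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * (B @ A), the standard LoRA merge (assumed here)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Values stated on this card.
R, ALPHA = 32, 64
SCALE = ALPHA / R  # 2.0 under the standard formulation

# Toy example with rank 1 but the same alpha / r ratio as the adapter.
W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, shape (out=2, in=2)
A = [[0.5, -0.5]]             # LoRA A, shape (r=1, in=2)
B = [[1.0], [2.0]]            # LoRA B, shape (out=2, r=1)

merged = merge_lora(W, A, B, r=1, alpha=2)
print(SCALE, merged)
```

After merging, the adapted layer is an ordinary dense layer again, so a planner whose adapter is merged at load time carries no extra adapter matrices at inference.
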
## Usage

After downloading into `models/` and installing the code repository:

```python
# Editor
from evaluation.pipeline_loader import load_v2_pipeline
pipe = load_v2_pipeline("models/aurora_editor.safetensors", device="cuda:0", ref_max_items=8)

# Agent planner (LoRA merged at load)
import aurora.agent
agent = aurora.agent.AgentVLM("models/Qwen3-VL-8B-Instruct", "models/aurora_agent_vlm", device="cuda:0")
```

You also need the frozen backbones under `models/` (WAN2.2-TI2V-5B, `Wan2.2_VAE.pth`, Qwen3.5-4B, Qwen3-VL-8B-Instruct); see the code repository's Model Zoo. The full inference recipe (3-pass CFG defaults, per-benchmark commands) is in the repository README.

## License

MIT (Aurora weights). The WAN2.2-TI2V-5B, WAN2.2 VAE, Qwen3.5-4B, and Qwen3-VL-8B-Instruct backbones carry their own respective licenses.

## Citation

```bibtex
@article{yu2026aurora,
  title={Aurora: Unified Video Editing with a Tool-Using Agent},
  author={Yu, Yongsheng and Zeng, Ziyun and Xiao, Zhiyuan and Zhou, Zhenghong and Hua, Hang and Xiong, Wei and Luo, Jiebo},
  journal={arXiv preprint arXiv:2605.18748},
  year={2026}
}
```