---
license: mit
tags:
- video-editing
- video-generation
- diffusion
- wan
- lora
- agent
pipeline_tag: video-to-video
---

# Aurora Model Weights

Pretrained weights for *"Aurora: Unified Video Editing with a Tool-Using Agent"*
([arXiv:2605.18748](https://arxiv.org/abs/2605.18748)).
Code: [github.com/yeates/Aurora](https://github.com/yeates/Aurora) ·
Project page: [yeates.github.io/Aurora-Page](https://yeates.github.io/Aurora-Page)

This repository bundles the two trained Aurora components, laid out to drop
straight into the code repository's `models/` directory:

```bash
huggingface-cli download yeates/aurora-weights --local-dir models
```

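
If only one component is needed, the same download can be scripted with `huggingface_hub.snapshot_download` and its `allow_patterns` filter. This is a minimal sketch, not part of the code repository: the component labels `"editor"` and `"agent"` and both helper names are hypothetical, while the file patterns follow the Contents table below.

```python
# Hypothetical helper for fetching a subset of this repository.
# The component labels are illustrative; the paths match the Contents table.
COMPONENT_PATTERNS = {
    "editor": ["aurora_editor.safetensors"],
    "agent": ["aurora_agent_vlm/*"],
}


def allow_patterns_for(components):
    """Translate component labels into glob patterns for `allow_patterns`."""
    patterns = []
    for name in components:
        if name not in COMPONENT_PATTERNS:
            raise ValueError(
                f"unknown component {name!r}; expected one of {sorted(COMPONENT_PATTERNS)}"
            )
        patterns.extend(COMPONENT_PATTERNS[name])
    return patterns


def download_components(components=("editor", "agent"), local_dir="models"):
    """Download the selected components into `local_dir` and return its path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="yeates/aurora-weights",
        local_dir=local_dir,
        allow_patterns=allow_patterns_for(components),
    )
```

For example, `download_components(["agent"])` would fetch only the LoRA adapter directory into `models/`.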

## Contents

| Path | Component | Notes |
|---|---|---|
| `aurora_editor.safetensors` | Video editor | trained `dit` + `mllm.context_projector` + `ref_vae_condition` (~9.4 GB, bf16) |
| `aurora_agent_vlm/` | Agent planner adapter | PEFT LoRA (`r=32`, `lora_alpha=64`) on `Qwen/Qwen3-VL-8B-Instruct` |

### Editor: `aurora_editor.safetensors`

This is a **partial checkpoint** that contains only the trained Aurora modules:

- `dit.*`: the WAN2.2-TI2V-5B diffusion transformer (fine-tuned)
- `mllm.context_projector.*`: projects hidden states of the frozen Qwen3.5-4B into the DiT's hidden width
- `ref_vae_condition.*`: multi-reference conditioning with a per-reference index embedding

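
Because the file is a partial checkpoint, a quick sanity check is to group its tensor names by the three prefixes listed above. A minimal sketch: the function name and any example key names are hypothetical; only the prefixes come from this card.

```python
from collections import Counter

# Top-level prefixes listed for the partial editor checkpoint.
EXPECTED_PREFIXES = ("dit.", "mllm.context_projector.", "ref_vae_condition.")


def summarize_partial_checkpoint(keys):
    """Count tensor names per expected prefix and collect any that match none."""
    counts = Counter()
    unexpected = []
    for key in keys:
        for prefix in EXPECTED_PREFIXES:
            if key.startswith(prefix):
                counts[prefix] += 1
                break
        else:  # no expected prefix matched this key
            unexpected.append(key)
    return dict(counts), unexpected
```

With the `safetensors` package the names can be listed without loading any tensor data: open the file with `safe_open(path, framework="pt")` as a context manager and pass `f.keys()` to the function.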

It is loaded **on top of the frozen backbones** (WAN2.2-TI2V-5B + WAN2.2 VAE +
Qwen3.5-4B) and cannot be used standalone. A single checkpoint covers
source-conditioned (s2v), video-to-video (v2v), and reference-conditioned (sv2v)
editing.

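
Conceptually, loading a partial checkpoint means overwriting only the matching entries of a model whose remaining weights already come from the frozen backbones. The sketch below illustrates that general pattern with plain dictionaries; it is an assumption about the idea, not the code repository's loader, and `overlay_partial` is a hypothetical name. In PyTorch the analogous call is `module.load_state_dict(partial, strict=False)`, which reports `missing_keys` and `unexpected_keys`.

```python
def overlay_partial(base, partial):
    """Overlay a partial state dict onto a full one.

    `base` stands for the assembled model (frozen backbones plus freshly
    initialized Aurora modules); `partial` holds the trained Aurora tensors.
    Returns the merged dict, keys in `partial` unknown to `base`, and keys of
    `base` that `partial` left untouched (e.g. VAE and MLLM weights).
    """
    merged = dict(base)
    unexpected = [key for key in partial if key not in base]
    for key, value in partial.items():
        if key in base:
            merged[key] = value
    untouched = [key for key in base if key not in partial]
    return merged, unexpected, untouched
```

A non-empty `unexpected` list would signal a naming mismatch between the checkpoint and the assembled model, while `untouched` should contain only backbone weights.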

### Agent adapter: `aurora_agent_vlm/`

PEFT LoRA adapter for the tool-using planner, trained on top of the base model
`Qwen/Qwen3-VL-8B-Instruct` with `r=32` and `lora_alpha=64`, applied to the
attention and MLP projections. Its `adapter_config.json` records
`base_model_name_or_path = Qwen/Qwen3-VL-8B-Instruct`.

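
In PEFT's LoRA implementation the low-rank update is scaled by `lora_alpha / r` (or `lora_alpha / sqrt(r)` when `use_rslora` is enabled), so with the values above the update is multiplied by 64 / 32 = 2. A minimal sketch for reading the shipped `adapter_config.json`, confirming the recorded base model, and computing that factor; the helper names are hypothetical.

```python
import json
import math

EXPECTED_BASE = "Qwen/Qwen3-VL-8B-Instruct"


def lora_scaling(config):
    """Scaling factor PEFT applies to the LoRA product B @ A."""
    r, alpha = config["r"], config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(r)  # rank-stabilized LoRA
    return alpha / r


def inspect_adapter(config_path="models/aurora_agent_vlm/adapter_config.json"):
    """Load the adapter config, check its base model, and return the scaling."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    base = config.get("base_model_name_or_path")
    if base != EXPECTED_BASE:
        raise ValueError(f"adapter records base {base!r}; expected {EXPECTED_BASE!r}")
    return lora_scaling(config)
```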

## Usage

After downloading the weights into `models/` and installing the code repository:

```python
from evaluation.pipeline_loader import load_v2_pipeline
import aurora.agent

# Editor: the partial checkpoint is loaded on top of the frozen backbones in models/
pipe = load_v2_pipeline("models/aurora_editor.safetensors", device="cuda:0", ref_max_items=8)

# Agent planner: Qwen3-VL-8B-Instruct base + Aurora LoRA adapter (LoRA merged at load)
agent = aurora.agent.AgentVLM("models/Qwen3-VL-8B-Instruct", "models/aurora_agent_vlm", device="cuda:0")
```

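
The snippet above notes that the LoRA is merged at load, i.e. folded into the base weights as W' = W + (lora_alpha / r) · B A so that inference runs on a plain model. In PEFT this is what `merge_and_unload()` does; whether `AgentVLM` uses it internally is an assumption. The toy example below, with hypothetical 2×2 weights and rank 1, only illustrates the arithmetic.

```python
def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for nested-list matrices.

    Shapes: W is (d_out, d_in), B is (d_out, r), A is (r, d_in).
    """
    if len(A) != r or any(len(row) != r for row in B):
        raise ValueError("A must have r rows and B must have r columns")
    scale = lora_alpha / r
    merged = []
    for i, w_row in enumerate(W):
        merged.append([
            w + scale * sum(B[i][k] * A[k][j] for k in range(r))
            for j, w in enumerate(w_row)
        ])
    return merged


# Toy example: identity W, rank-1 update, lora_alpha / r = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # (r=1, d_in=2)
B = [[1.0], [0.0]]  # (d_out=2, r=1)
print(merge_lora(W, A, B, lora_alpha=2, r=1))  # [[3.0, 4.0], [0.0, 1.0]]
```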

You also need the frozen backbones under `models/` (WAN2.2-TI2V-5B,
`Wan2.2_VAE.pth`, Qwen3.5-4B, Qwen3-VL-8B-Instruct); see the Model Zoo section
of the code repository. The full inference recipe (default 3-pass CFG settings
and per-benchmark commands) is documented in the repository README.

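
Before running either entry point, it can help to confirm that everything is in place. A minimal sketch: apart from the two Aurora components, `adapter_config.json`, the `Qwen3-VL-8B-Instruct` path used in the usage example above, and the `Wan2.2_VAE.pth` file name, the directory names and the VAE's location under `models/` are hypothetical and should be adjusted to the Model Zoo layout.

```python
from pathlib import Path

# Expected entries under models/. Only the Aurora files, the
# Qwen3-VL-8B-Instruct path (from the usage example) and the
# Wan2.2_VAE.pth file name come from this card; the other directory
# names and the VAE's location are hypothetical placeholders.
REQUIRED_ASSETS = [
    "aurora_editor.safetensors",
    "aurora_agent_vlm/adapter_config.json",
    "Wan2.2-TI2V-5B",
    "Wan2.2_VAE.pth",
    "Qwen3.5-4B",
    "Qwen3-VL-8B-Instruct",
]


def missing_assets(models_dir="models", required=REQUIRED_ASSETS):
    """Return the required relative paths that do not exist under `models_dir`."""
    root = Path(models_dir)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    print("all assets present" if not missing else f"missing: {missing}")
```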

## License

MIT (Aurora weights). The WAN2.2-TI2V-5B, WAN2.2 VAE, Qwen3.5-4B, and
Qwen3-VL-8B-Instruct backbones are distributed under their own licenses.

## Citation

```bibtex
@article{yu2026aurora,
  title={Aurora: Unified Video Editing with a Tool-Using Agent},
  author={Yu, Yongsheng and Zeng, Ziyun and Xiao, Zhiyuan and Zhou, Zhenghong and Hua, Hang and Xiong, Wei and Luo, Jiebo},
  journal={arXiv preprint arXiv:2605.18748},
  year={2026}
}
```