---
license: mit
tags:
  - video-editing
  - video-generation
  - diffusion
  - wan
  - lora
  - agent
pipeline_tag: video-to-video
---

# Aurora — Model Weights

Pretrained weights for *"Aurora: Unified Video Editing with a Tool-Using Agent"*
([arXiv:2605.18748](https://arxiv.org/abs/2605.18748)).
Code: [github.com/yeates/Aurora](https://github.com/yeates/Aurora) ·
Project page: [yeates.github.io/Aurora-Page](https://yeates.github.io/Aurora-Page)

This repository bundles the two trained Aurora components, laid out to drop
straight into the code repository's `models/` directory:

```bash
huggingface-cli download yeates/aurora-weights --local-dir models
```

## Contents

| Path | Component | Notes |
|---|---|---|
| `aurora_editor.safetensors` | Video editor | trained `dit` + `mllm.context_projector` + `ref_vae_condition` (~9.4 GB, bf16) |
| `aurora_agent_vlm/` | Agent planner adapter | PEFT LoRA (`r=32`, `alpha=64`) on `Qwen/Qwen3-VL-8B-Instruct` |

### Editor — `aurora_editor.safetensors`

A **partial checkpoint** containing only the trained Aurora modules:

- `dit.*` — the WAN2.2-TI2V-5B diffusion transformer (fine-tuned)
- `mllm.context_projector.*` — projects frozen Qwen3.5-4B hidden states into DiT width
- `ref_vae_condition.*` — multi-reference conditioning with per-reference index embedding

It is loaded **on top of the frozen backbones** (WAN2.2-TI2V-5B + WAN2.2 VAE +
Qwen3.5-4B), not standalone. One checkpoint covers source-conditioned (s2v),
video-to-video (v2v), and reference-conditioned (sv2v) editing.
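
Because the checkpoint is partial, it can help to confirm that a downloaded file contains only the three module groups listed above before loading it on top of the backbones. The following is a minimal sketch, not part of the Aurora code: `count_by_prefix` and `checkpoint_keys` are hypothetical helper names, and the prefixes are taken from the list in this card. It assumes `safetensors` and PyTorch are installed.

```python
from collections import Counter

# Top-level prefixes this model card lists for the partial editor checkpoint.
EXPECTED_PREFIXES = ("dit.", "mllm.context_projector.", "ref_vae_condition.")


def count_by_prefix(keys, prefixes=EXPECTED_PREFIXES):
    """Count tensor names per expected prefix; unmatched names go under 'other'."""
    counts = Counter()
    for key in keys:
        label = next((p for p in prefixes if key.startswith(p)), "other")
        counts[label] += 1
    return dict(counts)


def checkpoint_keys(path):
    """List tensor names in a safetensors file without reading tensor data."""
    # Imported lazily so count_by_prefix stays usable without safetensors.
    from safetensors import safe_open

    with safe_open(path, framework="pt", device="cpu") as f:
        return list(f.keys())


# Example (requires the downloaded file):
# print(count_by_prefix(checkpoint_keys("models/aurora_editor.safetensors")))
```

If the card's description holds, no names should land under `"other"`; a non-empty `"other"` bucket would suggest a different or mismatched file.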

### Agent adapter — `aurora_agent_vlm/`

PEFT LoRA adapter for the tool-using planner: base `Qwen/Qwen3-VL-8B-Instruct`,
`r=32`, `lora_alpha=64`, on the attention + MLP projections. `adapter_config.json`
records `base_model_name_or_path = Qwen/Qwen3-VL-8B-Instruct`.
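
As a quick sanity check, the adapter's hyperparameters can be read straight from `adapter_config.json`. The sketch below uses only the standard library; `load_adapter_config` and `summarize_adapter_config` are hypothetical helper names, not Aurora APIs. The reported scaling assumes the standard LoRA rule `lora_alpha / r` (that is, rank-stabilized LoRA is not enabled), which gives 2.0 for the values stated above.

```python
import json
from pathlib import Path


def load_adapter_config(adapter_dir):
    """Read the PEFT adapter_config.json from an adapter directory."""
    return json.loads((Path(adapter_dir) / "adapter_config.json").read_text())


def summarize_adapter_config(cfg):
    """Extract the base model, rank, alpha, and the implied LoRA scaling."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    return {
        "base": cfg["base_model_name_or_path"],
        "r": r,
        "lora_alpha": alpha,
        # Standard LoRA scales the low-rank update by lora_alpha / r.
        "scaling": alpha / r,
    }


# Example (requires the downloaded adapter):
# print(summarize_adapter_config(load_adapter_config("models/aurora_agent_vlm")))
```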

## Usage

After downloading into `models/` and installing the code repository:

```python
# Editor
from evaluation.pipeline_loader import load_v2_pipeline
pipe = load_v2_pipeline("models/aurora_editor.safetensors", device="cuda:0", ref_max_items=8)

# Agent planner (LoRA merged at load)
import aurora.agent
agent = aurora.agent.AgentVLM("models/Qwen3-VL-8B-Instruct", "models/aurora_agent_vlm", device="cuda:0")
```

You also need the frozen backbones under `models/` (WAN2.2-TI2V-5B,
`Wan2.2_VAE.pth`, Qwen3.5-4B, Qwen3-VL-8B-Instruct) — see the code repository's
Model Zoo. The full inference recipe (3-pass CFG defaults, per-benchmark
commands) is in the repository README.
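
Before running the snippet above, a small pre-flight check can report which files are still missing from `models/`. This is a sketch under stated assumptions: `missing_paths` is a hypothetical helper, and the list covers only the paths named in this card. Placing `Wan2.2_VAE.pth` directly under `models/` is an assumption, and the WAN2.2-TI2V-5B and Qwen3.5-4B directories are left out because their exact folder names are given in the code repository's Model Zoo rather than here.

```python
from pathlib import Path

# Paths relative to models/ that this card names explicitly.
# Assumption: Wan2.2_VAE.pth sits directly under models/.
REQUIRED = (
    "aurora_editor.safetensors",
    "aurora_agent_vlm/adapter_config.json",
    "Qwen3-VL-8B-Instruct",
    "Wan2.2_VAE.pth",
)


def missing_paths(root, required=REQUIRED):
    """Return the required relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


# Example:
# for rel in missing_paths("models"):
#     print(f"missing: models/{rel}")
```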

## License

The Aurora weights are released under the MIT license. The WAN2.2-TI2V-5B,
WAN2.2 VAE, Qwen3.5-4B, and Qwen3-VL-8B-Instruct backbones are covered by
their own licenses.

## Citation

```bibtex
@article{yu2026aurora,
  title={Aurora: Unified Video Editing with a Tool-Using Agent},
  author={Yu, Yongsheng and Zeng, Ziyun and Xiao, Zhiyuan and Zhou, Zhenghong and Hua, Hang and Xiong, Wei and Luo, Jiebo},
  journal={arXiv preprint arXiv:2605.18748},
  year={2026}
}
```