---
license: mit
tags:
- video-editing
- video-generation
- diffusion
- wan
- lora
- agent
pipeline_tag: video-to-video
---
# Aurora: Model Weights
Pretrained weights for *"Aurora: Unified Video Editing with a Tool-Using Agent"*
([arXiv:2605.18748](https://arxiv.org/abs/2605.18748)).
Code: [github.com/yeates/Aurora](https://github.com/yeates/Aurora) ·
Project page: [yeates.github.io/Aurora-Page](https://yeates.github.io/Aurora-Page)
This repository bundles the two trained Aurora components, laid out to drop
straight into the code repository's `models/` directory:
```bash
huggingface-cli download yeates/aurora-weights --local-dir models
```
## Contents
| Path | Component | Notes |
|---|---|---|
| `aurora_editor.safetensors` | Video editor | trained `dit` + `mllm.context_projector` + `ref_vae_condition` (~9.4 GB, bf16) |
| `aurora_agent_vlm/` | Agent planner adapter | PEFT LoRA (`r=32`, `alpha=64`) on `Qwen/Qwen3-VL-8B-Instruct` |
### Editor: `aurora_editor.safetensors`
A **partial checkpoint** containing only the trained Aurora modules:
- `dit.*`: the WAN2.2-TI2V-5B diffusion transformer (fine-tuned)
- `mllm.context_projector.*`: projects frozen Qwen3.5-4B hidden states into DiT width
- `ref_vae_condition.*`: multi-reference conditioning with per-reference index embedding
It is loaded **on top of the frozen backbones** (WAN2.2-TI2V-5B + WAN2.2 VAE +
Qwen3.5-4B), not standalone. One checkpoint covers source-conditioned (s2v),
video-to-video (v2v), and reference-conditioned (sv2v) editing.
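
To check that a downloaded file matches the partial-checkpoint layout described above, the tensor names can be grouped by the three documented prefixes. This is a minimal sketch and not part of the Aurora codebase: `group_keys_by_prefix` and `checkpoint_keys` are hypothetical helper names, and the sketch assumes the `safetensors` package and PyTorch are installed.

```python
from collections import Counter

# The three module prefixes this README lists for aurora_editor.safetensors.
EXPECTED_PREFIXES = ("dit.", "mllm.context_projector.", "ref_vae_condition.")


def group_keys_by_prefix(keys, prefixes=EXPECTED_PREFIXES):
    """Count tensor names per documented prefix; anything else is 'unexpected'."""
    counts = Counter()
    for key in keys:
        matched = next((p for p in prefixes if key.startswith(p)), "unexpected")
        counts[matched] += 1
    return dict(counts)


def checkpoint_keys(path):
    """List tensor names in a .safetensors file without loading the tensors."""
    from safetensors import safe_open  # imported lazily; only needed for real files

    with safe_open(path, framework="pt", device="cpu") as f:
        return list(f.keys())


# Example (requires the downloaded checkpoint):
# print(group_keys_by_prefix(checkpoint_keys("models/aurora_editor.safetensors")))
```

A non-zero `unexpected` count would suggest the file is not the partial checkpoint this section describes.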
### Agent adapter: `aurora_agent_vlm/`
PEFT LoRA adapter for the tool-using planner: base `Qwen/Qwen3-VL-8B-Instruct`,
`r=32`, `lora_alpha=64`, on the attention + MLP projections. `adapter_config.json`
records `base_model_name_or_path = Qwen/Qwen3-VL-8B-Instruct`.
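
The adapter settings above can be read back from `adapter_config.json` before loading anything heavy. The sketch below uses only the standard library. `summarize_adapter` is a hypothetical helper name, and the `scaling` value assumes the standard LoRA convention of scaling the update by `lora_alpha / r`, which gives 2.0 for `r=32`, `lora_alpha=64`.

```python
import json
from pathlib import Path


def summarize_adapter(adapter_dir):
    """Read PEFT's adapter_config.json and report the LoRA hyperparameters."""
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    rank, alpha = config["r"], config["lora_alpha"]
    return {
        "base_model": config["base_model_name_or_path"],
        "r": rank,
        "lora_alpha": alpha,
        # Standard LoRA scales the low-rank update by alpha / r
        # (assumes rank-stabilized LoRA is not enabled).
        "scaling": alpha / rank,
    }


# Example (requires the downloaded adapter):
# print(summarize_adapter("models/aurora_agent_vlm"))
```

If `base_model` does not match the base checkpoint you pass to the agent loader, the adapter is being paired with the wrong backbone.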
## Usage
After downloading into `models/` and installing the code repository:
```python
# Editor
from evaluation.pipeline_loader import load_v2_pipeline
pipe = load_v2_pipeline("models/aurora_editor.safetensors", device="cuda:0", ref_max_items=8)
# Agent planner (LoRA merged at load)
import aurora.agent
agent = aurora.agent.AgentVLM("models/Qwen3-VL-8B-Instruct", "models/aurora_agent_vlm", device="cuda:0")
```
You also need the frozen backbones under `models/` (WAN2.2-TI2V-5B,
`Wan2.2_VAE.pth`, Qwen3.5-4B, Qwen3-VL-8B-Instruct); see the code repository's
Model Zoo. The full inference recipe (3-pass CFG defaults, per-benchmark
commands) is in the repository README.
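
Because the loaders need several assets side by side, a small preflight check can catch a missing download before any GPU memory is allocated. This is a sketch under stated assumptions: `missing_assets` is a hypothetical helper, and the entry names are taken from this README, not from the Model Zoo, so the exact directory and file layout under `models/` may differ.

```python
from pathlib import Path

# Names taken from this README; the exact on-disk layout is an assumption,
# so adjust the entries to match the code repository's Model Zoo.
REQUIRED_ASSETS = (
    "aurora_editor.safetensors",
    "aurora_agent_vlm",
    "WAN2.2-TI2V-5B",
    "Wan2.2_VAE.pth",
    "Qwen3.5-4B",
    "Qwen3-VL-8B-Instruct",
)


def missing_assets(models_dir="models", required=REQUIRED_ASSETS):
    """Return the required entries that do not exist under models_dir."""
    root = Path(models_dir)
    return [name for name in required if not (root / name).exists()]


# Example:
# missing = missing_assets("models")
# if missing:
#     raise FileNotFoundError(f"Missing under models/: {missing}")
```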
## License
MIT (Aurora weights). The WAN2.2-TI2V-5B / WAN2.2 VAE / Qwen3.5-4B /
Qwen3-VL-8B-Instruct backbones carry their own respective licenses.
## Citation
```bibtex
@article{yu2026aurora,
title={Aurora: Unified Video Editing with a Tool-Using Agent},
author={Yu, Yongsheng and Zeng, Ziyun and Xiao, Zhiyuan and Zhou, Zhenghong and Hua, Hang and Xiong, Wei and Luo, Jiebo},
journal={arXiv preprint arXiv:2605.18748},
year={2026}
}
```