| | --- |
| | license: mit |
| | base_model: |
| | - Wan-AI/Wan2.1-T2V-1.3B |
| | pipeline_tag: text-to-video |
| | --- |
| | <p align="center"> |
| | <h1 align="center">HiAR</h1> |
| | <h3 align="center">Hierarchical Autoregressive Video Generation with Pipelined Parallel Inference</h3> |
| | </p> |
| | <p align="center"> |
| | <h3 align="center"><a href="https://arxiv.org/abs/2603.08703">arXiv</a> | <a href="https://jacky-hate.github.io/HiAR/">Website</a> | <a href="https://github.com/Jacky-hate/HiAR">Code</a> | <a href="https://huggingface.co/jackyhate/HiAR/tree/main">Model</a></h3> |
| | </p> |
| |
|
| | --- |
| |
|
| | HiAR proposes **hierarchical denoising** for autoregressive video diffusion models, switching from the conventional block-first denoising order to a **step-first** order. Because each block is conditioned on context at a matched noise level, error propagation is strongly attenuated while temporal causality is preserved, enabling **state-of-the-art long video generation** (20s+) with significantly reduced quality drift. |
| |
|
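To make the difference between the two denoising orders concrete, the sketch below enumerates the order in which `(block, step)` pairs would be visited under each scheme. It is only an illustration, not the HiAR implementation: the function names are hypothetical, and the dependency rule used for the pipelined schedule (a task `(b, s)` needs only `(b-1, s)` and `(b, s-1)`) is an assumption inferred from the "matched noise level" description and the "pipelined parallel inference" in the title.

```python
from typing import List, Tuple

# A task is (block index, denoising step index). Both are hypothetical
# illustration names, not identifiers from the HiAR code base.
Task = Tuple[int, int]


def block_first_order(num_blocks: int, num_steps: int) -> List[Task]:
    """Conventional order: fully denoise one block before starting the next."""
    return [(b, s) for b in range(num_blocks) for s in range(num_steps)]


def step_first_order(num_blocks: int, num_steps: int) -> List[Task]:
    """Step-first order: run one denoising step over every block, then the next step."""
    return [(b, s) for s in range(num_steps) for b in range(num_blocks)]


def pipelined_waves(num_blocks: int, num_steps: int) -> List[List[Task]]:
    """Group tasks into waves whose members have no mutual dependencies.

    ASSUMPTION: task (b, s) depends only on (b - 1, s), the previous block at
    the same noise level, and on (b, s - 1), the same block one step earlier.
    Under that rule, all tasks with the same b + s can run concurrently.
    """
    waves: List[List[Task]] = []
    for w in range(num_blocks + num_steps - 1):
        wave = [(b, w - b) for b in range(num_blocks) if 0 <= w - b < num_steps]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    print("block-first:", block_first_order(3, 2))
    print("step-first: ", step_first_order(3, 2))
    for i, wave in enumerate(pipelined_waves(3, 2)):
        print(f"wave {i}: {wave}")
```

Under the stated assumption, `B` blocks and `S` steps take `B + S - 1` waves rather than `B * S` sequential tasks, which is the sense in which a step-first order can be pipelined across blocks.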