---
license: apache-2.0
pipeline_tag: text-to-video
---
|
|
|
|
|
# CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
|
|
|
|
|
This repository contains the CineScale models presented in the paper [CineScale: Free Lunch in High-Resolution Cinematic Visual Generation](https://huggingface.co/papers/2508.15774).
|
|
|
|
|
CineScale proposes a novel inference paradigm for higher-resolution visual generation. Built atop state-of-the-art open-source video generation frameworks, it broadens the scope to high-resolution I2V (Image-to-Video) and V2V (Video-to-Video) synthesis, avoiding the repetitive patterns that existing methods are prone to in high-resolution outputs.
|
|
|
|
|
**Project Page:** [https://eyeline-labs.github.io/CineScale/](https://eyeline-labs.github.io/CineScale/)

**Code & Detailed Usage:** [https://github.com/Eyeline-Labs/CineScale](https://github.com/Eyeline-Labs/CineScale)
|
|
|
|
|
## Models
|
|
CineScale provides a family of models, including Text-to-Video (T2V) and Image-to-Video (I2V) variants, capable of generating videos at up to 4K resolution.
|
|
|
|
|
| Model | Tuning Resolution | Checkpoint | Description |
| :-------------------------- | :---------------- | :------------------------------------------------------------------------------- | :-------------------------------------------- |
| CineScale-1.3B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_1.3b_ntk20.ckpt) | Supports 3K (1632x2880) inference on A100 x 1 |
| CineScale-14B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
| CineScale-14B-I2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/i2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
|
|
|
|
|
## Quick Start
|
|
To get started, set up the environment and download the model checkpoints as described in the [GitHub repository](https://github.com/Eyeline-Labs/CineScale).
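As a convenience, the checkpoints listed in the table above can also be fetched programmatically with the `huggingface_hub` library. This is a minimal sketch: only the repo id and filenames come from this card, while the short model keys and helper functions are illustrative, not part of the official codebase.

```python
# Sketch: fetch a CineScale checkpoint from the Hugging Face Hub.
# Repo id and filenames are taken from the model table in this card;
# the model keys and helpers below are illustrative conveniences.

REPO_ID = "Eyeline-Labs/CineScale"
CHECKPOINTS = {
    "t2v-1.3b": "t2v_1.3b_ntk20.ckpt",  # CineScale-1.3B-T2V
    "t2v-14b": "t2v_14b_ntk20.ckpt",    # CineScale-14B-T2V
    "i2v-14b": "i2v_14b_ntk20.ckpt",    # CineScale-14B-I2V
}

def checkpoint_filename(model: str) -> str:
    """Map a short model key to the checkpoint filename hosted on the Hub."""
    return CHECKPOINTS[model]

def download_checkpoint(model: str) -> str:
    """Download one checkpoint and return its local path."""
    # Lazy import so the mapping above works without the dependency installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(model))
```

For example, `download_checkpoint("t2v-1.3b")` fetches the 1.3B T2V checkpoint into the local Hub cache and returns its path.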
|
|
|
|
|
Inference scripts for various resolutions and tasks are provided in the GitHub repository. For instance, to run 2K-resolution text-to-video inference:
|
|
```bash
# Example for 2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)

# Single GPU
CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py

# Multiple GPUs
torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py
```
|
|
Refer to the [GitHub repository](https://github.com/Eyeline-Labs/CineScale) for more detailed instructions and examples for 3K and 4K video generation.
|
|
|
|
|
## Citation
|
|
If you find our work useful, please consider citing our paper:
|
|
```bib
@article{qiu2025cinescale,
  title={CineScale: Free Lunch in High-Resolution Cinematic Visual Generation},
  author={Haonan Qiu and Ning Yu and Ziqi Huang and Paul Debevec and Ziwei Liu},
  journal={arXiv preprint arXiv:2508.15774},
  year={2025}
}
```