Enhance model card: Add pipeline tag, paper, project, code links, and usage
This PR enhances the model card for CineScale, improving its discoverability and providing fuller documentation for users.
Key additions include:
- **Pipeline Tag**: Added `pipeline_tag: text-to-video` to ensure the model appears in the relevant category on the Hub, aligning with its capabilities in high-resolution video generation.
- **Paper Link**: Linked to the official paper, [CineScale: Free Lunch in High-Resolution Cinematic Visual Generation](https://huggingface.co/papers/2508.15774).
- **External Links**: Included clear links to the project page ([https://eyeline-labs.github.io/CineScale/](https://eyeline-labs.github.io/CineScale/)) and the GitHub repository ([https://github.com/Eyeline-Labs/CineScale](https://github.com/Eyeline-Labs/CineScale)) for easy access to code and further details.
- **Model Overview**: Added a table detailing the available CineScale models, their tuning resolutions, and capabilities.
- **Sample Usage**: Provided a "Quick Start" section with a code snippet directly from the GitHub README, guiding users on how to initiate video generation, with a clear reference to the GitHub repository for more advanced examples and setup.
- **Citation**: Included the BibTeX citation for proper academic attribution.
These changes make the model card more informative and user-friendly.
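The model overview table added in this PR pairs a single tuning resolution (1088x1920) with larger inference resolutions (1632x2880 for 3K, 2176x3840 for 4K). As a quick sanity check on those figures, the sketch below computes the per-side scale factor and the pixel-count ratio relative to the tuning resolution. The function name `scale_factors` and the dictionary layout are illustrative assumptions, not part of the CineScale code.

```python
from fractions import Fraction

# Resolutions copied from the model overview table, in the order written there.
TUNING = (1088, 1920)
INFERENCE = {
    "3K": (1632, 2880),
    "4K": (2176, 3840),
}


def scale_factors(target, base=TUNING):
    """Return (per-side scale, pixel-count ratio) of target relative to base."""
    first_scale = Fraction(target[0], base[0])
    second_scale = Fraction(target[1], base[1])
    if first_scale != second_scale:
        raise ValueError("aspect ratio differs from the tuning resolution")
    return first_scale, first_scale * second_scale


for name, resolution in INFERENCE.items():
    side, pixels = scale_factors(resolution)
    print(f"{name}: {float(side)}x per side, {float(pixels)}x pixels")
```

Under these numbers, 3K inference is 1.5x the tuning resolution per side (2.25x the pixels) and 4K is 2x per side (4x the pixels).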
@@ -1,3 +1,46 @@
----
-license: apache-2.0
-
+---
+license: apache-2.0
+pipeline_tag: text-to-video
+---
+
+# CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
+
+This repository contains the CineScale models presented in the paper [CineScale: Free Lunch in High-Resolution Cinematic Visual Generation](https://huggingface.co/papers/2508.15774).
+
+CineScale proposes a novel inference paradigm to enable higher-resolution visual generation. It broadens the scope by enabling high-resolution I2V (Image-to-Video) and V2V (Video-to-Video) synthesis, built atop state-of-the-art open-source video generation frameworks, significantly improving upon existing methods which are prone to repetitive patterns in high-resolution outputs.
+
+**Project Page:** [https://eyeline-labs.github.io/CineScale/](https://eyeline-labs.github.io/CineScale/)
+**Code & Detailed Usage:** [https://github.com/Eyeline-Labs/CineScale](https://github.com/Eyeline-Labs/CineScale)
+
+## Models
+CineScale provides a family of models, including Text-to-Video (T2V) and Image-to-Video (I2V) variants, capable of generating videos up to 4K resolution.
+
+| Model | Tuning Resolution | Checkpoint | Description |
+| :-------------------------- | :---------------- | :------------------------------------------------------------------------------- | :-------------------------------------------- |
+| CineScale-1.3B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_1.3b_ntk20.ckpt) | Supports 3K (1632x2880) inference on A100 x 1 |
+| CineScale-14B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
+| CineScale-14B-I2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/i2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
+
+## Quick Start
+To get started, you will need to set up the environment and download the model checkpoints as described in the [GitHub repository](https://github.com/Eyeline-Labs/CineScale).
+
+Inference examples for various resolutions and tasks are provided in the GitHub repository's command-line scripts. For instance, to run 2K-resolution text-to-video inference:
+```bash
+# Example for 2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)
+# Single GPU
+CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py
+# Multiple GPUs
+torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py
+```
+Refer to the [GitHub repository](https://github.com/Eyeline-Labs/CineScale) for more detailed instructions and examples for 3K and 4K video generation.
+
+## Citation
+If you find our work useful, please consider citing our paper:
+```bib
+@article{qiu2025cinescale,
+title={CineScale: Free Lunch in High-Resolution Cinematic Visual Generation},
+author={Haonan Qiu and Ning Yu and Ziqi Huang and Paul Debevec and Ziwei Liu},
+journal={arXiv preprint arXiv:2508.15774},
+year={2025}
+}
+```
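The Quick Start in the diff above assumes the checkpoints have already been downloaded. A minimal sketch of fetching one with the `huggingface_hub` library follows, using the repository ID and filenames from the model table. The helper names and the `checkpoints` target directory are assumptions for illustration only; the inference scripts may expect a different location, so the GitHub repository remains the reference for setup.

```python
# Repository ID and checkpoint filenames copied from the model overview table.
REPO_ID = "Eyeline-Labs/CineScale"
CHECKPOINTS = {
    "CineScale-1.3B-T2V": "t2v_1.3b_ntk20.ckpt",
    "CineScale-14B-T2V": "t2v_14b_ntk20.ckpt",
    "CineScale-14B-I2V": "i2v_14b_ntk20.ckpt",
}


def checkpoint_url(model_name):
    """Build a direct-download URL; the table links use /blob/, the web view."""
    filename = CHECKPOINTS[model_name]
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}"


def download_checkpoint(model_name, local_dir="checkpoints"):
    """Download one checkpoint and return its local path.

    Requires `pip install huggingface_hub` and network access.
    `local_dir` is a hypothetical location, not one the CineScale scripts mandate.
    """
    # Imported lazily so the URL helper above works without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=CHECKPOINTS[model_name],
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(checkpoint_url("CineScale-1.3B-T2V"))
```

Calling `download_checkpoint("CineScale-1.3B-T2V")` would fetch the smallest model listed in the table; the 14B checkpoints are selected the same way by name.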