---
license: apache-2.0
---

## NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

[Homepage](https://stepfun.ai/research/en/nextstep-1) | [GitHub](https://github.com/stepfun-ai/NextStep-1) | [Paper](https://github.com/stepfun-ai/NextStep-1/blob/main/nextstep_1_tech_report.pdf)

We introduce **NextStep-1**, a 14B autoregressive model paired with a 157M flow matching head, trained on discrete text tokens and continuous image tokens with next-token prediction objectives.

**NextStep-1** achieves state-of-the-art performance among autoregressive models on text-to-image generation tasks and shows strong capabilities in high-fidelity image synthesis.

<div align='center'>
<img src="assets/teaser.jpg" class="interpolation-image" alt="arch." width="100%" />
</div>

## ENV Preparation

To avoid potential errors when loading and running the model, we recommend the following environment:

```shell
conda create -n nextstep python=3.11 -y
conda activate nextstep

pip install uv # optional

# please check and download requirements.txt in this repo
uv pip install -r requirements.txt

# diffusers==0.34.0
# einops==0.8.1
# gradio==5.42.0
# loguru==0.7.3
# numpy==1.26.4
# omegaconf==2.3.0
# Pillow==11.0.0
# Requests==2.32.4
# safetensors==0.5.3
# tabulate==0.9.0
# torch==2.5.1
# torchvision==0.20.1
# tqdm==4.67.1
# transformers==4.55.0
```
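
Version mismatches are a common source of loading errors, so it can help to compare the installed packages against the pins listed above before running the model. The following is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_pins` helper are illustrative names introduced here, not part of this repository, and only a subset of the pins is included.

```python
from importlib import metadata

# A few of the pins listed above (illustrative subset; extend as needed).
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "transformers": "4.55.0",
    "diffusers": "0.34.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty output means every listed package matches its pin. A mismatch is not necessarily fatal, but it is a reasonable first thing to rule out when loading fails.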

## Usage

```python
from PIL import Image
from transformers import AutoTokenizer, AutoModel

# `models` and `utils` are local modules, not PyPI packages;
# run this from a directory that contains them
from models.gen_pipeline import NextStepPipeline
from utils.aspect_ratio import center_crop_arr_with_buckets

HF_HUB = "stepfun-ai/NextStep-1-Large-Edit"

# load model and tokenizer
# local_files_only=True expects the checkpoint to already be in the local cache;
# remove it to download from the Hub on first use
tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device="cuda")

# set prompts
positive_prompt = None
negative_prompt = "Copy original image."
example_prompt = "<image>" + "Add a pirate hat to the dog's head. Change the background to a stormy sea with dark clouds. Include the text 'NextStep-Edit' in bold white letters at the top portion of the image."

# load and preprocess reference image
IMG_SIZE = 512
ref_image = Image.open("./assets/origin.jpg")
ref_image = center_crop_arr_with_buckets(ref_image, buckets=[IMG_SIZE])

# generate edited image
image = pipeline.generate_image(
    example_prompt,
    images=[ref_image],
    hw=(IMG_SIZE, IMG_SIZE),
    num_images_per_caption=1,
    positive_prompt=positive_prompt,
    negative_prompt=negative_prompt,
    cfg=7.5,
    cfg_img=2,
    cfg_schedule="constant",
    use_norm=True,
    num_sampling_steps=50,
    timesteps_shift=3.2,
    seed=42,
)[0]
image.save("./assets/output.png")
```
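
The call above takes two guidance scales, `cfg` and `cfg_img`. Their exact semantics are not documented here; judging by the names, they presumably weight the text prompt and the reference image respectively, which is an assumption. To compare settings side by side, one option is to build a small grid of keyword arguments and loop over it. The sketch below only constructs the grid; `BASE_KWARGS` and `guidance_grid` are hypothetical helpers, and the swept values are arbitrary examples rather than recommended settings.

```python
from itertools import product

# Defaults copied from the generate_image call above.
BASE_KWARGS = dict(
    num_images_per_caption=1,
    cfg_schedule="constant",
    use_norm=True,
    num_sampling_steps=50,
    timesteps_shift=3.2,
    seed=42,
)


def guidance_grid(cfg_values, cfg_img_values, base=BASE_KWARGS):
    """Build one kwargs dict per (cfg, cfg_img) pair, leaving `base` untouched."""
    return [
        {**base, "cfg": cfg, "cfg_img": cfg_img}
        for cfg, cfg_img in product(cfg_values, cfg_img_values)
    ]


# Arbitrary illustrative values, not tuned recommendations.
grid = guidance_grid(cfg_values=[5.0, 7.5], cfg_img_values=[1.5, 2.0])

# With `pipeline`, `example_prompt`, `ref_image`, and `IMG_SIZE` defined as above:
# for kw in grid:
#     image = pipeline.generate_image(
#         example_prompt, images=[ref_image], hw=(IMG_SIZE, IMG_SIZE),
#         positive_prompt=None, negative_prompt="Copy original image.", **kw,
#     )[0]
#     image.save(f"./assets/output_cfg{kw['cfg']}_img{kw['cfg_img']}.png")
```

Keeping `seed` fixed across the grid means differences between outputs come from the guidance settings rather than from sampling noise.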

## Citation

If you find NextStep useful for your research and applications, please consider starring this repository and citing:

```bibtex
@misc{nextstep_1,
    title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
    author={NextStep Team},
    year={2025},
    url={https://github.com/stepfun-ai/NextStep-1},
}
```