license: apache-2.0
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
We introduce NextStep-1, a 14B autoregressive model paired with a 157M flow matching head, trained on discrete text tokens and continuous image tokens with a next-token prediction objective. NextStep-1 achieves state-of-the-art performance among autoregressive models on text-to-image generation tasks and shows strong capabilities in high-fidelity image synthesis.
ENV Preparation
To avoid potential errors when loading and running the model, we recommend the following environment:
conda create -n nextstep python=3.11 -y
conda activate nextstep
pip install uv # optional
# please check and download requirements.txt in this repo
uv pip install -r requirements.txt
# diffusers==0.34.0
# einops==0.8.1
# gradio==5.42.0
# loguru==0.7.3
# numpy==1.26.4
# omegaconf==2.3.0
# Pillow==11.0.0
# Requests==2.32.4
# safetensors==0.5.3
# tabulate==0.9.0
# torch==2.5.1
# torchvision==0.20.1
# tqdm==4.67.1
# transformers==4.55.0
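As a quick sanity check after installation, the pinned versions can be compared against what is actually installed. The snippet below is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_pins` helper are illustrative names (not part of this repo), and only a subset of the pins listed above is copied in.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins listed above; extend as needed.
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "transformers": "4.55.0",
    "diffusers": "0.34.0",
    "safetensors": "0.5.3",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build suffixes such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every checked package matches its pin; a mismatch is not necessarily fatal, but it is the first thing to look at if loading fails.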
Usage
import torch
from transformers import AutoTokenizer, AutoModel
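# NextStepPipeline is imported from this repo's local models/ package, so run this script from the repo root
from models.gen_pipeline import NextStepPipeline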
HF_HUB = "stepfun-ai/NextStep-1-Large"
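# load model and tokenizer
# note: local_files_only=True reads from the local Hugging Face cache only;
# download the checkpoint first, or drop the flag to fetch it from the Hub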
tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device="cuda", dtype=torch.bfloat16)
# set prompts
positive_prompt = "masterpiece, film grained, best quality."
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry."
example_prompt = "A realistic photograph of a wall with \"NextStep-1.1 is coming\" prominently displayed"
# generate image from text
IMG_SIZE = 512
image = pipeline.generate_image(
example_prompt,
hw=(IMG_SIZE, IMG_SIZE),
num_images_per_caption=1,
positive_prompt=positive_prompt,
negative_prompt=negative_prompt,
cfg=7.5,
cfg_img=1.0,
cfg_schedule="constant",
use_norm=False,
num_sampling_steps=28,
timesteps_shift=1.0,
seed=3407,
)[0]
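# create the output directory if it does not exist, then save the image
import os
os.makedirs("./assets", exist_ok=True)
image.save("./assets/output.jpg")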
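The `cfg` and `negative_prompt` arguments suggest classifier-free guidance, where a conditional prediction is pushed away from an unconditional (or negative-prompt) one. The sketch below shows the standard textbook formulation on plain Python lists; it is an assumption about what `cfg` controls, not a description of how `NextStepPipeline` implements it, and `apply_cfg` is a hypothetical helper.

```python
def apply_cfg(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    cond   : prediction given the prompt (list of floats)
    uncond : prediction given the negative / empty prompt (same length)
    scale  : guidance scale; 1.0 returns the conditional prediction unchanged
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    uncond = [0.0, 1.0]
    print(apply_cfg(cond, uncond, 1.0))  # same values as cond
    print(apply_cfg(cond, uncond, 7.5))  # extrapolated past cond, away from uncond
```

Under this formulation a scale of 1.0 disables guidance, which would be consistent with `cfg_img=1.0` in the example above leaving the image-conditioned branch unguided; larger values such as `cfg=7.5` trade diversity for stronger prompt adherence.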
Citation
If you find NextStep useful for your research and applications, please consider starring this repository and citing:
@misc{nextstep_1,
title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
author={NextStep Team},
year={2025},
url={https://github.com/stepfun-ai/NextStep-1},
}