---
license: apache-2.0
---

## NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

[Homepage](https://stepfun.ai/research/en/nextstep-1) | [GitHub](https://github.com/stepfun-ai/NextStep-1) | [Paper](https://github.com/stepfun-ai/NextStep-1/blob/main/nextstep_1_tech_report.pdf)

We introduce **NextStep-1**, a 14B autoregressive model paired with a 157M flow matching head, trained on discrete text tokens and continuous image tokens with a next-token prediction objective. **NextStep-1** achieves state-of-the-art performance among autoregressive models on text-to-image generation and shows strong capabilities in high-fidelity image synthesis.
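The flow matching head is what lets an autoregressive model emit continuous rather than discrete image tokens. As a rough, hedged illustration (not NextStep-1's actual code), the sketch below samples one continuous token by starting from Gaussian noise and Euler-integrating a velocity field from t = 0 to t = 1, mixing conditional and unconditional velocities with classifier-free guidance. Assumptions: the rectified-flow convention (noise at t = 0, data at t = 1); a hypothetical `toy_velocity` that returns the exact straight-line velocity toward a fixed target in place of the learned 157M head; and hypothetical `cond_target` / `uncond_target` vectors standing in for what the head would predict with and without the transformer's conditioning. The `cfg` and `num_steps` names only loosely mirror the `cfg` and `num_sampling_steps` arguments in the Usage section.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the flow matching head.

    Returns the exact velocity that moves x toward `target` along a straight
    path (rectified-flow convention: noise at t=0, data at t=1).
    """
    return (target - x) / (1.0 - t)


def sample_token(cond_target, uncond_target, cfg=7.5, num_steps=28, seed=0):
    """Sketch: sample one continuous token with Euler steps and classifier-free guidance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond_target.shape)  # start from Gaussian noise at t = 0
    ts = np.linspace(0.0, 1.0, num_steps + 1)   # uniform time grid from 0 to 1
    for t, t_next in zip(ts[:-1], ts[1:]):
        v_cond = toy_velocity(x, t, cond_target)      # "with conditioning"
        v_uncond = toy_velocity(x, t, uncond_target)  # "without conditioning"
        v = v_uncond + cfg * (v_cond - v_uncond)      # classifier-free guidance
        x = x + (t_next - t) * v                      # explicit Euler step
    return x
```

Because the toy velocity is exact, the loop lands on `uncond_target + cfg * (cond_target - uncond_target)` regardless of the noise seed: `cfg=1.0` recovers the conditional target, and larger values extrapolate further away from the unconditional prediction. A learned head only approximates the velocity, so the number of steps and the guidance scale matter in practice.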
## ENV Preparation

To avoid errors when loading and running the model, we recommend the following environment setup:

```shell
conda create -n nextstep python=3.11 -y
conda activate nextstep

pip install uv # optional; plain `pip install -r requirements.txt` also works

# please check and download requirements.txt from this repo
uv pip install -r requirements.txt

# diffusers==0.34.0
# einops==0.8.1
# gradio==5.42.0
# loguru==0.7.3
# numpy==1.26.4
# omegaconf==2.3.0
# Pillow==11.0.0
# Requests==2.32.4
# safetensors==0.5.3
# tabulate==0.9.0
# torch==2.5.1
# torchvision==0.20.1
# tqdm==4.67.1
# transformers==4.55.0
```

## Usage

```python
import os

import torch
from transformers import AutoTokenizer, AutoModel
from models.gen_pipeline import NextStepPipeline

HF_HUB = "stepfun-ai/NextStep-1-Large"

# load model and tokenizer
# local_files_only=True assumes the model files are already downloaded (cached) locally,
# and `models.gen_pipeline` must be importable (e.g. run from a local copy of the repo)
tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device="cuda", dtype=torch.bfloat16)

# set prompts
positive_prompt = "masterpiece, film grained, best quality."
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry."
example_prompt = "A realistic photograph of a wall with \"NextStep-1.1 is coming\" prominently displayed"

# generate image from text
IMG_SIZE = 512
image = pipeline.generate_image(
    example_prompt,
    hw=(IMG_SIZE, IMG_SIZE),
    num_images_per_caption=1,
    positive_prompt=positive_prompt,
    negative_prompt=negative_prompt,
    cfg=7.5,
    cfg_img=1.0,
    cfg_schedule="constant",
    use_norm=False,
    num_sampling_steps=28,
    timesteps_shift=1.0,
    seed=3407,
)[0]

# make sure the output directory exists before saving
os.makedirs("./assets", exist_ok=True)
image.save("./assets/output.jpg")
```

## Citation

If you find NextStep useful for your research and applications, please consider starring this repository and citing:

```bibtex
@misc{nextstep_1,
  title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
  author={NextStep Team},
  year={2025},
  url={https://github.com/stepfun-ai/NextStep-1},
}
```