BitDance: Scaling Autoregressive Generative Models with Binary Tokens


Yuang Ai*, Jiaming Han*, Shaobin Zhuang*, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang†, Xiangyu Yue†, Hao Chen*†‡

* Equal Contribution  † Corresponding Author  ‡ Project Lead

In visual generation, discrete autoregressive models often suffer from poor tokenizer reconstruction, difficulty sampling from large vocabularies, and slow token-by-token generation. We present BitDance, which addresses these challenges with a large-vocabulary binary tokenizer, a binary diffusion head for sampling in a large discrete space, and a next-patch diffusion paradigm that enables efficient multi-token prediction. BitDance is an open-source, 14B-parameter discrete autoregressive foundation model trained on large-scale multimodal tokens. It keeps the standard language-modeling paradigm for text tokens and uses next-patch diffusion for visual tokens, predicting multiple tokens in parallel (up to 64 per step). This unified multimodal framework is simple, scalable, and able to generate high-resolution, photorealistic images efficiently.
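The abstract does not spell out how the binary tokenizer maps latents to tokens. As a rough illustration only, the sketch below assumes a generic sign-based (lookup-free) binarization, a common way to build binary tokenizers that may differ from BitDance's actual design: each of d latent channels becomes one bit, and the bit vector doubles as a token id in an implicit vocabulary of 2^d entries. This shows why the vocabulary can become very large without a stored codebook, and plausibly why a plain softmax over every id becomes impractical, which is the sampling problem the binary diffusion head is described as addressing. All helper names are hypothetical.

```python
from typing import List, Sequence


def binarize(latent: Sequence[float]) -> List[int]:
    # Assumed sign quantization: each channel becomes one bit (1 if value >= 0, else 0).
    return [1 if x >= 0 else 0 for x in latent]


def bits_to_index(bits: Sequence[int]) -> int:
    # Read the bit vector as a big-endian integer: the implicit token id.
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def index_to_bits(index: int, num_bits: int) -> List[int]:
    # Inverse mapping: recover the bit vector from a token id.
    return [(index >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]


def vocab_size(num_bits: int) -> int:
    # A d-bit token indexes an implicit vocabulary of 2**d entries; no codebook table is stored.
    return 2 ** num_bits


if __name__ == "__main__":
    bits = binarize([0.3, -1.2, 0.0, -0.5, 2.1, -0.1, 0.7, -3.0])
    print(bits, bits_to_index(bits), vocab_size(8))
```

With 32 bits per token, for example, the implicit vocabulary already has 2^32 (about 4.3 billion) entries.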

⚡ Quick Start

1️⃣ Create Conda Environment and Install Package

git clone https://github.com/shallowdream204/BitDance.git
cd BitDance
conda create -n bitdance python=3.11 -y
conda activate bitdance
pip install -r requirements.txt
pip install flash_attn==2.8.2 --no-build-isolation
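Before downloading the weights, it can help to confirm the environment. The following is a hypothetical pre-flight check (not part of the BitDance repository) that uses only the standard library: it reports whether the interpreter is Python 3.11, as created above, and whether `torch` and `flash_attn` are importable, comparing `flash_attn` against the 2.8.2 pin. It assumes `torch` is installed through requirements.txt and does not test CUDA availability.

```python
import importlib.metadata
import importlib.util
import sys

# Hypothetical expectations: flash_attn's pin comes from the install step; torch is assumed
# to come from requirements.txt, so no version is enforced for it.
EXPECTED = {"torch": None, "flash_attn": "2.8.2"}


def preflight(expected=EXPECTED):
    """Return a dict describing the Python version and each expected package's status."""
    report = {"python_ok": sys.version_info[:2] == (3, 11)}
    for module, pinned in expected.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None:
            report[module] = "missing"
            continue
        try:
            installed = importlib.metadata.version(module)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name.
            installed = "unknown"
        if pinned is not None and installed != pinned:
            report[module] = f"version {installed}, expected {pinned}"
        else:
            report[module] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```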

2️⃣ Download Model Weights

We offer two models, BitDance-14B-64x and BitDance-14B-16x, which can predict 64 and 16 tokens in parallel at each step, respectively.

| Model | Tokens per Step | Steps (1024px) | Supported Resolutions | Hugging Face |
|---|---|---|---|---|
| BitDance-14B-64x | 64 | 64 | 1024px | BitDance-14B-64x |
| BitDance-14B-16x | 16 | 256 | 512px & 1024px | BitDance-14B-16x |

from huggingface_hub import snapshot_download

# Download both checkpoints into models/<name>; remove an entry to fetch only one model.
# resume_download and local_dir_use_symlinks are omitted: both are deprecated in recent
# huggingface_hub releases, which resume interrupted downloads automatically.
for name in ["BitDance-14B-64x", "BitDance-14B-16x"]:
    save_dir = f"models/{name}"
    snapshot_download(
        repo_id=f"shallowdream204/{name}",
        local_dir=save_dir,
        cache_dir=save_dir + "/cache",
        allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
    )
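The step counts in the model table are consistent with 4,096 visual tokens per 1024×1024 image (64 × 64 = 256 × 16 = 4,096). The sketch below makes that arithmetic explicit. It assumes that each visual token covers a 16×16 pixel patch; this factor is inferred from the table's numbers, not stated in this README. The function names are hypothetical.

```python
def tokens_per_image(height: int, width: int, downsample: int = 16) -> int:
    # ASSUMPTION: one visual token per downsample x downsample pixel patch (16 is inferred).
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (height // downsample) * (width // downsample)


def decoding_steps(height: int, width: int, tokens_per_step: int, downsample: int = 16) -> int:
    # Each autoregressive step emits `tokens_per_step` visual tokens in parallel.
    n = tokens_per_image(height, width, downsample)
    if n % tokens_per_step:
        raise ValueError("token count must be divisible by tokens_per_step")
    return n // tokens_per_step


if __name__ == "__main__":
    for model, k in [("BitDance-14B-64x", 64), ("BitDance-14B-16x", 16)]:
        print(model, decoding_steps(1024, 1024, k))
```

Under the same assumption, a 512×512 image with the 16x model would take 1,024 / 16 = 64 steps. A one-token-per-step decoder would need 4,096 steps for a 1024×1024 image.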

3️⃣ Text-to-Image (T2I) Inference (refer to the BitDance repository for the supported image resolutions)

# example_t2i.py
from modeling.t2i_pipeline import BitDanceT2IPipeline

model_path = 'models/BitDance-14B-64x'
# model_path = 'models/BitDance-14B-16x'
device = 'cuda'

pipe = BitDanceT2IPipeline(model_path=model_path, device=device)

prompt = "A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere."

image = pipe.generate(
    prompt=prompt,
    height=1024,
    width=1024,
    num_sampling_steps=50,  # 25 steps is faster but may slightly reduce quality
    guidance_scale=7.5,
    num_images=1,
    seed=42
)[0]

image.save("example.png")
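To compare several samples for one prompt, a small loop over seeds is often enough. The helper below is a hypothetical sketch, not part of the BitDance API. It relies only on what the example above shows: `generate` accepts `prompt`, `num_images`, and `seed` keywords and returns a list of images that have a `.save()` method. Any other generation arguments are passed through unchanged; do not pass `num_images` or `seed` yourself.

```python
from pathlib import Path


def generate_seed_sweep(pipe, prompt, seeds, out_dir="outputs", **generate_kwargs):
    """Generate and save one image per seed; return the list of saved file paths.

    `pipe` is any object whose generate(prompt=..., num_images=..., seed=..., **kwargs)
    returns a list of images with a .save(path) method, as in the example above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for seed in seeds:
        image = pipe.generate(prompt=prompt, num_images=1, seed=seed, **generate_kwargs)[0]
        path = out / f"seed_{seed}.png"
        image.save(path)
        saved.append(path)
    return saved


# Usage with the pipeline created above:
# paths = generate_seed_sweep(pipe, prompt, seeds=[0, 1, 2], height=1024, width=1024,
#                             num_sampling_steps=25, guidance_scale=7.5)
```

Passing the pipeline in as an argument keeps the helper independent of how the model is loaded, and it also makes the loop easy to test with a stand-in object.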

🤗 Demo

🔥 Try the Hugging Face Space demo to start experimenting with BitDance: BitDance-Demo

You can also run the demo locally:

python app.py

📊 Model Performance

| Model | Open Source | DPG-Bench | GenEval | OneIG-Bench (EN) | OneIG-Bench (ZH) | TIIF-Bench (short) | TIIF-Bench (long) |
|---|---|---|---|---|---|---|---|
| GPT Image 1 | ✗ | 85.15 | 0.84 | 0.533 | 0.474 | 89.15 | 88.29 |
| Seedream 3.0 | ✗ | 88.27 | 0.84 | 0.530 | 0.528 | 86.02 | 84.31 |
| Qwen-Image | ✓ | 88.32 | 0.87 | 0.539 | 0.548 | 86.14 | 86.83 |
| Z-Image | ✓ | 88.14 | 0.84 | 0.546 | 0.535 | 80.20 | 83.01 |
| Z-Image-Turbo | ✓ | 84.86 | 0.82 | 0.528 | 0.507 | 77.73 | 80.05 |
| FLUX.1 [Dev] | ✓ | 83.84 | 0.66 | 0.434 | - | 71.09 | 71.78 |
| BAGEL | ✓ | 85.07 | 0.88 | 0.361 | 0.370 | 71.50 | 71.70 |
| Infinity | ✓ | 83.46 | 0.73 | - | - | 62.07 | 62.32 |
| Janus-Pro | ✓ | 84.19 | 0.80 | 0.267 | 0.240 | 66.50 | 65.01 |
| Show-o2 | ✓ | 86.14 | 0.76 | 0.308 | - | 59.72 | 58.86 |
| NextStep-1 | ✓ | 85.28 | 0.73 | 0.418 | - | - | - |
| GLM-Image | ✓ | 84.78 | - | 0.528 | 0.511 | 81.01 | 81.02 |
| BitDance | ✓ | 88.28 | 0.86 | 0.532 | 0.512 | 79.64 | 78.12 |
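Because some entries are missing ("-"), ranking models by eye is error-prone. The hypothetical helper below encodes the scores from the performance table and computes a model's rank on one benchmark, optionally among open-source models only. It assumes higher is better on every listed benchmark, skips unreported entries, and lets ties share a rank. The short benchmark keys are my own labels for the table's column headers.

```python
# (open_source, DPG-Bench, GenEval, OneIG-EN, OneIG-ZH, TIIF-short, TIIF-long); None = "-".
RESULTS = {
    "GPT Image 1":   (False, 85.15, 0.84, 0.533, 0.474, 89.15, 88.29),
    "Seedream 3.0":  (False, 88.27, 0.84, 0.530, 0.528, 86.02, 84.31),
    "Qwen-Image":    (True, 88.32, 0.87, 0.539, 0.548, 86.14, 86.83),
    "Z-Image":       (True, 88.14, 0.84, 0.546, 0.535, 80.20, 83.01),
    "Z-Image-Turbo": (True, 84.86, 0.82, 0.528, 0.507, 77.73, 80.05),
    "FLUX.1 [Dev]":  (True, 83.84, 0.66, 0.434, None, 71.09, 71.78),
    "BAGEL":         (True, 85.07, 0.88, 0.361, 0.370, 71.50, 71.70),
    "Infinity":      (True, 83.46, 0.73, None, None, 62.07, 62.32),
    "Janus-Pro":     (True, 84.19, 0.80, 0.267, 0.240, 66.50, 65.01),
    "Show-o2":       (True, 86.14, 0.76, 0.308, None, 59.72, 58.86),
    "NextStep-1":    (True, 85.28, 0.73, 0.418, None, None, None),
    "GLM-Image":     (True, 84.78, None, 0.528, 0.511, 81.01, 81.02),
    "BitDance":      (True, 88.28, 0.86, 0.532, 0.512, 79.64, 78.12),
}
BENCHMARKS = ("DPG-Bench", "GenEval", "OneIG-EN", "OneIG-ZH", "TIIF-short", "TIIF-long")


def rank(model, benchmark, open_source_only=False):
    """1-based rank of `model` on `benchmark`, or None if its score is not reported."""
    col = 1 + BENCHMARKS.index(benchmark)  # column 0 holds the open-source flag
    score = RESULTS[model][col]
    if score is None:
        return None
    # Count reported competitors with a strictly higher score (ties share a rank).
    better = sum(
        1
        for row in RESULTS.values()
        if row[col] is not None
        and (row[0] or not open_source_only)
        and row[col] > score
    )
    return 1 + better


if __name__ == "__main__":
    for bench in BENCHMARKS:
        print(bench, rank("BitDance", bench), rank("BitDance", bench, open_source_only=True))
```

For example, BitDance ranks 2nd on DPG-Bench (behind Qwen-Image) and 3rd on GenEval (behind BAGEL and Qwen-Image).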

🪪 License

BitDance is licensed under the Apache 2.0 license.

📖 Citation

If you find our work useful for your research, please consider citing our paper:

@article{ai2026bitdance,
  title   = {BitDance: Scaling Autoregressive Generative Models with Binary Tokens},
  author  = {Ai, Yuang and Han, Jiaming and Zhuang, Shaobin and Hu, Xuefeng and Yang, Ziyan and Yang, Zhenheng and Huang, Huaibo and Yue, Xiangyu and Chen, Hao},
  journal = {arXiv preprint arXiv:2602.14041},
  year    = {2026}
}
Model details (shallowdream204/BitDance-14B-64x): 15B parameters in Safetensors format, BF16 tensors, fine-tuned from Qwen/Qwen3-14B.