---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
  - image-generation
  - autoregressive
---

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Yuang Ai*, Jiaming Han*, Shaobin Zhuang*, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang†, Xiangyu Yue†, Hao Chen*†‡

* Equal Contribution † Corresponding Author ‡ Project Lead

For visual generation, discrete autoregressive models often struggle with poor tokenizer reconstruction, difficulty sampling from large vocabularies, and slow token-by-token generation. We present BitDance, which addresses these challenges with a large-vocabulary binary tokenizer, a binary diffusion head for sampling in a large discrete space, and a next-patch diffusion paradigm that enables efficient multi-token prediction. BitDance is an open-source discrete autoregressive foundation model with 14B parameters, trained on large-scale multimodal tokens. It keeps the standard language modeling paradigm for text tokens and uses next-patch diffusion for visual tokens, predicting up to 64 tokens in parallel per step. This unified multimodal framework is simple, scalable, and capable of efficiently generating high-resolution, photorealistic images.
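To make the idea of a binary token concrete, the following is a minimal sketch of how a vector of bits can act as a single token from a vocabulary of size 2^d. It is an illustration only, not BitDance's tokenizer: the sign-based `binarize` rule, the helper names, and the 16-bit width are all assumptions chosen for the example.

```python
from typing import List, Sequence


def binarize(latent: Sequence[float]) -> List[int]:
    """Map each latent channel to one bit: 1 if positive, else 0.

    Assumption: sign thresholding is used here for illustration only.
    """
    return [1 if value > 0 else 0 for value in latent]


def bits_to_index(bits: Sequence[int]) -> int:
    """Read a bit vector as an integer token index, most significant bit first."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


num_bits = 16  # hypothetical bit width; the real tokenizer may differ
latent = [0.3, -1.2, 0.8, 0.1] * 4  # toy 16-channel latent vector
bits = binarize(latent)
token = bits_to_index(bits)
vocab_size = 2 ** num_bits  # every extra bit doubles the vocabulary

print(bits)
print(token, "of", vocab_size)
```

Because the vocabulary grows as 2^d, even a modest number of bits per token gives a vocabulary too large to enumerate with an ordinary softmax, which is the sampling difficulty the abstract refers to.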

This repository hosts the BitDance model weights, as presented in the paper BitDance: Scaling Autoregressive Generative Models with Binary Tokens. For detailed instructions and class-conditional image generation on ImageNet, please visit our GitHub repository.

Sample Usage

For detailed instructions and environment setup, please visit the GitHub repository.

# example_t2i.py
from modeling.t2i_pipeline import BitDanceT2IPipeline

model_path = 'models/BitDance-14B-64x'
# model_path = 'models/BitDance-14B-16x'
device = 'cuda'

pipe = BitDanceT2IPipeline(model_path=model_path, device=device)

prompt = "A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere."

image = pipe.generate(
    prompt=prompt,
    height=1024,
    width=1024,
    num_sampling_steps=50,  # reduce to 25 for faster inference at a possible slight cost in quality
    guidance_scale=7.5,
    num_images=1,
    seed=42
)[0]

image.save("example.png")
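The checkpoint names in the example (`BitDance-14B-64x`, `BitDance-14B-16x`) appear to correspond to the number of visual tokens predicted per autoregressive step. The sketch below estimates how many such steps one image would need. The `num_decoding_steps` helper and the downsampling factor of 16 are assumptions for illustration, not values confirmed by this model card.

```python
import math


def num_decoding_steps(height: int, width: int,
                       downsample: int, tokens_per_step: int) -> int:
    """Estimate autoregressive steps needed to emit every visual token.

    Assumption: the tokenizer maps each downsample x downsample pixel
    block to one token; the real factor may differ.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    num_tokens = (height // downsample) * (width // downsample)
    return math.ceil(num_tokens / tokens_per_step)


# Compare token-by-token decoding with 16 and 64 tokens per step
# for a 1024x1024 image and a hypothetical downsample factor of 16.
for tokens_per_step in (1, 16, 64):
    steps = num_decoding_steps(1024, 1024, downsample=16,
                               tokens_per_step=tokens_per_step)
    print(f"{tokens_per_step:>2} tokens/step -> {steps} steps")
```

Under these assumptions a 1024x1024 image has 4096 tokens, so predicting 64 tokens per step cuts the number of autoregressive steps from 4096 to 64.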

🪪 License

BitDance is licensed under the Apache 2.0 license.

📖 Citation

If you find our work useful for your research, please consider citing our paper:

@article{ai2026bitdance,
  title   = {BitDance: Scaling Autoregressive Generative Models with Binary Tokens},
  author  = {Ai, Yuang and Han, Jiaming and Zhuang, Shaobin and Mao, Weijia and Hu, Xuefeng and Yang, Ziyan and Yang, Zhenheng and Huang, Huaibo and Yue, Xiangyu and Chen, Hao},
  journal = {arXiv preprint arXiv:2602.14041},
  year    = {2026}
}