---
title: BitDance-14B-64x
emoji: 🚀
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Open-source autoregressive model with binary visual tokens.
---

# 🚀 BitDance-14B-64x

BitDance is a scalable autoregressive (AR) foundation model with **14 billion parameters**. It generates images by predicting **binary visual tokens** instead of standard codebook indices.

## 🌟 Key Features

- **Binary Visual Tokenizer:** Scales the token space to $2^{256}$ states, providing a highly expressive yet compact discrete representation.
- **Binary Diffusion Head:** Replaces standard categorical classification with continuous-space diffusion for high-precision sampling in massive discrete spaces.
- **Next-Patch Diffusion:** A parallel decoding paradigm that predicts up to **64 tokens per step**, achieving a roughly 30x speedup over traditional AR models at 1024x1024 resolution.
- **Multimodal Foundation:** Trained on large-scale multimodal data, excelling in prompt adherence, spatial reasoning, and high-fidelity photorealistic rendering.

## 🛠️ Performance

| Model | Tokens/Step | Speedup (vs. standard AR) | Target Resolution |
| :--- | :--- | :--- | :--- |
| BitDance-14B-16x | 16 | ~8x | 512px & 1024px |
| **BitDance-14B-64x** | **64** | **~30x** | **1024px** |

## 🚀 Quick Start (Local Setup)

To run the model locally with the `diffusers` library:

```python
import torch
from diffusers import DiffusionPipeline

# Load the weights and the custom pipeline code from the same Hub repository.
pipe = DiffusionPipeline.from_pretrained(
    "shallowdream204/BitDance-14B-64x",
    custom_pipeline="shallowdream204/BitDance-14B-64x",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A cinematic portrait of a futuristic explorer in a neon-lit cyberpunk city, ultra-detailed, 8k."

# Generate a single 1024x1024 image and write it to disk.
image = pipe(prompt=prompt, height=1024, width=1024).images[0]
image.save("output.png")
```
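
### Back-of-the-envelope numbers

To give a sense of scale for the figures quoted under Key Features and Performance, the sketch below does the arithmetic. It is illustrative only and is not part of the BitDance code. It assumes each binary visual token carries 256 bits (inferred from the $2^{256}$ figure) and uses a hypothetical 64x64 token grid, since the tokenizer's downsampling ratio is not stated here. The helper names `num_states` and `decoding_steps` are made up for this example.

```python
import math

# Assumption: 256 bits per token, inferred from the 2^256 states figure.
BITS_PER_TOKEN = 256


def num_states(bits: int = BITS_PER_TOKEN) -> int:
    """Number of distinct values a token of `bits` binary digits can take."""
    return 2 ** bits


def decoding_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Sequential decoding steps needed when several tokens are emitted per step."""
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical token grid for one image; the real grid size is not given here.
grid_tokens = 64 * 64

print("decimal digits in 2^256:", len(str(num_states())))
for tokens_per_step in (1, 16, 64):
    steps = decoding_steps(grid_tokens, tokens_per_step)
    print(f"{tokens_per_step:>2} tokens/step -> {steps} steps")
```

The step count shrinks by exactly the tokens-per-step factor, whereas the speedups reported in the table (~8x and ~30x) are smaller than 16x and 64x. That gap suggests each parallel step costs more than a single-token step, although this README does not break the cost down.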