metadata
title: BitDance-14B-64x
emoji: 🚀
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Open-source autoregressive model with binary visual tokens.

🚀 BitDance-14B-64x

BitDance is a scalable autoregressive (AR) foundation model with 14 billion parameters. It introduces a novel approach to image generation by predicting binary visual tokens instead of standard codebook indices.

🌟 Key Features

  • Binary Visual Tokenizer: Scales the per-token state space to $2^{256}$ possible values (256 bits per token), providing a highly expressive yet compact discrete representation.
  • Binary Diffusion Head: Replaces standard categorical classification with continuous-space diffusion for high-precision sampling in massive discrete spaces.
  • Next-Patch Diffusion: A parallel decoding paradigm that predicts up to 64 tokens per step, achieving a roughly 30x speedup over standard AR decoding at 1024x1024 resolution.
  • Multimodal Foundation: Trained on large-scale multimodal data, excelling in prompt adherence, spatial reasoning, and high-fidelity photorealistic rendering.
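
To make the first two features concrete, the sketch below treats one visual token as a vector of 256 binary values, which is what a $2^{256}$-state space implies. The helper names `bits_to_index` and `index_to_bits` and the most-significant-bit-first ordering are illustrative assumptions, not part of BitDance's actual tokenizer API.

```python
import random

BITS_PER_TOKEN = 256  # implied by the 2^256 states quoted in the feature list


def bits_to_index(bits):
    """Pack a list of 0/1 values (most significant bit first) into one integer."""
    index = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        index = (index << 1) | b
    return index


def index_to_bits(index, width=BITS_PER_TOKEN):
    """Unpack an integer back into `width` bits, most significant bit first."""
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]


# Hypothetical token: 256 random bits standing in for a tokenizer output.
rng = random.Random(0)
token_bits = [rng.randint(0, 1) for _ in range(BITS_PER_TOKEN)]
token_index = bits_to_index(token_bits)

# The round trip is lossless, so the bit vector and the index carry the same information.
assert index_to_bits(token_index) == token_bits

# A categorical (softmax) head would need one logit per possible state.
num_states = 2 ** BITS_PER_TOKEN
print(f"states per token: {num_states:.3e}")
```

A classifier with one output per state is out of reach at this scale, which is consistent with the feature list's move away from categorical classification toward predicting the bits themselves.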

🛠️ Performance

| Model | Tokens/Step | Speedup (vs. standard AR) | Target Resolution |
|---|---|---|---|
| BitDance-14B-16x | 16 | ~8x | 512px & 1024px |
| BitDance-14B-64x | 64 | ~30x | 1024px |
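
The relationship between tokens per step and the number of sequential decoding steps can be sketched with simple arithmetic. The token count per image used here (`NUM_TOKENS = 4096`) is a hypothetical placeholder, since the real value depends on the tokenizer; only the tokens-per-step and reported speedup figures come from the table.

```python
import math


def sequential_steps(num_tokens, tokens_per_step):
    """Number of decoding steps when `tokens_per_step` tokens are emitted in parallel."""
    if num_tokens <= 0 or tokens_per_step <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical token count for one image; not taken from the model card.
NUM_TOKENS = 4096

# (tokens per step, reported speedup) as listed in the performance table.
variants = {"BitDance-14B-16x": (16, 8), "BitDance-14B-64x": (64, 30)}

baseline = sequential_steps(NUM_TOKENS, 1)  # standard AR: one token per step
for name, (tokens_per_step, reported) in variants.items():
    steps = sequential_steps(NUM_TOKENS, tokens_per_step)
    ideal = baseline / steps  # upper bound if every step cost the same
    print(f"{name}: {steps} steps, ideal {ideal:.0f}x, reported ~{reported}x")
```

The reported speedups are about half the ideal step reduction in both rows. One plausible reading is that a parallel step costs more than a single-token step, but the model card does not break this down.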

🚀 Quick Start (Local Setup)

If you wish to run the model locally using the diffusers library:

import torch
from diffusers import DiffusionPipeline

# Load the custom pipeline from the model repository in bfloat16 to reduce memory use.
pipe = DiffusionPipeline.from_pretrained(
    "shallowdream204/BitDance-14B-64x",
    custom_pipeline="shallowdream204/BitDance-14B-64x",
    torch_dtype=torch.bfloat16,
).to("cuda")  # requires a CUDA-capable GPU

# Generate one 1024x1024 image and save it to disk.
prompt = "A cinematic portrait of a futuristic explorer in a neon-lit cyberpunk city, ultra-detailed, 8k."
image = pipe(prompt=prompt, height=1024, width=1024).images[0]
image.save("output.png")
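
Before running locally, it can help to estimate how much GPU memory the weights alone will occupy. The sketch below is a back-of-the-envelope calculation that assumes exactly 14 billion parameters stored in bfloat16; `weight_memory_gib` is a hypothetical helper, and the estimate ignores activations, the tokenizer, and any other pipeline components.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 14e9  # "14 billion parameters" from the model description (nominal)
BF16_BYTES = 2     # bfloat16 stores each value in 2 bytes

estimate = weight_memory_gib(NUM_PARAMS, BF16_BYTES)
print(f"bfloat16 weights: ~{estimate:.1f} GiB")
```

Under these assumptions the weights come to about 26 GiB, so the actual requirement at inference time will be higher.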