Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ license: apache-2.0
 >
 > For visual generation, discrete autoregressive models often struggle with poor tokenizer reconstruction, difficulties in sampling from large vocabularies, and slow token-by-token generation speeds. We present **BitDance**, which addresses these challenges via a large-vocabulary binary tokenizer, a binary diffusion head for sampling in large discrete space, and a next-patch diffusion paradigm that enables efficient multitoken prediction. BitDance is an open-source discrete autoregressive foundation model with 14B parameters, trained on large-scale multimodal tokens. While maintaining the standard language modeling paradigm for text tokens, BitDance employs a next-patch diffusion paradigm for visual tokens to predict multiple tokens in parallel—up to 64 per step. This unified multimodal framework is simple, scalable, and capable of efficiently generating high-resolution, photorealistic images.
 
-This repository hosts the BitDance model weights for ImageNet Generation. For detailed instructions, please visit our [GitHub Repository](https://github.com/shallowdream204/BitDance).
+This repository hosts the **BitDance** model weights for ImageNet Generation. For detailed instructions, please visit our [GitHub Repository](https://github.com/shallowdream204/BitDance).
 
 
 ## 🪪 License