nielsr (HF Staff) committed
Commit 2f29fd2 · verified · Parent: 6522827

Improve model card: Add pipeline tag, paper link, and sample usage


Hi! I'm Niels from the Hugging Face community team. I've opened this PR to enhance the discoverability and usability of the BitDance model card.

Specifically, this PR:
- Adds the `pipeline_tag: unconditional-image-generation` to the YAML metadata, helping users find the model when filtering by task.
- Includes `tags: image-generation, autoregressive` for better categorization.
- Adds an explicit link to the paper on the Hugging Face Hub: [BitDance: Scaling Autoregressive Generative Models with Binary Tokens](https://huggingface.co/papers/2602.14041).
- Integrates a "Sample Usage" section with a Python code snippet directly from the official GitHub repository, making it easier for users to get started with inference.

The existing license information and links to the project page and GitHub repository have been preserved.

Please let me know if you have any questions or feedback!
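Taken together, the metadata changes above leave the `README.md` front matter reading as follows (the preserved `license` field plus the new `pipeline_tag` and `tags` fields):

```yaml
---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
- image-generation
- autoregressive
---
```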

Files changed (1)

README.md (+33 −1)

````diff
@@ -1,5 +1,9 @@
 ---
 license: apache-2.0
+pipeline_tag: unconditional-image-generation
+tags:
+- image-generation
+- autoregressive
 ---
 
 # BitDance: Scaling Autoregressive Generative Models with Binary Tokens
@@ -46,8 +50,36 @@ license: apache-2.0
 >
 > For visual generation, discrete autoregressive models often struggle with poor tokenizer reconstruction, difficulties in sampling from large vocabularies, and slow token-by-token generation speeds. We present **BitDance**, which addresses these challenges via a large-vocabulary binary tokenizer, a binary diffusion head for sampling in large discrete space, and a next-patch diffusion paradigm that enables efficient multitoken prediction. BitDance is an open-source discrete autoregressive foundation model with 14B parameters, trained on large-scale multimodal tokens. While maintaining the standard language modeling paradigm for text tokens, BitDance employs a next-patch diffusion paradigm for visual tokens to predict multiple tokens in parallel—up to 64 per step. This unified multimodal framework is simple, scalable, and capable of efficiently generating high-resolution, photorealistic images.
 
-This repository hosts the **BitDance** model weights for class-conditional image generation on ImageNet. For detailed instructions, please visit our [GitHub repository](https://github.com/shallowdream204/BitDance).
+This repository hosts the **BitDance** model weights, as presented in the paper [BitDance: Scaling Autoregressive Generative Models with Binary Tokens](https://huggingface.co/papers/2602.14041). For detailed instructions and class-conditional image generation on ImageNet, please visit our [GitHub repository](https://github.com/shallowdream204/BitDance).
 
+## Sample Usage
+
+For detailed instructions and environment setup, please visit the [GitHub repository](https://github.com/shallowdream204/BitDance).
+
+```python
+# example_t2i.py
+from modeling.t2i_pipeline import BitDanceT2IPipeline
+
+model_path = 'models/BitDance-14B-64x'
+# model_path = 'models/BitDance-14B-16x'
+device = 'cuda'
+
+pipe = BitDanceT2IPipeline(model_path=model_path, device=device)
+
+prompt = "A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere."
+
+image = pipe.generate(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    num_sampling_steps=50,  # adjust to 25 steps for faster inference, but may slightly reduce quality
+    guidance_scale=7.5,
+    num_images=1,
+    seed=42
+)[0]
+
+image.save("example.png")
+```
+
 ## 🪪 License
 
````