---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
base_model: shallowdream204/BitDance-14B-16x
language:
- en
tags:
- bitdance
- text-to-image
- custom-pipeline
- diffusers
- qwen
---

# BitDance-14B-16x (Diffusers)

Diffusers-converted checkpoint for BitDance-14B-16x with bundled custom pipeline code (`bitdance_diffusers`), so it can be loaded directly with `DiffusionPipeline`.

## Quickstart (native diffusers)

```python
import torch
from diffusers import DiffusionPipeline

# Repo id on the Hub (or a local path); the pipeline code ships in the
# repo and is loaded via `custom_pipeline`, so no trust_remote_code flag
# is needed.
model_path = "BiliSakura/BitDance-14B-16x-diffusers"
pipe = DiffusionPipeline.from_pretrained(
    model_path,
    custom_pipeline=model_path,
    torch_dtype=torch.bfloat16,
).to("cuda")

result = pipe(
    prompt="A close-up portrait in a cinematic photography style, capturing a girl-next-door look on a sunny daytime urban street. She wears a khaki sweater, with long, flowing hair gently draped over her shoulders. Her head is turned slightly, revealing soft facial features illuminated by realistic, delicate sunlight coming from the left. The sunlight subtly highlights individual strands of her hair. The image has a Canon film-like color tone, evoking a warm nostalgic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=7.5,
    show_progress_bar=True,
)
result.images[0].save("bitdance_14b_16x.png")
```
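For reproducible outputs or tighter memory budgets, the standard diffusers controls should carry over to this pipeline. Whether the custom `BitDanceDiffusionPipeline` supports `enable_model_cpu_offload` has not been verified here, so treat this as a sketch under that assumption:

```python
import torch

MODEL_PATH = "BiliSakura/BitDance-14B-16x-diffusers"

def make_generator(seed: int, device: str = "cuda") -> torch.Generator:
    """Seeded RNG so repeated calls with the same seed give the same image."""
    return torch.Generator(device=device).manual_seed(seed)

def generate(prompt: str, seed: int = 0):
    # Import here so merely defining this module does not require diffusers.
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        MODEL_PATH,
        custom_pipeline=MODEL_PATH,
        torch_dtype=torch.bfloat16,
    )
    # Offload idle submodules to CPU instead of keeping the whole pipeline
    # resident on the GPU; assumes the custom pipeline inherits this
    # standard diffusers hook.
    pipe.enable_model_cpu_offload()
    return pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=make_generator(seed),
    ).images[0]

if __name__ == "__main__":
    image = generate("A cinematic landscape photo of snowy mountains at sunrise.")
    image.save("bitdance_seeded.png")
```

Passing the same `seed` should reproduce the same image on the same hardware and library versions.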

## Running the Tests

Run the bundled test script from the model directory, inside your active Python environment:

```bash
python test_bitdance.py
```

## VRAM Usage by Resolution

Measured on an `NVIDIA A100-SXM4-80GB` with:

- `dtype=torch.bfloat16`
- `num_inference_steps=30`
- `guidance_scale=7.5`
- prompt: `A cinematic landscape photo of snowy mountains at sunrise.`

| Resolution | Peak Allocated VRAM (GiB) | Peak Reserved VRAM (GiB) | Time (s) | Status |
| --- | ---: | ---: | ---: | --- |
| 512x512 | 32.67 | 33.47 | 13.71 | ok |
| 1024x1024 | 35.51 | 38.76 | 54.47 | ok |
| 1280x768 | 35.28 | 38.34 | 50.97 | ok |
| 768x1280 | 35.28 | 38.34 | 51.22 | ok |
| 1536x640 | 35.28 | 38.34 | 51.29 | ok |
| 2048x512 | 35.51 | 38.76 | 54.61 | ok |
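Figures like these can be collected with PyTorch's CUDA memory counters. The actual benchmark script is not included in this card, so the harness below is a hypothetical sketch of one way to measure a loaded pipeline:

```python
import time
import torch

def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30

def measure_peak_vram(pipe, **call_kwargs):
    """Run one generation and return (peak allocated GiB, peak reserved GiB, seconds)."""
    torch.cuda.empty_cache()
    # Zero the high-water marks so we measure only this call.
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    pipe(**call_kwargs)
    torch.cuda.synchronize()  # make sure all kernels finished before stopping the clock
    elapsed = time.perf_counter() - start
    return (
        to_gib(torch.cuda.max_memory_allocated()),
        to_gib(torch.cuda.max_memory_reserved()),
        elapsed,
    )
```

Usage: `measure_peak_vram(pipe, prompt=..., height=1024, width=1024, num_inference_steps=30, guidance_scale=7.5)` on a CUDA-loaded pipeline.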

## Model Metadata

- Pipeline class: `BitDanceDiffusionPipeline`
- Diffusers version in config: `0.36.0`
- Parallel prediction factor: `16`
- Text stack: `Qwen3ForCausalLM` + `Qwen2TokenizerFast`
- Supported resolutions include `1024x1024`, `1280x768`, `768x1280`, `2048x512`, and more (see `model_index.json`)

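`model_index.json` is plain JSON: the `_class_name` and `_diffusers_version` keys are standard diffusers bookkeeping, and the remaining top-level keys map component names to `(library, class)` pairs. A minimal reader, assuming a local checkout of the repo (the exact component names in this checkpoint are repo-specific):

```python
import json
from pathlib import Path

def read_model_index(model_dir: str) -> dict:
    """Parse model_index.json from a local checkout of the repo."""
    return json.loads((Path(model_dir) / "model_index.json").read_text())

def pipeline_class(index: dict) -> str:
    # Standard diffusers key naming the pipeline implementation.
    return index["_class_name"]

def components(index: dict) -> dict:
    # Everything not prefixed with "_" maps a component to (library, class).
    return {k: v for k, v in index.items() if not k.startswith("_")}
```

For this card, `pipeline_class(...)` should report `BitDanceDiffusionPipeline`, matching the metadata above.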
## Citation

If you use this model, please cite BitDance and Diffusers:

```bibtex
@article{ai2026bitdance,
  title   = {BitDance: Scaling Autoregressive Generative Models with Binary Tokens},
  author  = {Ai, Yuang and Han, Jiaming and Zhuang, Shaobin and Hu, Xuefeng and Yang, Ziyan and Yang, Zhenheng and Huang, Huaibo and Yue, Xiangyu and Chen, Hao},
  journal = {arXiv preprint arXiv:2602.14041},
  year    = {2026}
}

@misc{von-platen-etal-2022-diffusers,
  title        = {Diffusers: State-of-the-art diffusion models},
  author       = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
  year         = {2022},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/diffusers}}
}
```

## License

This repository is distributed under the Apache-2.0 license, consistent with the upstream BitDance release.