Add sample usage to model card
#1
by nielsr
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license: mit
 library_name: transformers
+license: mit
 pipeline_tag: any-to-any
 ---
+
 # MMaDA-8B-MixCoT
 
 We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. MMaDA is distinguished by three key innovations:
@@ -15,6 +16,40 @@ Compared to [MMaDA-8B-Base](https://huggingface.co/Gen-Verse/MMaDA-8B-Base), MMa
 
 [Paper](https://arxiv.org/abs/2505.15809) | [Code](https://github.com/Gen-Verse/MMaDA) | [Demo](https://huggingface.co/spaces/Gen-Verse/MMaDA)
 
+## Sample Usage
+
+You can use the provided `FlexARInferenceSolver` from the [GitHub repository](https://github.com/Gen-Verse/MMaDA) to easily perform various tasks, such as image generation.
+
+First, ensure you have cloned the repository and installed the necessary dependencies as per the GitHub repository's instructions (`pip install -r requirements.txt`).
+
+```python
+from MMaDA.inference_solver import FlexARInferenceSolver
+from PIL import Image
+
+# ******************** Image Generation ********************
+inference_solver = FlexARInferenceSolver(
+    model_path="Gen-Verse/MMaDA-8B-MixCoT",
+    precision="bf16",
+    target_size=768,
+)
+
+q1 = f"Generate an image of 768x768 according to the following prompt:\
+" \
+f"Image of a dog playing water, and a waterfall is in the background."
+
+# generated: tuple of (generated response, list of generated images)
+generated = inference_solver.generate(
+    images=[],
+    qas=[[q1, None]],
+    max_gen_len=8192,
+    temperature=1.0,
+    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+)
+
+a1, new_image = generated[0], generated[1][0]
+new_image.show()  # Display the generated image
+```
+
 # Citation
 
 ```
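
Aside: the `q1` prompt in the added sample is built from two adjacent f-string literals, joined by Python's implicit string-literal concatenation (the trailing backslashes are statement and in-string line continuations). A minimal standalone check of that construction, requiring no model or GPU (`prompt` here is just a hypothetical stand-in for `q1`):

```python
# Implicit concatenation: adjacent string literals merge into one string.
# The in-string "\<newline>" escape contributes no character, so the two
# parts join directly, exactly as in the sample's q1.
prompt = f"Generate an image of 768x768 according to the following prompt:\
" \
f"Image of a dog playing water, and a waterfall is in the background."

# The result is a single line of text, not two.
print(len(prompt.splitlines()))  # 1
```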