Add sample usage to model card
#1
by nielsr
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license: mit
 library_name: transformers
+license: mit
 pipeline_tag: any-to-any
 ---
+
 # MMaDA-8B-MixCoT
 
 We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. MMaDA is distinguished by three key innovations:
@@ -15,6 +16,40 @@ Compared to [MMaDA-8B-Base](https://huggingface.co/Gen-Verse/MMaDA-8B-Base), MMa
 
 [Paper](https://arxiv.org/abs/2505.15809) | [Code](https://github.com/Gen-Verse/MMaDA) | [Demo](https://huggingface.co/spaces/Gen-Verse/MMaDA)
 
+## Sample Usage
+
+You can use the provided `FlexARInferenceSolver` from the [GitHub repository](https://github.com/Gen-Verse/MMaDA) to easily perform various tasks, such as image generation.
+
+First, ensure you have cloned the repository and installed the necessary dependencies as per the GitHub repository's instructions (`pip install -r requirements.txt`).
+
+```python
+from MMaDA.inference_solver import FlexARInferenceSolver
+from PIL import Image
+
+# ******************** Image Generation ********************
+inference_solver = FlexARInferenceSolver(
+    model_path="Gen-Verse/MMaDA-8B-MixCoT",
+    precision="bf16",
+    target_size=768,
+)
+
+q1 = f"Generate an image of 768x768 according to the following prompt:\
+" \
+f"Image of a dog playing water, and a waterfall is in the background."
+
+# generated: tuple of (generated response, list of generated images)
+generated = inference_solver.generate(
+    images=[],
+    qas=[[q1, None]],
+    max_gen_len=8192,
+    temperature=1.0,
+    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+)
+
+a1, new_image = generated[0], generated[1][0]
+new_image.show()  # Display the generated image
+```
+
 # Citation
 
 ```
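
Aside: the `q1` prompt in the added sample is built from two adjacent f-string literals, joined by Python's implicit string-literal concatenation (the trailing backslashes are statement and in-string line continuations). A minimal standalone check of that construction, requiring no model or GPU (`prompt` here is just a hypothetical stand-in for `q1`):

```python
# Implicit concatenation: adjacent string literals merge into one string.
# The in-string "\<newline>" escape contributes no character, so the two
# parts join directly, exactly as in the sample's q1.
prompt = f"Generate an image of 768x768 according to the following prompt:\
" \
f"Image of a dog playing water, and a waterfall is in the background."

# The result is a single line of text, not two.
print(len(prompt.splitlines()))  # 1
```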