Add sample usage to model card

#1 by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +36 -1
README.md CHANGED
@@ -1,8 +1,9 @@
  ---
- license: mit
  library_name: transformers
+ license: mit
  pipeline_tag: any-to-any
  ---
+
  # MMaDA-8B-MixCoT
 
  We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. MMaDA is distinguished by three key innovations:
@@ -15,6 +16,40 @@ Compared to [MMaDA-8B-Base](https://huggingface.co/Gen-Verse/MMaDA-8B-Base), MMa
 
  [Paper](https://arxiv.org/abs/2505.15809) | [Code](https://github.com/Gen-Verse/MMaDA) | [Demo](https://huggingface.co/spaces/Gen-Verse/MMaDA)
 
+ ## Sample Usage
+
+ You can use the `FlexARInferenceSolver` from the [GitHub repository](https://github.com/Gen-Verse/MMaDA) to perform various tasks, such as image generation.
+
+ First, ensure you have cloned the repository and installed the necessary dependencies (`pip install -r requirements.txt`).
+
+ ```python
+ from MMaDA.inference_solver import FlexARInferenceSolver
+ from PIL import Image
+
+ # ******************** Image Generation ********************
+ inference_solver = FlexARInferenceSolver(
+     model_path="Gen-Verse/MMaDA-8B-MixCoT",
+     precision="bf16",
+     target_size=768,
+ )
+
+ q1 = f"Generate an image of 768x768 according to the following prompt:\n" \
+      f"Image of a dog playing water, and a waterfall is in the background."
+
+ # generated: tuple of (generated response, list of generated images)
+ generated = inference_solver.generate(
+     images=[],
+     qas=[[q1, None]],
+     max_gen_len=8192,
+     temperature=1.0,
+     logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+ )
+
+ a1, new_image = generated[0], generated[1][0]
+ new_image.show()  # Display the generated image
+ ```
+
  # Citation
 
  ```
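Reviewer's note: the added snippet ends with `new_image.show()`, which requires a display. On a headless machine one might instead unpack the returned tuple and save the image to disk. A minimal sketch of that post-processing, using a placeholder `Image.new` in place of real model output (the diff only documents that `generate()` returns a tuple of the generated response and a list of generated images):

```python
from PIL import Image

# Placeholder standing in for inference_solver.generate(...)'s return value,
# documented in the diff as (generated response, list of generated images).
generated = ("An image of a dog playing in water.", [Image.new("RGB", (768, 768))])

# Unpack the response text and the first generated image.
a1, new_image = generated[0], generated[1][0]

# Save to disk instead of calling .show(), which fails without a display.
new_image.save("mmada_sample.png")
print(a1)
```

The same unpacking applies to the real solver output; only the placeholder tuple is hypothetical here.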