
Improve model card: Update `library_name`, add `multimodal` tag, and add sample usage

#27
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +103 -6
README.md CHANGED
@@ -1,12 +1,16 @@
1
  ---
2
- license: apache-2.0
3
  base_model:
4
  - Qwen/Qwen2.5-7B-Instruct
 
 
5
  pipeline_tag: any-to-any
6
- library_name: bagel-mot
 
 
 
 
7
  ---
8
 
9
-
10
  <p align="left">
11
  <img src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nuhojubrps/banner.png" alt="BAGEL" width="480"/>
12
  </p>
@@ -56,9 +60,102 @@ library_name: bagel-mot
56
  Moreover, BAGEL demonstrates superior qualitative results in classical image‑editing scenarios than the leading open-source models. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.
57
 
58
 
59
- This repository hosts the model weights for **BAGEL**. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/bytedance-seed/BAGEL).
60
-
61
-
62
 
63
  <p align="left"><img src="https://github.com/ByteDance-Seed/Bagel/raw/main/assets/teaser.webp" width="80%"></p>
64
 
 
1
  ---
 
2
  base_model:
3
  - Qwen/Qwen2.5-7B-Instruct
4
+ library_name: transformers
5
+ license: apache-2.0
6
  pipeline_tag: any-to-any
7
+ tags:
8
+ - multimodal
9
+ - image-to-text
10
+ - text-to-image
11
+ - visual-question-answering
12
  ---
13
 
 
14
  <p align="left">
15
  <img src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nuhojubrps/banner.png" alt="BAGEL" width="480"/>
16
  </p>
 
60
  Moreover, BAGEL demonstrates superior qualitative results in classical image‑editing scenarios than the leading open-source models. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.
61
 
62
 
63
+ This repository hosts the model weights for **BAGEL**.
64
+
65
+ ## Usage
66
+
67
+ You can load the model and processor using the `transformers` library and perform various multimodal tasks.
68
+
69
+ ```python
70
+ import torch
71
+ from transformers import AutoProcessor, AutoModelForCausalLM
72
+ from PIL import Image # For image input
73
+
74
+ # Load the model and processor
75
+ model_id = "bytedance-seed/BAGEL" # This refers to the current repository's model ID
76
+ model = AutoModelForCausalLM.from_pretrained(
77
+     model_id,
78
+     torch_dtype=torch.bfloat16,
79
+     trust_remote_code=True,  # Required for custom modeling files
80
+ )
81
+ processor = AutoProcessor.from_pretrained(
82
+     model_id,
83
+     trust_remote_code=True,  # Required for custom processing files
84
+ )
85
+
86
+ # Move model to GPU if available
87
+ if torch.cuda.is_available():
88
+     model = model.to("cuda")
89
+
90
+ # Example 1: Text-only input (conversational)
91
+ input_text = "Who is the CEO of Apple?"
92
+ messages = [
93
+     {"role": "user", "content": input_text},
94
+ ]
95
+ text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
96
+ input_ids = processor(text=text, return_tensors='pt').input_ids.to(model.device)
97
+
98
+ with torch.inference_mode():
99
+     outputs = model.generate(
100
+         input_ids=input_ids,
101
+         do_sample=True,
102
+         temperature=0.7,
103
+         top_p=0.8,
104
+         max_new_tokens=512,
105
+         eos_token_id=processor.tokenizer.eos_token_id,
106
+         pad_token_id=processor.tokenizer.pad_token_id,
107
+     )
108
+ response_text = processor.tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
109
+ print(f"User: {input_text}\n"
110
+       f"Assistant: {response_text}")
111
+ # Example Output: User: Who is the CEO of Apple?
112
+ # Assistant: Tim Cook
113
+
114
+ # Example 2: Image input with a text prompt
115
+ # For local testing, you might need to download an example image, e.g., from the GitHub repo.
116
+ # For this example, let's assume 'assets/apple.png' is available (replace with actual path if running locally).
117
+ try:
118
+     # This path is relative to the GitHub repo structure; adjust it if running locally.
119
+     # For a real Hub model card, you'd suggest downloading or using a public image URL.
120
+     raw_image = Image.open("./assets/apple.png").convert('RGB')
121
+ except FileNotFoundError:
122
+     print("\nSkipping image example: 'assets/apple.png' not found. "
123
+           "Please download an image for testing.")
124
+     raw_image = None
125
+
126
+ if raw_image is not None:
127
+     messages_image = [
128
+         {"role": "user", "content": [raw_image, "Describe the image."]},
129
+     ]
130
+     text_image = processor.apply_chat_template(messages_image, add_generation_prompt=True, tokenize=False)
131
+     input_ids_image = processor(text=text_image, images=raw_image, return_tensors='pt').input_ids.to(model.device)
132
+
133
+     with torch.inference_mode():
134
+         outputs_image = model.generate(
135
+             input_ids=input_ids_image,
136
+             do_sample=True,
137
+             temperature=0.7,
138
+             top_p=0.8,
139
+             max_new_tokens=512,
140
+             eos_token_id=processor.tokenizer.eos_token_id,
141
+             pad_token_id=processor.tokenizer.pad_token_id,
142
+         )
143
+     response_image_text = processor.tokenizer.decode(outputs_image[0][input_ids_image.shape[-1]:], skip_special_tokens=True)
144
+     print(
145
+         "\nUser (with image): Describe the image.\n"
146
+         f"Assistant: {response_image_text}")
147
+     # Example Output: User (with image): Describe the image.
148
+     # Assistant: The image shows a close-up of a red apple on a dark background. The apple is vibrant and appears to be ripe and fresh.
149
+
150
+ # Example 3: Image-to-image manipulation (brief overview - see GitHub for full implementation)
151
+ # BAGEL supports free-form image manipulation. The model can generate new images as part of its response,
152
+ # often encoded as base64 strings within the text output. For a complete example including
153
+ # how to parse and save these generated images, please refer to the official
154
+ # [BAGEL GitHub repository's usage examples](https://github.com/bytedance-seed/BAGEL#quick-start).
155
+ print("\nBAGEL also supports image-to-image manipulation. "
156
+       "See the GitHub repository for full examples.")
157
+ ```
158
+ For installation, a more comprehensive usage guide, and further documentation, please visit our [GitHub repository](https://github.com/bytedance-seed/BAGEL).
159
 
160
  <p align="left"><img src="https://github.com/ByteDance-Seed/Bagel/raw/main/assets/teaser.webp" width="80%"></p>
161
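
Example 3 in the proposed README says generated images are "often encoded as base64 strings within the text output" but gives no code for it. The sketch below shows one way such a response could be parsed. It rests on an assumption the PR does not confirm: that images appear as `data:image/<format>;base64,...` URIs. The helper name `extract_base64_images` is hypothetical and is not part of BAGEL or `transformers`.

```python
import base64
import binascii
import re

# Assumption: images are embedded as data URIs. The real output format may differ.
DATA_URI_PATTERN = re.compile(
    r"data:image/(?P<fmt>[a-zA-Z0-9.+-]+);base64,(?P<data>[A-Za-z0-9+/]+={0,2})"
)


def extract_base64_images(text):
    """Return a list of (format, raw_bytes) for each decodable data URI in text."""
    images = []
    for match in DATA_URI_PATTERN.finditer(text):
        try:
            raw = base64.b64decode(match.group("data"), validate=True)
        except (binascii.Error, ValueError):
            continue  # Skip payloads that are not valid base64.
        images.append((match.group("fmt"), raw))
    return images


if __name__ == "__main__":
    # Stand-in response text; the payload is placeholder bytes, not a real image.
    payload = base64.b64encode(b"placeholder image bytes").decode("ascii")
    response = f"Here is the edited image: data:image/png;base64,{payload} Done."
    for fmt, raw in extract_base64_images(response):
        print(f"Found a {fmt} payload of {len(raw)} bytes")
```

If the model emits images in some other way (for example as separate tensors rather than inline text), this parsing step would not apply, so the format should be checked against the GitHub repository before relying on it.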