ByteDance-Seed/BAGEL-7B-MoT

@@ -1,12 +1,16 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
 pipeline_tag: any-to-any
-library_name: bagel-mot
 ---
 <p align="left">
   <img src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nuhojubrps/banner.png" alt="BAGEL" width="480"/>
 </p>
@@ -56,9 +60,102 @@ library_name: bagel-mot
 Moreover, BAGEL demonstrates qualitative results superior to those of the leading open-source models in classical image-editing scenarios. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.
-This repository hosts the model weights for **BAGEL**. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/bytedance-seed/BAGEL).
 <p align="left"><img src="https://github.com/ByteDance-Seed/Bagel/raw/main/assets/teaser.webp" width="80%"></p>

 ---
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
+library_name: transformers
+license: apache-2.0
 pipeline_tag: any-to-any
+tags:
+- multimodal
+- image-to-text
+- text-to-image
+- visual-question-answering
 ---
 <p align="left">
   <img src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nuhojubrps/banner.png" alt="BAGEL" width="480"/>
 </p>
 Moreover, BAGEL demonstrates qualitative results superior to those of the leading open-source models in classical image-editing scenarios. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.
+This repository hosts the model weights for **BAGEL**.
+## Usage
+The example below loads the model and processor with the `transformers` library, then runs a text-only prompt and an image-description prompt.
+```python
+import torch
+from transformers import AutoProcessor, AutoModelForCausalLM
+from PIL import Image # For image input
+# Load the model and processor
+model_id = "ByteDance-Seed/BAGEL-7B-MoT" # Hub ID of this repository
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    trust_remote_code=True, # Required for custom modeling files
+)
+processor = AutoProcessor.from_pretrained(
+    model_id,
+    trust_remote_code=True, # Required for custom processing files
+)
+# Move model to GPU if available
+if torch.cuda.is_available():
+    model = model.to("cuda")
+# Example 1: Text-only input (conversational)
+input_text = "Who is the CEO of Apple?"
+messages = [
+    {"role": "user", "content": input_text},
+]
+text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+input_ids = processor(text=text, return_tensors='pt').input_ids.to(model.device)
+with torch.inference_mode():
+    outputs = model.generate(
+        input_ids=input_ids,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.8,
+        max_new_tokens=512,
+        eos_token_id=processor.tokenizer.eos_token_id,
+        pad_token_id=processor.tokenizer.pad_token_id,
+    )
+response_text = processor.tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+print(f"User: {input_text}\nAssistant: {response_text}")
+# Example Output: User: Who is the CEO of Apple?
+# Assistant: Tim Cook
+# Example 2: Image input with a text prompt
+# This example expects a local image at ./assets/apple.png (a path relative to the GitHub repo layout).
+# Replace it with the path to any image you have available.
+try:
+    raw_image = Image.open("./assets/apple.png").convert('RGB')
+except FileNotFoundError:
+    print("\nSkipping image example: 'assets/apple.png' not found. Please download an image for testing.")
+    raw_image = None
+if raw_image is not None:
+    messages_image = [
+        {"role": "user", "content": [raw_image, "Describe the image."]},
+    ]
+    text_image = processor.apply_chat_template(messages_image, add_generation_prompt=True, tokenize=False)
+    # Keep every tensor the processor returns (not only input_ids) so the image features reach the model.
+    inputs_image = processor(text=text_image, images=raw_image, return_tensors='pt').to(model.device)
+    with torch.inference_mode():
+        outputs_image = model.generate(
+            **inputs_image,
+            do_sample=True,
+            temperature=0.7,
+            top_p=0.8,
+            max_new_tokens=512,
+            eos_token_id=processor.tokenizer.eos_token_id,
+            pad_token_id=processor.tokenizer.pad_token_id,
+        )
+    # Slice off the prompt tokens so only the newly generated text is decoded.
+    response_image_text = processor.tokenizer.decode(outputs_image[0][inputs_image["input_ids"].shape[-1]:], skip_special_tokens=True)
+    print(f"\nUser (with image): Describe the image.\nAssistant: {response_image_text}")
+    # Example Output: User (with image): Describe the image.
+    # Assistant: The image shows a close-up of a red apple on a dark background. The apple is vibrant and appears to be ripe and fresh.
+# Example 3: Image-to-image manipulation (brief overview - see GitHub for full implementation)
+# BAGEL supports free-form image manipulation. The model can generate new images as part of its response,
+# often encoded as base64 strings within the text output. For a complete example including
+# how to parse and save these generated images, please refer to the usage examples in the official
+# BAGEL GitHub repository: https://github.com/bytedance-seed/BAGEL#quick-start
+print("\nBAGEL also supports image-to-image manipulation. See the GitHub repository for full examples.")
+```
+For installation, a more comprehensive usage guide, and further documentation, please visit our [GitHub repository](https://github.com/bytedance-seed/BAGEL).
 <p align="left"><img src="https://github.com/ByteDance-Seed/Bagel/raw/main/assets/teaser.webp" width="80%"></p>
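
Example 3 in the usage snippet says generated images are often returned as base64 strings inside the text output, but it does not specify the encoding. The following is a rough sketch only, assuming a data-URI layout (`data:image/png;base64,...`), which is a guess and not BAGEL's documented format. The names `DATA_URI_PATTERN` and `extract_embedded_images` are hypothetical, and the demo string is made up, not real model output.

```python
import base64
import re

# ASSUMPTION: images are embedded as data URIs such as
# "data:image/png;base64,<payload>". BAGEL's real output format may differ.
DATA_URI_PATTERN = re.compile(
    r"data:image/(png|jpeg|webp);base64,([A-Za-z0-9+/]+={0,2})"
)


def extract_embedded_images(text):
    """Return (remaining_text, [(image_format, raw_bytes), ...]) for data URIs in text."""
    images = []
    for match in DATA_URI_PATTERN.finditer(text):
        image_format, payload = match.group(1), match.group(2)
        try:
            images.append((image_format, base64.b64decode(payload, validate=True)))
        except ValueError:
            # binascii.Error subclasses ValueError: skip payloads that are not valid base64.
            continue
    # Replace every matched data URI with a short placeholder in the visible text.
    remaining_text = DATA_URI_PATTERN.sub("[image]", text).strip()
    return remaining_text, images


# Demo with a made-up payload (the 8-byte PNG signature), not real model output.
demo_payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
demo_response = f"Here is the edited image: data:image/png;base64,{demo_payload}"
remaining, found = extract_embedded_images(demo_response)
print(remaining)
print([(fmt, len(data)) for fmt, data in found])
```

Each returned byte string could then be written to disk or opened with `PIL.Image.open(io.BytesIO(data))`. The GitHub repository remains the reference for how BAGEL actually returns generated images.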