Update README.md

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/model.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

+# TBAC-UniImage-3B
+## Overview
+This repository contains the official model checkpoints of **TBAC-UniImage-3B**, a unified understanding and generation model developed by the Basic Algorithm Center, Platform and Content Group, Tencent.
+Our model is composed of two components: [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) serves as the understanding module, while [SANA-1600M](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) acts as the generation module. The conditions for generation originate from the representations of different Qwen2.5-VL-3B-Instruct layers.
+![Model](./assets/model.png)
+## Performance
+| Method | Base (M)LLM | GenEval | DPG-Bench |
+| :--- | :--- | :--- | :--- |
+| MetaQuery | Qwen2.5-VL-3B-Instruct | 0.78 | 81.10 |
+| | Qwen2.5-VL-7B-Instruct | 0.80 | 82.05 |
+| BLIP3-o | Qwen2.5-VL-3B-Instruct | 0.81 | 79.36 |
+| | Qwen2.5-VL-7B-Instruct | 0.84 | 81.60 |
+| BAGEL | MoT-7B | 0.82 | - |
+| Show-o2 | Qwen2.5-1.5B-Instruct | 0.73 | 85.02 |
+| | Qwen2.5-7B-Instruct | 0.76 | 86.14 |
+| **Ours** | **Qwen2.5-VL-3B-Instruct** | **0.87** | 81.00 |
+## Acknowledgements
+The training and inference code is adapted from [MetaQuery](https://github.com/facebookresearch/metaquery). We thank the authors for their contribution!
+## About
+Created by the Tencent PCG Basic Algorithm Center. All rights reserved.

assets/model.png ADDED

config.json ADDED

+{
+  "_gradient_checkpointing": true,
+  "architectures": [
+    "TBACUniImage"
+  ],
+  "attn_implementation": null,
+  "diffusion_model_id": "Efficient-Large-Model/Sana_1600M_512px_diffusers",
+  "in_channels": 32,
+  "input_size": 16,
+  "loss_type": "flow",
+  "max_input_text_tokens": 256,
+  "mllm_id": "Qwen/Qwen2.5-VL-3B-Instruct",
+  "model_type": "metaquery",
+  "modules_to_freeze": [
+    "vae",
+    "model.mllm_backbone"
+  ],
+  "modules_to_unfreeze": [
+    "model.mllm_backbone.model.embed_tokens"
+  ],
+  "noise_scheduler_id": "Efficient-Large-Model/Sana_1600M_512px_diffusers",
+  "num_metaqueries": 64,
+  "scheduler_id": "Efficient-Large-Model/Sana_1600M_512px_diffusers",
+  "system_prompt": "You will be given an image or its caption. Please describe the content of the image in detail in your own words.",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.49.0",
+  "vae_downsample_f": 32,
+  "vae_id": "Efficient-Large-Model/Sana_1600M_512px_diffusers"
+}
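
A few of the fields in this config relate to each other, and a small sketch may help when reviewing them. The Python below is illustrative only and is not taken from the repository's code. It assumes that `input_size` is the latent side length and that `vae_downsample_f` is the VAE's spatial downsampling factor, in which case their product gives the image side in pixels. It also assumes that `modules_to_unfreeze` overrides `modules_to_freeze` by name-prefix matching. The helper names `pixel_resolution` and `is_trainable`, and the parameter names passed to them, are hypothetical.

```python
import json

# Subset of the config.json fields from this diff, copied verbatim.
CONFIG_JSON = """
{
  "input_size": 16,
  "vae_downsample_f": 32,
  "modules_to_freeze": ["vae", "model.mllm_backbone"],
  "modules_to_unfreeze": ["model.mllm_backbone.model.embed_tokens"]
}
"""


def pixel_resolution(config: dict) -> int:
    """Assumed relation: image side = latent side * VAE downsampling factor."""
    return config["input_size"] * config["vae_downsample_f"]


def _matches(name: str, prefix: str) -> bool:
    """True if `name` is the module `prefix` itself or lives underneath it."""
    return name == prefix or name.startswith(prefix + ".")


def is_trainable(name: str, config: dict) -> bool:
    """Hypothetical rule: an unfreeze entry wins over a freeze entry."""
    if any(_matches(name, p) for p in config["modules_to_unfreeze"]):
        return True
    if any(_matches(name, p) for p in config["modules_to_freeze"]):
        return False
    # Anything not listed is treated as trainable in this sketch.
    return True


config = json.loads(CONFIG_JSON)

print(pixel_resolution(config))  # 16 * 32 = 512

# Hypothetical parameter names, used only to exercise the rule above.
for param in (
    "model.mllm_backbone.model.embed_tokens.weight",
    "model.mllm_backbone.model.layers.0.mlp.up_proj.weight",
    "vae.encoder.weight",
):
    print(param, is_trainable(param, config))
```

Under these assumptions the derived side length of 512 is consistent with the `512px` suffix in `diffusion_model_id`, and the freeze lists would leave the MLLM backbone frozen except for its token embeddings.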

model.safetensors.index.json ADDED

The diff for this file is too large to render.