---
library_name: diffusers
pipeline_tag: image-text-to-text
---

This repository contains the model described in the paper [Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation](https://arxiv.org/abs/2502.05415).

To run text-to-image inference with the 512-resolution model:
```bash
sh inference_t2i_512.sh
```
To run text-to-image inference with the 256-resolution model:
```bash
sh inference_t2i_256.sh
```
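
If you switch between the two resolutions often, a small wrapper can pick the script for you. The following is a minimal sketch, not part of the repository: `t2i_script` is a hypothetical helper name, and it assumes the only supported resolutions are the two listed above.

```bash
# Hypothetical helper (not shipped with the repository): print the name of
# the inference script for a given resolution, using the two script names
# listed in this README.
t2i_script() {
  case "$1" in
    256|512)
      printf 'inference_t2i_%s.sh\n' "$1"
      ;;
    *)
      # Any other value is rejected, since only two scripts are documented.
      printf 'unsupported resolution: %s (expected 256 or 512)\n' "$1" >&2
      return 1
      ;;
  esac
}
```

Assuming the scripts are in the current directory, `sh "$(t2i_script 512)"` would then be equivalent to the first command above.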