Show-o-RecA

Part of the RecA collection: Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
RecA is a self-supervised training framework that aligns understanding and generation at modest compute cost, yielding large zero-shot gains in generation and editing capability.

```python
import torch
from diffusers import DiffusionPipeline

# Load the model in bfloat16 precision
pipe = DiffusionPipeline.from_pretrained(
    "sanaka87/Show-o-RecA", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # switch to "mps" for Apple devices

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
This repository hosts the model weights for Show-o-RecA. For installation, usage instructions, and further documentation, please visit Show-o's original GitHub repository.
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Show-o | 0.57 | 70.65 | 0.33 |
| Show-o-RecA | 0.62 | 75.70 | 0.34 |
Show-o-RecA is licensed under the Apache 2.0 license.
If you find our work inspiring or use our codebase in your research, please consider giving us a star ⭐ and a citation.
```bibtex
@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}
```
Base model: showlab/show-o-w-clip-vit