baidu/ERNIE-Image

@@ -6,12 +6,8 @@ tags:
   - text-to-image
 ---
 # ERNIE-Image
-<p align="center">
-  <img src="mosaic.jpg" alt="ERNIE-Image Mosaic" width="60%">
-</p>
 <p align="center">
   <a href="https://huggingface.co/Baidu/ERNIE-Image">🤗 ERNIE-Image</a> &nbsp;|&nbsp;
@@ -21,8 +17,13 @@ tags:
   <a href="TODO">🖼️ Gallery</a>
 </p>
 ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is built on a single-stream Diffusion Transformer (DiT) and paired with a lightweight Prompt Enhancer that expands brief user inputs into richer structured descriptions. With only 8B DiT parameters, it reaches state-of-the-art performance among open-weight text-to-image models. The model is designed not only for strong visual quality, but also for controllability in practical generation scenarios where accurate content realization matters as much as aesthetics. In particular, ERNIE-Image performs strongly on complex instruction following, text rendering, and structured image generation, making it well suited for commercial posters, comics, multi-panel layouts, and other content creation tasks that require both visual quality and precise control. It also supports a broad range of visual styles, including realistic photography, design-oriented imagery, and more stylized aesthetic outputs.
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/zDC-EOfPO6RAFIE6xD1SW.jpeg" alt="ERNIE-Image Mosaic" width="60%">
+</p>
 **Highlights:**
 - **Compact but strong**: Despite its compact 8B scale, ERNIE-Image remains highly competitive with substantially larger open-weight models across a range of benchmarks.
 - **Text rendering**: ERNIE-Image performs particularly well on dense, long-form, and layout-sensitive text, making it a strong choice for posters, infographics, UI-like images, and other text-heavy visual content.