olenet committed on
Commit 863236d · verified · 1 Parent(s): 959e6f8

Update README.md

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -8,7 +8,6 @@ tags:

  # ERNIE-Image

-
  <p align="center">
  <a href="https://huggingface.co/Baidu/ERNIE-Image">🤗 ERNIE-Image</a> &nbsp;|&nbsp;
  <a href="https://huggingface.co/Baidu/ERNIE-Image-Turbo">🤗 ERNIE-Image-Turbo</a> &nbsp;|&nbsp;
@@ -17,11 +16,10 @@ tags:
  <a href="TODO">🖼️ Gallery</a>
  </p>

-
  ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is built on a single-stream Diffusion Transformer (DiT) and paired with a lightweight Prompt Enhancer that expands brief user inputs into richer structured descriptions. With only 8B DiT parameters, it reaches state-of-the-art performance among open-weight text-to-image models. The model is designed not only for strong visual quality, but also for controllability in practical generation scenarios where accurate content realization matters as much as aesthetics. In particular, ERNIE-Image performs strongly on complex instruction following, text rendering, and structured image generation, making it well suited for commercial posters, comics, multi-panel layouts, and other content creation tasks that require both visual quality and precise control. It also supports a broad range of visual styles, including realistic photography, design-oriented imagery, and more stylized aesthetic outputs.

  <p align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/zDC-EOfPO6RAFIE6xD1SW.jpeg" alt="ERNIE-Image Mosaic" width="100%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/QRt1mPSU9SCkcxxFWQje2.jpeg" alt="ERNIE-Image Mosaic" width="100%">
  </p>

  **Highlights:**
@@ -40,7 +38,7 @@ ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Ima

  ## Benchmark

- ### GenEval
+ ### GENEval

  | Model | Single Object | Two Object | Counting | Colors | Position | Attribute Binding | Overall |
  |---|---:|---:|---:|---:|---:|---:|---:|
@@ -135,9 +133,9 @@ pipe = ErnieImagePipeline.from_pretrained(
  ).to("cuda")

  image = pipe(
- prompt="A cinematic movie poster of a futuristic city at night with clear neon signage.",
- height=1024,
- width=1024,
+ prompt="This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
+ height=1264,
+ width=848,
  num_inference_steps=50,
  guidance_scale=4.0,
  use_pe=True # use prompt enhancer
@@ -165,9 +163,9 @@ Send a generation request:
  curl -X POST http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
- "prompt": "一只黑白相间的中华田园犬",
- "height": 1024,
- "width": 1024,
+ "prompt": "This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
+ "height": 1264,
+ "width": 848,
  "num_inference_steps": 50,
  "guidance_scale": 4.0,
  "use_pe": true
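
The curl request in the last hunk could also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server at `http://localhost:30000/generate` accepts exactly the JSON fields shown in the diff. The helper names `build_request` and `generate` are hypothetical, and because the diff does not show the response format, the body is returned as raw bytes.

```python
import json
import urllib.request

# Endpoint copied from the curl command in the diff; adjust if the server runs elsewhere.
GENERATE_URL = "http://localhost:30000/generate"


def build_request(prompt, height=1264, width=848, steps=50,
                  guidance_scale=4.0, use_pe=True):
    """Build a POST request carrying the same JSON fields as the curl example.

    Hypothetical helper: the defaults mirror the values added in this commit.
    """
    payload = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "use_pe": use_pe,  # use prompt enhancer
    }
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600):
    """Send the request and return the raw response body as bytes.

    The response schema is not part of the diff, so no decoding is attempted.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return resp.read()
```

Calling `generate("...")` requires the server to be running; `build_request` can be inspected on its own to check the payload before sending.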