---
license: apache-2.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- 8B
- nf4
- 4bit
- quantized
model_size: 7B
quantized_by: Abhishek Dujari
base_model:
- baidu/ERNIE-Image
base_model_relation: quantized
---

# ERNIE-Image

Ovedrive mixed-precision NF4 version targeting 12GB of VRAM (full model in memory) or less. It has not been widely tested, so please share your results and the step counts that work best for you.
Minimum VRAM required is 6GB.

Thank you to [Justlab.ai](https://Justlab.ai) for the GPUs.

Both images were generated with 20 steps and CFG 4 (random seed).

| Original ERNIE-Image | Ovedrive NF4 quantized |
|---|---|
| ![Original ERNIE-Image output](./fullmodel.webp) | ![Ovedrive ERNIE-Image NF4 output](./ovedrive.webp) |
| Full precision baseline | Mixed precision NF4 |


## Quick Start

### Recommended Parameters
- Resolution: 
    - 1024x1024
    - 848x1264
    - 1264x848
    - 768x1376
    - 896x1200
    - 1376x768
    - 1200x896
- Guidance scale: 4.0
- Inference steps: 50
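
If your target size is not in the list above, one option is to snap it to the recommended resolution with the nearest aspect ratio. The helper below is a minimal sketch, not part of the model or of diffusers: `closest_resolution` is a hypothetical name, and it assumes the list entries are written as width x height (which matches the `height=1264, width=848` example further down).

```python
import math

# Recommended (width, height) pairs, copied from the list above.
# Assumption: each entry is written as width x height.
RECOMMENDED_RESOLUTIONS = [
    (1024, 1024),
    (848, 1264),
    (1264, 848),
    (768, 1376),
    (896, 1200),
    (1376, 768),
    (1200, 896),
]


def closest_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the recommended (width, height) with the nearest aspect ratio.

    Ratios are compared on a log scale so that portrait and landscape
    requests are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        RECOMMENDED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


if __name__ == "__main__":
    width, height = closest_resolution(1920, 1080)
    print(width, height)
```

The returned pair can then be passed as `width` and `height` to the pipeline call.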

### Diffusers

`pip install git+https://github.com/huggingface/diffusers`

```python
import torch
from diffusers import ErnieImagePipeline

pipe = ErnieImagePipeline.from_pretrained(
    "ovedrive/ERNIE-Image-nf4",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
    height=1264,
    width=848,
    num_inference_steps=50,
    guidance_scale=4.0,
    use_pe=True,  # use the prompt enhancer
).images[0]

image.save("output.png")
```

### SGLang

Get the latest SGLang source, then install it following the SGLang documentation:
```bash
git clone https://github.com/sgl-project/sglang.git
```

Start the server:

```bash
sglang serve --model-path baidu/ERNIE-Image
```

Send a generation request:

```bash
curl -X POST http://localhost:30000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
    "height": 1264,
    "width": 848,
    "num_inference_steps": 50,
    "guidance_scale": 4.0,
    "use_pe": true
  }' \
  --output output.png
```
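
The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl command above: the endpoint URL and JSON fields are copied from it, while `build_request` and `generate` are hypothetical helper names. Like `curl --output`, it writes whatever bytes the server returns to the output file; the response format is not described here, so check the body if the file does not open as an image.

```python
import json
import urllib.request

# Same URL as the curl command above (assumes the server runs locally on port 30000).
ENDPOINT = "http://localhost:30000/v1/images/generations"


def build_request(prompt: str, width: int = 848, height: int = 1264,
                  steps: int = 50, guidance_scale: float = 4.0,
                  use_pe: bool = True) -> urllib.request.Request:
    """Build a POST request carrying the same JSON fields as the curl example."""
    payload = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "use_pe": use_pe,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, out_path: str = "output.png") -> None:
    """Send the request and save the raw response body, as `curl --output` does."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = resp.read()
    with open(out_path, "wb") as f:
        f.write(body)


if __name__ == "__main__":
    # Requires the SGLang server from the previous step to be running.
    generate("A cyclist riding down a covered street at dusk, warm backlight.")
```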