---
library_name: diffusers
---

# Mann-E FLUX[Dev] Edition

<p align="center">
  <img src="demo.png" width=720 height=1280 />
</p>

## How to use the model

### Install needed libraries

```shell
pip install git+https://github.com/huggingface/diffusers.git transformers==4.42.4 accelerate xformers peft sentencepiece protobuf -q
```

### Execution code

```python
import random

import numpy as np
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

dtype = torch.bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"

# Use the tiny FLUX VAE (taef1) to reduce memory use during decoding.
taef1 = AutoencoderTiny.from_pretrained("madebyollin/taef1", torch_dtype=dtype).to(device)
pipe = DiffusionPipeline.from_pretrained("mann-e/mann-e_flux", torch_dtype=dtype, vae=taef1).to(device)
if device == "cuda":
    torch.cuda.empty_cache()

MAX_SEED = np.iinfo(np.int32).max

# Draw a random seed and build a deterministic generator from it.
seed = random.randint(0, MAX_SEED)
generator = torch.Generator().manual_seed(seed)

prompt = "an astronaut riding a horse"

pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_inference_steps=10,
    width=720,
    height=1280,
    generator=generator,
    output_type="pil",
).images[0].save("output.png")
```
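The code above draws a fresh random seed on every run, so each run produces a different image. To reproduce a particular image, keep its seed and rebuild the generator from it. A minimal sketch (the helper name is ours, not part of diffusers):

```python
import torch

def make_generator(seed: int, device: str = "cpu") -> torch.Generator:
    """Build a deterministic generator: the same seed yields the same image."""
    return torch.Generator(device=device).manual_seed(seed)

# Two generators seeded identically produce identical random streams,
# so passing them to the pipeline reproduces the same image.
g1 = make_generator(1234)
g2 = make_generator(1234)
print(torch.equal(torch.randn(4, generator=g1), torch.randn(4, generator=g2)))  # True
```

Log the `seed` value alongside each saved image if you want to regenerate or fine-tune favorites later.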

## Tips and Tricks

1. Adding `mj-v6.1-style` to your prompts, especially cinematic and photorealistic ones, can noticeably improve output quality. Give it a try.
2. The best `guidance_scale` is somewhere between 3.5 and 5.0.
3. Inference step counts between 8 and 16 work very well.
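Since the best settings depend on the prompt, it can help to sweep the recommended ranges and compare results side by side. A small sketch of such a sweep; the helper and filenames are ours, and the commented-out line stands in for the `pipe(...)` call from the example above:

```python
from itertools import product

def sweep_settings(guidance_scales, step_counts):
    """Enumerate (guidance_scale, steps, filename) combinations to try."""
    return [
        (gs, steps, f"out_gs{gs}_s{steps}.png")
        for gs, steps in product(guidance_scales, step_counts)
    ]

# Sweep the ranges suggested in the tips above.
for gs, steps, fname in sweep_settings([3.5, 4.0, 4.5, 5.0], [8, 12, 16]):
    # pipe(prompt=prompt, guidance_scale=gs, num_inference_steps=steps,
    #      generator=generator, output_type="pil").images[0].save(fname)
    print(gs, steps, fname)
```

Keeping the generator seed fixed across the sweep makes the comparison fair: any difference between images then comes from the settings, not the noise.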