---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- art
license: mit
pipeline_tag: unconditional-image-generation
metrics:
- name: FID
  type: image
  value: 80.4755
  dataset: https://www.kaggle.com/datasets/ayhantasyurt/pixel-art-2dgame-charecter-sprites-idle
  split: test
---
# Sprite-flow
Flow-based generative model for unguided generation of 128x128 RGBA pixel art characters. 

## Model Details
### Model Description
- **Developed by:** [Mihailo Radović](https://www.linkedin.com/in/mihailo-radović-484070278/)
- **Model type:** Unconditional Image Generation
- **License:** MIT

### Model Sources

- **Repository:** [GitHub Repo](https://github.com/mradovic38/sprite-flow)
- **Demo:** [Gradio App](https://huggingface.co/spaces/mradovic38/sprite-flow)

## Uses

### Direct Use
The model predicts the vector field that transports samples from an isotropic Gaussian distribution to 128x128 RGBA pixel art character images. Images are generated by simulating the resulting ODE, with the probability path defined by a linear noise schedule.
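
As a rough illustration, the sketch below assumes that `LinearAlpha` and `LinearBeta` implement the common linear schedule `alpha_t = t` and `beta_t = 1 - t`. In that case `x_t = alpha_t * z + beta_t * eps` interpolates from Gaussian noise `eps` (t = 0) to a data sample `z` (t = 1). Under this assumption, the conditional target vector field for a Gaussian path reduces to `z - eps`. The function names (`sample_xt`, `conditional_vector_field`) are hypothetical and are not part of the repository.

```python
import random

# Assumed linear schedule: alpha(t) = t, beta(t) = 1 - t.
# These are hypothetical stand-ins for LinearAlpha / LinearBeta in the repository.
def alpha(t: float) -> float:
    return t

def beta(t: float) -> float:
    return 1.0 - t

def d_alpha(t: float) -> float:
    return 1.0

def d_beta(t: float) -> float:
    return -1.0

def sample_xt(z: list[float], eps: list[float], t: float) -> list[float]:
    """Point on the Gaussian conditional path: x_t = alpha_t * z + beta_t * eps."""
    return [alpha(t) * zi + beta(t) * ei for zi, ei in zip(z, eps)]

def conditional_vector_field(x: list[float], z: list[float], t: float) -> list[float]:
    """Gaussian-path target u_t(x | z) = (a' - (b'/b) * a) * z + (b'/b) * x, valid for t < 1."""
    ratio = d_beta(t) / beta(t)
    return [(d_alpha(t) - ratio * alpha(t)) * zi + ratio * xi for zi, xi in zip(z, x)]

rng = random.Random(0)
z = [0.5, -0.25, 1.0, 0.0]              # toy "data" vector (stands in for a 4x128x128 image)
eps = [rng.gauss(0.0, 1.0) for _ in z]  # isotropic Gaussian noise
t = 0.3
x_t = sample_xt(z, eps, t)
u = conditional_vector_field(x_t, z, t)

# With the linear schedule the target simplifies to z - eps.
max_err = max(abs(ui - (zi - ei)) for ui, zi, ei in zip(u, z, eps))
print(f"max deviation from z - eps: {max_err:.1e}")
```

With a linear schedule, the regression target is simply the difference between the data sample and the noise sample. This constant-in-time target is one reason linear paths are a popular default.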

### Out-of-Scope Use
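Pairing the model with a cosine or any other noise scheduler, rather than the linear schedule described above, is possible but falls outside this model's intended use.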
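
To make the distinction concrete, the sketch below contrasts the linear schedule with one common cosine choice, `alpha_t = sin(pi * t / 2)` and `beta_t = cos(pi * t / 2)`. Both formulas are assumptions for illustration; the repository's own scheduler classes may be defined differently. Any such pair must satisfy the same boundary conditions (`alpha_0 = 0`, `alpha_1 = 1`, `beta_0 = 1`, `beta_1 = 0`) so that the path starts at pure noise and ends at data.

```python
import math

# Linear schedule (assumed to match LinearAlpha / LinearBeta).
def linear_alpha(t: float) -> float:
    return t

def linear_beta(t: float) -> float:
    return 1.0 - t

# One common cosine schedule (hypothetical alternative, not taken from the repository).
def cosine_alpha(t: float) -> float:
    return math.sin(0.5 * math.pi * t)

def cosine_beta(t: float) -> float:
    return math.cos(0.5 * math.pi * t)

def check_boundaries(alpha, beta, tol: float = 1e-12) -> bool:
    """The path must start at noise (alpha=0, beta=1) and end at data (alpha=1, beta=0)."""
    return (
        abs(alpha(0.0)) < tol
        and abs(beta(0.0) - 1.0) < tol
        and abs(alpha(1.0) - 1.0) < tol
        and abs(beta(1.0)) < tol
    )

for name, a, b in [("linear", linear_alpha, linear_beta), ("cosine", cosine_alpha, cosine_beta)]:
    # Print whether the boundary conditions hold and alpha at a few intermediate times.
    print(name, check_boundaries(a, b), [round(a(t), 3) for t in (0.25, 0.5, 0.75)])
```

Both schedules pass the boundary check. They differ in how quickly the signal term replaces the noise term at intermediate times.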

## How to Get Started with the Model
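* Step 1: **Clone the [GitHub Repo](https://github.com/mradovic38/sprite-flow)** and run the following snippets from the repository root, so that the `models`, `sampling`, `diff_eq`, and `utils` packages can be imported.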

* Step 2: **Initialize the model**:
  ```py
  import torch

  from models.unet import PixelArtUNet

  # Use a GPU if one is available
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  model = PixelArtUNet(
      channels=[128, 256, 512, 1024],
      num_residual_layers=2,
      t_embed_dim=128,
      midcoder_dropout_p=0.2,
  ).to(device)
  ```
  
* Step 3: **Load the model weights**:
  ```py
  from huggingface_hub import hf_hub_download
  from safetensors.torch import load_file
  
  repo_id = "mradovic38/sprite-flow"
  filename = "model.safetensors"
  file_path = hf_hub_download(repo_id=repo_id, filename=filename)
  checkpoint = load_file(file_path)
  model.load_state_dict(checkpoint)
  model.to(device)
  model.eval()
  ```

* Step 4: **Initialize the probability path**:
  ```py
  from sampling.conditional_probability_path import GaussianConditionalProbabilityPath
  from sampling.noise_scheduling import LinearAlpha, LinearBeta
  
  path = GaussianConditionalProbabilityPath(
      p_data=None,
      p_simple_shape=[4, 128, 128],
      alpha=LinearAlpha(),
      beta=LinearBeta()
  ).to(device)
  path.eval()
  ```

* Step 5: **Simulate the ODE**:

  ```py
  import torch

  from diff_eq.ode_sde import UnguidedVectorFieldODE
  from diff_eq.simulator import EulerSimulator

  num_timesteps = 200  # example number of timesteps
  num_samples = 3  # example number of samples

  # Time grid from 0 (noise) to 1 (data), one copy per sample: (num_samples, num_timesteps, 1, 1, 1)
  ts = torch.linspace(0, 1, num_timesteps).view(1, -1, 1, 1, 1).expand(num_samples, -1, 1, 1, 1).to(device)
  ode = UnguidedVectorFieldODE(model)
  simulator = EulerSimulator(ode)

  # Inference only: disable gradient tracking to save memory
  with torch.no_grad():
      x0 = path.p_simple.sample(num_samples).to(device)  # (num_samples, 4, 128, 128)
      x1 = simulator.simulate(x0, ts)  # (num_samples, 4, 128, 128)
  ```

* Step 6: **Convert the tensor to PIL images**:

  ```py
  from utils.helpers import tensor_to_rgba_image, normalize_to_unit
  
  x1 = normalize_to_unit(x1) # [-1, 1] -> [0, 1]
  imgs = tensor_to_rgba_image(x1)
  ```