---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- art
license: mit
pipeline_tag: unconditional-image-generation
metrics:
- name: FID
  type: image
  value: 80.4755
  dataset: https://www.kaggle.com/datasets/ayhantasyurt/pixel-art-2dgame-charecter-sprites-idle
  split: test
---
# Sprite-flow
Flow-based generative model for unguided generation of 128x128 RGBA pixel art characters.

## Model Details
### Model Description
- **Developed by:** [Mihailo Radović](https://www.linkedin.com/in/mihailo-radović-484070278/)
- **Model type:** Unconditional Image Generation
- **License:** MIT

### Model Sources
- **Repository:** [GitHub Repo](https://github.com/mradovic38/sprite-flow)
- **Demo:** [Gradio App](https://huggingface.co/spaces/mradovic38/sprite-flow)

## Uses
### Direct Use
The model predicts the vector field used to generate 128x128 RGBA pixel art character images. Sampling starts from an isotropic Gaussian distribution and simulates an ODE along this vector field with linear noise scheduling.
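
For orientation, the sampling procedure can be sketched in equations. This is an illustration under the assumption that the linear schedule is $\alpha_t = t$, $\beta_t = 1 - t$ (the card does not state the exact definitions); $z$ denotes a data image, $\varepsilon$ Gaussian noise, and $u_t^\theta$ the vector field predicted by the model:

$$
x_t = \alpha_t z + \beta_t \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \qquad \alpha_t = t, \quad \beta_t = 1 - t
$$

$$
\frac{\mathrm{d}x_t}{\mathrm{d}t} = u_t^\theta(x_t), \qquad x_0 \sim \mathcal{N}(0, I)
$$

Under these assumptions, integrating the ODE from $t = 0$ to $t = 1$ turns a noise sample $x_0$ into a generated image $x_1$.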

### Out-of-Scope Use
Sampling with a cosine or any other noise scheduler is possible, but it falls outside the linear-scheduling setup described under Direct Use.

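## How to Get Started with the Model
* Step 1: **Clone the [GitHub Repo](https://github.com/mradovic38/sprite-flow)**. The snippets below import modules from this repository, so run them from its root directory or add it to `PYTHONPATH`.
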
* Step 2: **Initialize the model**:
```py
import torch

from models.unet import PixelArtUNet

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = PixelArtUNet(
    channels=[128, 256, 512, 1024],
    num_residual_layers=2,
    t_embed_dim=128,
    midcoder_dropout_p=0.2,
).to(device)
```

* Step 3: **Load the model weights**:
```py
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

repo_id = "mradovic38/sprite-flow"
filename = "model.safetensors"

# Download the checkpoint from the Hugging Face Hub (cached after the first call)
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
checkpoint = load_file(file_path)

model.load_state_dict(checkpoint)
model.to(device)
model.eval()  # switch to inference mode (disables dropout)
```

* Step 4: **Initialize the probability path**:
```py
from sampling.conditional_probability_path import GaussianConditionalProbabilityPath
from sampling.noise_scheduling import LinearAlpha, LinearBeta

path = GaussianConditionalProbabilityPath(
    p_data=None,  # no data distribution is passed, since we only sample here
    p_simple_shape=[4, 128, 128],  # (channels, height, width) of the Gaussian noise
    alpha=LinearAlpha(),
    beta=LinearBeta(),
).to(device)
path.eval()
```

* Step 5: **Simulate the ODE**:
```py
import torch

from diff_eq.ode_sde import UnguidedVectorFieldODE
from diff_eq.simulator import EulerSimulator

num_timesteps = 200  # example number of timesteps
num_samples = 3  # example number of samples

# Time grid from 0 to 1, shaped (num_samples, num_timesteps, 1, 1, 1)
ts = torch.linspace(0, 1, num_timesteps).view(1, -1, 1, 1, 1).expand(num_samples, -1, 1, 1, 1).to(device)
x0 = path.p_simple.sample(num_samples).to(device)  # (num_samples, 4, 128, 128)

ode = UnguidedVectorFieldODE(model)
simulator = EulerSimulator(ode)

with torch.no_grad():  # gradients are not needed for sampling
    x1 = simulator.simulate(x0, ts)  # (num_samples, 4, 128, 128)
```
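
To make the ODE simulation step more concrete, here is a minimal, dependency-free sketch of the explicit Euler method on a toy problem. It is not the repository's `EulerSimulator`: the functions `conditional_vector_field` and `euler_simulate` are hypothetical, and the closed-form field `(z - x) / (1 - t)` assumes a Gaussian path with `alpha_t = t` and `beta_t = 1 - t` towards one known target `z`, standing in for the trained network.

```python
import random


def conditional_vector_field(x, z, t):
    """Toy stand-in for the learned field: u_t(x | z) = (z - x) / (1 - t).

    Assumes alpha_t = t and beta_t = 1 - t (an assumption, see the text above).
    """
    return [(zi - xi) / (1.0 - t) for xi, zi in zip(x, z)]


def euler_simulate(x0, ts, vector_field):
    """Integrate dx/dt = vector_field(x, t) over the time grid ts with Euler steps."""
    x = list(x0)
    for t, t_next in zip(ts[:-1], ts[1:]):
        h = t_next - t  # step size
        v = vector_field(x, t)  # field evaluated at the current state and time
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


num_timesteps = 200
ts = [i / (num_timesteps - 1) for i in range(num_timesteps)]  # grid from 0 to 1

target = [0.5, -0.25, 1.0, 0.0]  # hypothetical "data point" with four values
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise

x1 = euler_simulate(x0, ts, lambda x, t: conditional_vector_field(x, target, t))
print(x1)
```

The field is only evaluated at the left end of each step, so it is never queried at `t = 1`, where this toy formula would divide by zero. In the real pipeline the network's prediction replaces the closed-form field.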

* Step 6: **Convert the torch tensor to PIL images**:
```py
from utils.helpers import tensor_to_rgba_image, normalize_to_unit

x1 = normalize_to_unit(x1)  # [-1, 1] -> [0, 1]
imgs = tensor_to_rgba_image(x1)
```
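
As a rough illustration of the value mapping in this last step, the sketch below converts individual channel values from [-1, 1] to 8-bit integers. `to_uint8` is a hypothetical helper written for this example and is not part of the repository. Clamping out-of-range values is an assumption; the actual `normalize_to_unit` and `tensor_to_rgba_image` may behave differently.

```python
def to_uint8(value):
    """Map a value from [-1, 1] to an integer in [0, 255]."""
    unit = (value + 1.0) / 2.0  # [-1, 1] -> [0, 1]
    unit = min(max(unit, 0.0), 1.0)  # clamp values outside the range (assumption)
    return int(round(unit * 255))


# One hypothetical RGBA pixel in the model's output range
pixel = [-1.0, 1.0, 2.0, -3.0]
rgba = [to_uint8(v) for v in pixel]
print(rgba)
```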