---
license: mit
datasets:
  - FlameF0X/Lime
pipeline_tag: unconditional-image-generation
tags:
  - Lime
---

# Stable-Lime-v1.1


Stable-Lime-v1.1 is an unconditional diffusion model based on the Denoising Diffusion Probabilistic Models (DDPM) architecture. It has been trained specifically to generate images representing the "essence of Lime."

## Model Details

- Model Type: Unconditional Image Generation (Diffusion)
- Architecture: `UNet2DModel` with `DDPMScheduler`
- Framework: PyTorch & Hugging Face Diffusers
- Resolution: $128 \times 128$ pixels
- Channels: 3 (RGB)
- License: MIT (declared in the metadata above)

## Intended Use

This model is designed for:

- Generating $128 \times 128$ images of limes (or lime-like textures).
- Educational purposes regarding the implementation of DDPM loops.
- Low-resolution, "retro" aesthetic generation.
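A minimal inference sketch using the Diffusers `DDPMPipeline`. The repo id `FlameF0X/Stable-Lime-v1.1` and an environment with `diffusers` and `torch` installed are assumptions, not confirmed by this card:

```python
# Sketch: unconditional sampling with Hugging Face Diffusers.
# Assumes the model is published under the (hypothetical) repo id below.
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("FlameF0X/Stable-Lime-v1.1")
pipe.to("cuda")  # optional; CPU works but is slow for 1000 denoising steps

# One 128x128 RGB sample; num_inference_steps matches the scheduler's
# 1000 training timesteps.
image = pipe(batch_size=1, num_inference_steps=1000).images[0]
image.save("lime.png")
```

Since the model is unconditional, there is no prompt argument; each call draws a fresh sample from noise.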

Out of Scope:

- Text-to-Image generation (this model does not accept text prompts).
- High-resolution photorealism (limited by the 128 px architecture).

## Training Data

The model was trained on the `FlameF0X/Lime` dataset (listed in the metadata above), preprocessed locally into `dataset_lime/processed`.

- Preprocessing: Images were resized to $128 \times 128$ and normalized to the range $[-1, 1]$.
- Augmentation: Random horizontal flips were applied during training to improve generalization.
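The normalization step maps 8-bit channel values into the model's $[-1, 1]$ range. A minimal per-pixel sketch of that mapping and its inverse (the full torchvision-style transform pipeline is an assumption not detailed in this card):

```python
def normalize_pixel(v: int) -> float:
    """Map an 8-bit channel value in [0, 255] to the model's [-1, 1] range."""
    return v / 127.5 - 1.0

def denormalize_pixel(x: float) -> int:
    """Inverse mapping, clamped back to a valid 8-bit channel value."""
    return max(0, min(255, round((x + 1.0) * 127.5)))
```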

## Training Procedure

### Hyperparameters

The model was trained using the following configuration ("The Lime Settings"):

| Parameter | Value | Description |
|---|---|---|
| Batch Size | 16 | Small batch size suitable for consumer GPUs. |
| Learning Rate | $1 \times 10^{-4}$ | Optimizer step size (AdamW). |
| Epochs | 70 | Total passes over the training set. |
| Timesteps | 1000 | Number of diffusion noise steps. |
| Image Size | 128 | Output resolution. |
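The 1000-timestep schedule usually refers to a linearly spaced variance schedule. A pure-Python sketch of it, assuming the standard DDPM endpoint values $10^{-4}$ and $0.02$, which this card does not state explicitly:

```python
def linear_beta_schedule(timesteps: int = 1000,
                         beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_1..beta_T (standard DDPM defaults)."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

betas = linear_beta_schedule()
```

In Diffusers this corresponds to `DDPMScheduler(num_train_timesteps=1000)` with its default linear schedule.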

### Architecture Specification

The U-Net architecture uses a deep structure with an attention block in the low-resolution stages near the bottleneck:

- Block Output Channels: (128, 128, 256, 256, 512, 512)
- Downsampling: 4× `DownBlock2D`, 1× `AttnDownBlock2D`, 1× `DownBlock2D`
- Upsampling: mirror of the downsampling path (1× `UpBlock2D`, 1× `AttnUpBlock2D`, 4× `UpBlock2D`).
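The layout above can be written out as the constructor arguments for `UNet2DModel`. A sketch of the configuration as plain data (exact values of other kwargs such as `layers_per_block` are assumptions):

```python
# UNet2DModel configuration matching the spec above. In Diffusers this would
# be passed as UNet2DModel(sample_size=128, in_channels=3, out_channels=3,
#                          block_out_channels=block_out_channels,
#                          down_block_types=down_block_types,
#                          up_block_types=up_block_types).
block_out_channels = (128, 128, 256, 256, 512, 512)
down_block_types = (
    "DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D",
    "AttnDownBlock2D",  # attention at low resolution, near the bottleneck
    "DownBlock2D",
)
# The upsampling path mirrors the downsampling path in reverse order.
up_block_types = tuple(
    t.replace("Down", "Up") for t in reversed(down_block_types)
)
```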

### Loss Function

The model optimizes the Mean Squared Error (MSE) between the actual noise added and the predicted noise:

$$L = \text{MSE}\big(\epsilon, \epsilon_\theta(x_t, t)\big)$$

Where $\epsilon$ is the Gaussian noise and $\epsilon_\theta$ is the model's prediction at timestep $t$.
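In code, the training pair $(x_t, \epsilon)$ comes from the closed-form forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. A per-pixel scalar sketch, using the illustrative linear-schedule constants from the standard DDPM setup (not stated in this card):

```python
import math
import random

def alpha_bar(t: int, timesteps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 1..t (linear schedule)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + s * (beta_end - beta_start) / (timesteps - 1)
        prod *= 1.0 - beta
    return prod

def noisy_sample(x0: float, t: int, eps: float) -> float:
    """Closed-form forward process: x_t from x_0 and Gaussian noise eps."""
    ab = alpha_bar(t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

# During training the network sees (x_t, t) and is regressed onto eps with
# an MSE loss -- exactly the L in the formula above.
eps = random.gauss(0.0, 1.0)
x_t = noisy_sample(0.5, t=500, eps=eps)
```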