tags:
- diffusers
- stable-diffusion
- text-to-image
---
# Anime-Diffusion UNet

A UNet2DConditionModel fine-tuned for anime-style image generation, based on Stable Diffusion v1.4.
## Model Details

- **Architecture:** UNet2DConditionModel from [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)
- **EMA Decay:** 0.9995
- **Output Resolution:** 512×512
- **Prediction Type:** epsilon
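Since the prediction type is epsilon, training minimizes the usual noise-regression MSE. A minimal sketch of that objective, not this repo's actual training script; the helper name is illustrative, and `scheduler.add_noise` follows the diffusers scheduler API:

```python
import torch
import torch.nn.functional as F

def ddpm_epsilon_loss(model, scheduler, latents, text_emb):
    # Sample Gaussian noise and a random timestep per example.
    noise = torch.randn_like(latents)
    t = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    # Forward-noise the latents, then regress the added noise (epsilon-prediction).
    noisy = scheduler.add_noise(latents, noise, t)
    pred = model(noisy, t, encoder_hidden_states=text_emb).sample
    return F.mse_loss(pred, noise)
```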
### Companion Models (required for inference)

| Component | Model ID |
|-----------|----------|
| VAE | [stabilityai/sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse) |
| Text Encoder | [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) |
| Tokenizer | [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) |
## Training Details

- **Dataset:** [none-yet/anime-captions](https://huggingface.co/datasets/none-yet/anime-captions) (~337k image-caption pairs)
- **Steps:** 10,000
- **Batch Size:** 128 (32 per GPU × 4 GPUs)
- **Learning Rate:** 1e-4 with cosine schedule (500 warmup steps)
- **Optimizer:** AdamW (weight decay 0.01)
- **Mixed Precision:** fp16
- **Noise Schedule:** DDPM, 1000 linear timesteps
- **Gradient Clipping:** 1.0
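The released checkpoint is the EMA copy of the weights (decay 0.9995). The update behind that is the standard exponential moving average applied after each optimizer step; a minimal sketch with illustrative parameter names:

```python
def ema_update(ema_params, model_params, decay=0.9995):
    """Standard EMA update (sketch): ema <- decay * ema + (1 - decay) * current."""
    for ema_p, p in zip(ema_params, model_params):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)
```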
## Usage

```python
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import CLIPTextModel, CLIPTokenizer

# Load models
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

# Load fine-tuned EMA weights
weights_path = hf_hub_download(repo_id="dixisouls/anime-diffusion", filename="model.safetensors")
unet.load_state_dict(load_file(weights_path))

# Use DDIMScheduler for inference
scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_schedule="linear",
    clip_sample=False,
    prediction_type="epsilon",
)
```
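To generate an image, the components above can drive a classifier-free-guidance DDIM sampling loop. A minimal sketch, assuming CUDA (or CPU) placement; the step count, guidance scale, and function name are illustrative, not part of this repo:

```python
import torch

@torch.no_grad()
def generate(prompt, unet, vae, text_encoder, tokenizer, scheduler,
             steps=50, guidance_scale=7.5, device="cuda"):
    # Encode the prompt and an empty prompt for classifier-free guidance.
    def encode(text):
        ids = tokenizer(text, padding="max_length",
                        max_length=tokenizer.model_max_length,
                        truncation=True, return_tensors="pt").input_ids
        return text_encoder(ids.to(device))[0]

    cond, uncond = encode(prompt), encode("")

    # Start from pure noise in the 64x64 latent space (512 / 8).
    latents = torch.randn(1, unet.config.in_channels, 64, 64, device=device)
    scheduler.set_timesteps(steps)
    latents = latents * scheduler.init_noise_sigma

    for t in scheduler.timesteps:
        # Predict noise for the unconditioned and conditioned embeddings in one batch.
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), t)
        noise = unet(inp, t, encoder_hidden_states=torch.cat([uncond, cond])).sample
        n_uncond, n_cond = noise.chunk(2)
        noise = n_uncond + guidance_scale * (n_cond - n_uncond)
        latents = scheduler.step(noise, t, latents).prev_sample

    # Decode latents to an image tensor scaled into [0, 1].
    image = vae.decode(latents / vae.config.scaling_factor).sample
    return (image / 2 + 0.5).clamp(0, 1)
```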
See the companion [Hugging Face Space](https://huggingface.co/spaces/dixisouls/stable-anime) for a full interactive demo.
## Limitations

- Trained exclusively on anime-style images; not suitable for photorealistic generation
- Fixed output resolution of 512×512
- Single-subject prompts work best; complex multi-character scenes may be inconsistent
|