---
title: Demo Stable Diffusion v1.4
emoji: 🤗
colorFrom: yellow
colorTo: orange
sdk: gradio
app_file: gradio_app.py
pinned: false
---
# Task 1: Choosing model

# Chosen model: Stable Diffusion text-to-image fine-tuning

The `train_text_to_image.py` script shows how to fine-tune the Stable Diffusion model on your own dataset.

### How to install the code requirements.

First, clone the repo, then create a conda environment from the `env.yaml` file and activate it:

```bash
git clone https://github.com/hoangkimthuc/diffusers.git
cd diffusers/examples/text_to_image
conda env create -f env.yaml
conda activate stable_diffusion
```
Before running the scripts, make sure to install the library's training dependencies.

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
# from inside examples/text_to_image, go back to the repo root before installing
cd ../..
pip install .
```
Then change back into the `diffusers/examples/text_to_image` folder and run

```bash
cd examples/text_to_image
pip install -r requirements.txt
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```
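If your environment doesn't support an interactive shell (for example a notebook), you can instead write a default Accelerate configuration from Python:

```python
# Writes a default Accelerate config file, equivalent to accepting
# the defaults in the interactive `accelerate config` prompts.
from accelerate.utils import write_basic_config

write_basic_config()
```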
### Steps to run the training.

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate with your token:

```bash
huggingface-cli login
```
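Alternatively, you can authenticate from Python (for example inside a notebook) with the `huggingface_hub` helper:

```python
# Prompts for your Hugging Face access token and stores it locally,
# same effect as `huggingface-cli login`.
from huggingface_hub import login

login()
```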
If you have already cloned the repo, then you won't need to go through these steps.

<br>

#### Hardware

With `gradient_checkpointing` and `mixed_precision` it should be possible to fine-tune the model on a single 24GB GPU. For higher `batch_size` and faster training it's better to use GPUs with >30GB memory.

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

To start fine-tuning with the settings in `train.sh`, run:

```bash
bash train.sh
```
### Sample input/output after training

Once the training is finished, the model will be saved in the `output_dir` specified in the command; in this example it's `sd-pokemon-model`. To load the fine-tuned model for inference, just pass that path to `StableDiffusionPipeline`:

```python
import torch
from diffusers import StableDiffusionPipeline

model_path = "sd-pokemon-model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
```
The output for the prompt "yoda" is saved in the `yoda-pokemon.png` image file.

### Name and link to the training dataset.

Dataset name: pokemon-blip-captions

Dataset link: https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions
### The number of model parameters to determine the model's complexity.

Note: the CLIPTextModel (text conditioning model) and AutoencoderKL (image generating model) are frozen; only the UNet (the diffusion model) is trained.

The number of trainable parameters in the script: 859,520,964

To get this number, you can set a breakpoint by calling `breakpoint()` at line 813 of the `train_text_to_image.py` file and then run `train.sh`. Once the pdb session stops at that line, you can check the model's parameters with `p unet.num_parameters()`.
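If you prefer not to step through the training script, the same count can be reproduced by loading the UNet directly and summing its trainable parameters (a minimal sketch, assuming the `CompVis/stable-diffusion-v1-4` base weights used in this example):

```python
# Count the UNet's trainable parameters without touching the training script.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")  # 859,520,964
```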
### The model evaluation metric (CLIP score)

CLIP score is a measure of how well the generated images match the prompts.

Validation prompts to calculate the CLIP scores:

```python
prompts = [
    "a photo of an astronaut riding a horse on mars",
    "A high tech solarpunk utopia in the Amazon rainforest",
    "A pikachu fine dining with a view to the Eiffel Tower",
    "A mecha robot in a favela in expressionist style",
    "an insect robot preparing a delicious meal",
    "A small cabin on top of a snowy mountain in the style of Disney, artstation",
]
```

To calculate the CLIP score for the above prompts, run:

```bash
python metrics.py
```
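For reference, the core of the computation looks roughly like the following sketch, which uses the `clip_score` metric from `torchmetrics`; the actual `metrics.py` in this repo may differ in details such as the CLIP checkpoint or batch handling.

```python
from functools import partial

import torch
from diffusers import StableDiffusionPipeline
from torchmetrics.functional.multimodal import clip_score

# Score generated images against their prompts with a CLIP ViT-B/16 model.
clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16")

prompts = [
    "a photo of an astronaut riding a horse on mars",
    "A pikachu fine dining with a view to the Eiffel Tower",
]  # subset of the validation prompts above, kept short for illustration

pipe = StableDiffusionPipeline.from_pretrained(
    "sd-pokemon-model", torch_dtype=torch.float16
).to("cuda")

# Generate one image per prompt as numpy arrays in [0, 1].
images = pipe(prompts, output_type="np").images

# clip_score expects uint8 image tensors in (N, C, H, W) layout.
images_int = (images * 255).astype("uint8")
score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts)
print(f"CLIP score: {float(score):.4f}")
```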
### Link to the trained model

https://drive.google.com/file/d/1xzVUO0nZn-0oaJgHOWjrYKHmGUlsoJ1g/view?usp=sharing

### Modifications made to the original code

- Add metrics and gradio_app scripts
- Remove redundant code
- Add training bash script
- Improve README
- Add conda env.yaml file and add more dependencies for the web app
# Task 2: Using the model in a web application

To create