<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Textual Inversion

[Textual Inversion](https://huggingface.co/papers/2208.01618) is a method for generating personalized images of a concept. It learns a new word embedding for a unique token (`<sks>`) from 3-5 images of the concept (for example, pixel art) while the rest of the model stays frozen. You can then use the `<sks>` token in your prompt to trigger the model to generate pixel art images.

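The core idea, optimizing only one new embedding vector against a frozen model, can be illustrated with a toy example. This is a minimal sketch and not the Diffusers training script: it assumes a hypothetical frozen linear map `FROZEN_W` in place of the text encoder and UNet, and a hypothetical target `TARGET` with a squared-error loss in place of the real denoising loss on the concept images. Only the new token's embedding receives gradient updates.

```python
# Toy illustration of Textual Inversion: optimize ONLY a new token embedding
# while the "model" stays frozen. FROZEN_W and TARGET are hypothetical
# stand-ins for the frozen networks and the training signal from the images.

FROZEN_W = [[2.0, 0.0],
            [0.0, 0.5]]   # frozen model weights (never updated)
TARGET = [4.0, 1.0]       # hypothetical output the concept should produce


def forward(w, v):
    """Apply the frozen linear 'model' w to the embedding vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def loss_and_grad(w, v, y):
    """Squared-error loss ||w v - y||^2 and its gradient with respect to v."""
    out = forward(w, v)
    residual = [o - t for o, t in zip(out, y)]
    loss = sum(r * r for r in residual)
    # d/dv ||w v - y||^2 = 2 * w^T (w v - y); w itself gets no gradient
    grad = [2.0 * sum(w[i][j] * residual[i] for i in range(len(w)))
            for j in range(len(v))]
    return loss, grad


def train_new_token(w, y, steps=200, lr=0.1):
    """Plain gradient descent on the new token's embedding only."""
    v = [0.0] * len(w[0])  # toy initialization of the new embedding
    for _ in range(steps):
        _, grad = loss_and_grad(w, v, y)
        v = [v_j - lr * g_j for v_j, g_j in zip(v, grad)]
    return v


embedding = train_new_token(FROZEN_W, TARGET)
print(embedding)  # approaches [2.0, 2.0], since FROZEN_W applied to [2, 2] gives [4, 1]
```

Because `FROZEN_W` is never modified, the "model" behaves exactly as before for every existing token; all of the new concept lives in one small vector.
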
Textual Inversion weights are very lightweight, typically only a few KB, because they only contain word embeddings. However, this also means they can't be used on their own: load a model with [`~DiffusionPipeline.from_pretrained`] first, and then load the word embeddings into it.

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
```

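The few-KB file size follows from the embedding shape. Stable Diffusion v1.5 uses a CLIP ViT-L/14 text encoder whose token embeddings have 768 dimensions, so a single learned vector stored in float32 takes 768 × 4 = 3,072 bytes. The helper below is a hypothetical back-of-the-envelope calculator; the 8-vector case is an illustrative assumption (some embeddings use several vectors per token), and real files add a little metadata on top of the raw numbers.

```python
def embedding_size_bytes(num_vectors, hidden_dim=768, bytes_per_value=4):
    """Raw size of a Textual Inversion embedding, ignoring file metadata.

    hidden_dim=768 matches the CLIP ViT-L/14 text encoder used by
    Stable Diffusion v1.5; bytes_per_value=4 assumes float32 storage.
    """
    return num_vectors * hidden_dim * bytes_per_value


single = embedding_size_bytes(1)                   # one vector, float32
multi = embedding_size_bytes(8)                    # hypothetical 8-vector embedding
half = embedding_size_bytes(1, bytes_per_value=2)  # one vector, float16

print(single, multi, half)  # 3072 24576 1536
```
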
Load the word embeddings with [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] and include the unique token (here, `<gta5-artwork>`) in the prompt to activate it.

```py
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
pipeline(prompt).images[0]
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_txt_embed.png" />
</div>

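Conceptually, loading an embedding means registering the new token with the tokenizer and adding its learned vector to the text encoder's embedding table, so the token maps to its own vector instead of unrelated sub-words. The sketch below models this with plain Python data structures. `ToyTextEncoder`, `add_token`, and `encode` are hypothetical names for illustration only; they are not part of the Diffusers or Transformers APIs, and the whitespace tokenizer and the made-up vectors are simplifying assumptions.

```python
class ToyTextEncoder:
    """Hypothetical stand-in for a tokenizer plus a token embedding table."""

    def __init__(self, vocab, embeddings):
        self.vocab = dict(vocab)                          # token string -> id
        self.embeddings = [list(e) for e in embeddings]   # id -> vector

    def add_token(self, token, vector):
        """Register a new token and append its learned embedding row."""
        if token in self.vocab:
            raise ValueError(f"{token!r} is already in the vocabulary")
        if len(vector) != len(self.embeddings[0]):
            raise ValueError("embedding dimension mismatch")
        self.vocab[token] = len(self.embeddings)
        self.embeddings.append(list(vector))
        return self.vocab[token]

    def encode(self, prompt):
        """Whitespace 'tokenizer' that maps each word to its embedding vector."""
        ids = [self.vocab.get(word, self.vocab["<unk>"]) for word in prompt.split()]
        return [self.embeddings[i] for i in ids]


encoder = ToyTextEncoder(
    vocab={"<unk>": 0, "bear": 1, "style": 2},
    embeddings=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
)
new_id = encoder.add_token("<gta5-artwork>", [0.3, 0.7])  # made-up learned vector
print(new_id)  # 3
print(encoder.encode("bear <gta5-artwork> style"))
# [[1.0, 0.0], [0.3, 0.7], [0.0, 1.0]]
```

Refusing to overwrite an existing token is a design choice in this sketch: silently replacing the vector of a word the model already knows would change how every prompt containing that word is encoded.
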
Textual Inversion can also be trained to learn *negative embeddings* that steer generation away from unwanted characteristics such as "blurry" or "ugly", which can improve image quality.

EasyNegative is a widely used negative embedding that contains multiple learned negative concepts. To load it, specify the file name (`weight_name`) and the token to associate with the embedding (`token`). Then pass the token to `negative_prompt` in your pipeline to activate it.

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# load the negative embedding and bind it to the "easynegative" token
pipeline.load_textual_inversion(
    "EvilEngine/easynegative",
    weight_name="easynegative.safetensors",
    token="easynegative"
)

prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
negative_prompt = "easynegative"
# pass negative_prompt by keyword; the pipeline's second positional argument is height
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png" />
</div>