| # Virtual Try-On Diffusion API |
|
|
| <!-- TOC --> |
| * [Virtual Try-On Diffusion API](#virtual-try-on-diffusion-api) |
| * [Summary](#summary) |
| * [Consuming the API](#consuming-the-api) |
| * [Try-On Endpoints](#try-on-endpoints) |
| * [Try-On Input Parameters](#try-on-input-parameters) |
| * [Clothing image](#clothing-image) |
| * [Clothing prompt](#clothing-prompt) |
| * [Avatar image](#avatar-image) |
| * [Avatar prompt](#avatar-prompt) |
| * [Background image](#background-image) |
| * [Background prompt](#background-prompt) |
| * [Additional notes](#additional-notes) |
| * [Try-On Output](#try-on-output) |
| * [Response codes](#response-codes) |
| * [NSFW content](#nsfw-content) |
| * [Use Cases and Recipes](#use-cases-and-recipes) |
| * [Image-based virtual try-on](#image-based-virtual-try-on) |
| * [Image-based virtual try-on with background](#image-based-virtual-try-on-with-background) |
| * [Avatar from a text prompt](#avatar-from-a-text-prompt) |
| * [Creating diverse product images](#creating-diverse-product-images) |
| * [Clothing from a text prompt](#clothing-from-a-text-prompt) |
| * [Modifying clothing](#modifying-clothing) |
| * [Modifying avatar's body](#modifying-avatars-body) |
| * [Txt2Img](#txt2img) |
| * [Other creative possibilities](#other-creative-possibilities) |
| * [Performance](#performance) |
| * [Known Issues and Limitations](#known-issues-and-limitations) |
| * [Changelog](#changelog) |
| <!-- TOC --> |
|
|
| ## Summary |
|
|
| Virtual Try-On Diffusion [VTON-D] by [Texel.Moda](https://texelmoda.com) is a custom diffusion-based pipeline for fast |
| and flexible multi-modal virtual try-on. Clothing, avatar and background can be specified by reference images or text |
| prompts allowing for clothing transfer, avatar replacement, fashion image generation and other virtual try-on related |
| tasks. Check out the [demo on Hugging Face](https://huggingface.co/spaces/texelmoda/try-on-diffusion) to try the API in |
| a user-friendly way. |
|
|
| ## Consuming the API |
|
|
| The API is exposed through the RapidAPI Hub which manages API subscriptions, API keys, payments and other things. Please |
| refer to the [RapidAPI Documentation](https://docs.rapidapi.com/docs/consumer-quick-start-guide) to get started. |
|
|
| Generally, in order to use the API you need to perform the following steps: |
| - Create a RapidAPI.com account. |
| - [Navigate to the API page](https://rapidapi.com/texelmoda-texelmoda-apis/api/try-on-diffusion) and subscribe to a |
| suitable pricing plan. We also provide a free BASIC plan with 100 API requests per month. |
| - Use the obtained RapidAPI key to authenticate (via the _X-RapidAPI-Key_ header) and use the API from any programming |
| language or tool you like. |
|
|
| Example API call using cURL: |
| ```shell |
| curl --request POST \ |
| --url https://try-on-diffusion.p.rapidapi.com/try-on-file \ |
| --header 'Content-Type: multipart/form-data' \ |
| --header 'x-rapidapi-host: try-on-diffusion.p.rapidapi.com' \ |
| --header 'x-rapidapi-key: <RapidAPI Key>' \ |
| --form clothing_image=1.jpg \ |
| --form avatar_image=2.jpg |
| ``` |
|
|
| For a simple Python client implementation please see the |
| [Hugging Face demo application source](https://huggingface.co/spaces/texelmoda/try-on-diffusion/blob/main/try_on_diffusion_client.py). |
|
|
| ## Try-On Endpoints |
|
|
| Try-On API consists of two endpoints that differ only in the method of passing reference images: |
|
|
| - **POST** _/try-on-file_ - takes reference images as uploaded files in the request body (using multipart/form-data). |
|
|
|
|
| - **POST** _/try-on-url_ - takes reference images as image URLs in POST parameters. |
|
|
| All image requirements, behavior and status codes are the same for both endpoints, choose the one that best suits your |
| application architecture. |
|
|
| ## Try-On Input Parameters |
|
|
| All input parameters for the try-on endpoints are currently optional. Images and prompts serve as additional generation |
| conditions and can even be used in combination. Below is the short parameter summary with links to extended information |
| on certain parameters. |
|
|
| List of input parameters for the **POST** _/try-on-file_ endpoint: |
|
|
| | Parameter | Description | Required | |
| |-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| |
| | [clothing_image](#clothing-image) | Clothing reference image in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No | |
| | [clothing_prompt](#clothing-prompt) | Text prompt for clothing, can be used instead of an image. Compel weighting syntax is supported. Example: _red sleeveless mini dress_ | No | |
| | [avatar_image](#avatar-image) | Avatar image in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No | |
| | avatar_sex | Avatar sex, either "male" or "female". Will be detected automatically, if left empty or omitted. Will enforce certain avatar sex if specified. | No | |
| | [avatar_prompt](#avatar-prompt) | Text prompt for the avatar, can be used instead of an image or with image to modify the avatar. Compel weighting syntax is supported. Example: _a gentleman with beard and mustache_ | No | |
| | [background_image](#background-image) | Optional background reference image in JPEG, PNG or WEBP format, maximum file size is 12 MB. Original avatar background is preserved if background is not specified. | No | |
| | [background_prompt](#background-prompt) | Optional background text prompt. Original avatar background is preserved if background is not specified. Example: _in an autumn park_ | No | |
| | seed | Seed for image generation. Default is -1 (random seed). Actual seed will also be output in the "X-Seed" response header. Example: _42_ | No | |
|
|
| List of input parameters for the **POST** _/try-on-url_ endpoint: |
|
|
| | Parameter | Description | Required | |
| |-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| |
| | [clothing_image_url](#clothing-image) | Clothing reference image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No | |
| | [clothing_prompt](#clothing-prompt) | Text prompt for clothing, can be used instead of an image. Compel weighting syntax is supported. Example: _red sleeveless mini dress_ | No | |
| | [avatar_image_url](#avatar-image) | Avatar image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. | No | |
| | avatar_sex | Avatar sex, either "male" or "female". Will be detected automatically, if left empty or omitted. Will enforce certain avatar sex if specified. | No | |
| | [avatar_prompt](#avatar-prompt) | Text prompt for the avatar, can be used instead of an image or with image to modify the avatar. Compel weighting syntax is supported. Example: _a gentleman with beard and mustache_ | No | |
| | [background_image_url](#background-image) | Optional background reference image URL. Image should be in JPEG, PNG or WEBP format, maximum file size is 12 MB. Original avatar background is preserved if background is not specified. | No | |
| | [background_prompt](#background-prompt) | Optional background text prompt. Original avatar background is preserved if background is not specified. Example: _in an autumn park_ | No | |
| | seed | Seed for image generation. Default is -1 (random seed). Actual seed will also be output in the "X-Seed" response header. Example: _42_ | No | |
|
|
| ### Clothing image |
|
|
| For best results clothing reference images should meet a number of requirements: |
|
|
| - File format: **JPEG**, **PNG** or **WEBP** |
| - Maximum file size: **12 MB** |
| - Minimum image size: **256x256** |
| - Recommended image size: **768x1024 and above** |
| - For best results clothing should be **dressed on a person** or **on a ghost mannequin**. Some flat lay clothing photos might work too, but currently it's not guaranteed. |
| - **Single person** on the image (though multiple persons might also work) |
| - **Frontal** photo, though some degree of rotation is fine |
| - **Good lighting** conditions and **high image quality** as it directly affects the result |
| - **Minimal occlusion** by hair, hands or accessories |
|
|
| To summarize: the better is the clothing image the better is the final result. |
|
|
| Examples of good clothing images: |
|
|
| | <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/clothing_image_02.jpg" width="240"> | <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/clothing_image_04.jpg" width="240"> | <img src="images/clothing_image_05.jpg" width="240"> | <img src="images/clothing_image_06.jpg" width="240"> | |
| |------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|------------------------------------------------------| |
|
|
| ### Clothing prompt |
|
|
| Instead of a clothing image you can use text prompt to describe the garment. Short and clear prompts work best. |
| Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to |
| increase or decrease weight of certain tokens. Examples: |
| - _a sheer blue sleeveless mini dress_ |
| - _a beige woolen sweater and white pleated skirt_ |
| - _a black leather jacket and dark blue slim-fit jeans_ |
| - _a floral pattern blouse and leggings_ |
| - _a colorful+++ t-shirt and black shorts_ |
|
|
| ### Avatar image |
|
|
| Avatar images should also meet a some requirements: |
|
|
| - File format: **JPEG**, **PNG** or **WEBP** |
| - Maximum file size: **12 MB** |
| - Minimum image size: **256x256** |
| - Recommended image size: **768x1024 and above** |
| - **Single person** on the image (though multiple persons might also work) |
| - **Frontal** photo, though some degree of rotation is fine |
| - **Good lighting** conditions and **high image quality** |
|
|
| Examples of good avatar images: |
|
|
| | <img src="images/avatar_image_01.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/avatar_image_04.jpg" width="240"> | |
| |----------------------------------------------------|----------------------------------------------------|----------------------------------------------------|----------------------------------------------------| |
|
|
| ### Avatar prompt |
|
|
| Instead of an avatar image you can use text prompt to describe the person. Short and clear prompts work best. |
| Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to |
| increase or decrease weight of certain tokens. Examples: |
| - _a beautiful blond girl with long hair_ |
| - _a cute redhead girl with freckles_ |
| - _a (plus size)++ female model wearing sunglasses_ |
| - _a fit man with dark beard and blue eyes_ |
| - _a gentleman with beard and mustache_ |
|
|
| ### Background image |
|
|
| Background images are used to extract high-level background features only and serve as a reference (and not exact |
| background). Below are basic image requirements: |
|
|
| - File format: **JPEG**, **PNG** or **WEBP** |
| - Maximum file size: **12 MB** |
| - Recommended image size: **256x256 and above** |
|
|
| Examples of background images: |
|
|
| | <img src="images/background_image_01.jpg" width="240"> | <img src="images/background_image_02.jpg" width="240"> | <img src="images/background_image_03.jpg" width="240"> | <img src="images/background_image_04.jpg" width="240"> | |
| |--------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------| |
|
|
| ### Background prompt |
|
|
| Instead of a background image you can use text prompt to describe the background. Short and clear prompts work best. |
| Additionally, [Compel weighting syntax](https://github.com/damian0815/compel/blob/main/doc/syntax.md) is supported to |
| increase or decrease weight of certain tokens. Examples: |
| - _in an autumn park_ |
| - _in front of a brick wall_ |
| - _on an ocean beach with (palm trees)++_ |
| - _in a shopping mall_ |
| - _in a modern office_ |
|
|
| ### Additional notes |
|
|
| We use the "same-crop" approach for clothing and avatar images: images will be cropped roughly the same way (using pose |
| estimation), so we don't have to add too much new information (e.g. assume lower body clothing). So, if you use only a |
| photo of an upper body clothing the result will also be cropped the same way regardless of the avatar image (and the |
| other way around): |
|
|
| | Clothing Image | Avatar Image | Result Image | |
| |------------------------------------------------------|-----------------------------------------------------|--------------------------------------------------------| |
| | <img src="images/clothing_image_02.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/same_crop_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/same_crop_result_02.jpg" width="240"> | |
|
|
| ## Try-On Output |
|
|
| ### Response codes |
|
|
| HTTP status code is used as a high-level response status. In case of a successful API call HTTP code 200 will be |
| returned and response body will contain a resulting JPEG image with the maximum size of 768x1024 pixels. Response |
| will also have the "X-Seed" header set that should contain the actual seed used for image generation (for |
| reproducibility). Other status codes (not 200) indicate unsuccessful request, see the table below for additional |
| details: |
|
|
| | Response Code | Content-Type | Headers | Description | Example | |
| |:-------------:|:------------------:|:--------------:|-----------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------:| |
| | **200** | image/jpeg | X-Seed: {seed} | Successful API call. Response body contains the resulting image in JPEG format. | <img src="images/same_crop_result_01.jpg" width="160"> | |
| | **400** | application/json | | Bad request: at least one of request parameters is invalid. Response body should contain additional error details in JSON format. | { "detail": "Invalid upload file type: application/x-zip-compressed" } | |
| | **403** | application/json | | Indicates authentication issue (e.g. invalid API key). | | |
| | **422** | application/json | | Request validation error. Response body should contain error details in JSON format. | { "detail": [ { "loc": [ "string", 0], "msg": "string", "type": "string" } ] } | |
| | **429** | | | Too many requests. Might be triggered by the RapidAPI proxy in case of reaching maximum request rate or API call limit. | | |
| | **500** | | | Indicates an internal server error, might not have any details. | | |
|
|
| ### NSFW content |
|
|
| We use NSFW content checker to ensure we don't output inappropriate images. If potential NSFW content is detected in the |
| generated image, the API will return HTTP status code 400 with a corresponding error message in JSON response. |
|
|
| ## Use Cases and Recipes |
|
|
| Our Virtual Try-On API offers a flexible way to specify clothing, avatar and background, which makes it possible to not |
| only perform a classic task of virtual try-on, but also generate entirely new images or alter existing images in some |
| interesting aspects. Feel free to try and explore! |
|
|
| In all the examples below all unmentioned inputs are assumed to be empty. |
|
|
| ### Image-based virtual try-on |
|
|
| The most common use case is to transfer clothing from one photo (e.g. from a product page) to another photo (e.g. |
| user avatar) while maintaining the avatar and the background. |
|
|
| | Clothing Image | Avatar Image | Result Image | |
| |------------------------------------------------------|----------------------------------------------------|----------------------------------------------------------| |
| | <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/image_based_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_05.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/image_based_result_02.jpg" width="240"> | |
|
|
| ### Image-based virtual try-on with background |
|
|
| Additionally, it's possible to replace the avatar background with a reference image or a text prompt. |
|
|
| | Clothing Image | Avatar Image | Background Image | Result Image | |
| |------------------------------------------------------|----------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------| |
| | <img src="images/clothing_image_04.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/background_image_01.jpg" width="240"> | <img src="images/image_based_background_result_01.jpg" width="240"> | |
|
|
| And with a text prompt for the background: |
|
|
| | Clothing Image | Avatar Image | Background Prompt | Result Image | |
| |------------------------------------------------------|----------------------------------------------------|------------------------------|---------------------------------------------------------------------| |
| | <img src="images/clothing_image_04.jpg" width="240"> | <img src="images/avatar_image_03.jpg" width="240"> | in front of a snowy mountain | <img src="images/image_based_background_result_02.jpg" width="240"> | |
|
|
| ### Avatar from a text prompt |
|
|
| It's possible to replace the person on the clothing image with an avatar, described in a text prompt. Background will be |
| changed as well and will be a random one if not specified: |
|
|
| | Clothing Image | Avatar Prompt | Background Prompt | Result Image | |
| |------------------------------------------------------|--------------------------------------------|--------------------|------------------------------------------------------------| |
| | <img src="images/clothing_image_02.jpg" width="240"> | a beautiful blond girl with long hair | | <img src="images/avatar_prompt_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_03.jpg" width="240"> | a gentleman with a long beard and mustache | near a fireplace | <img src="images/avatar_prompt_result_02.jpg" width="240"> | |
|
|
| You may also experiment with avatar prompts for more interesting results: |
|
|
| | Clothing Image | Avatar Prompt | Background Prompt | Result Image | |
| |------------------------------------------------------|---------------------|-----------------------|------------------------------------------------------------| |
| | <img src="images/clothing_image_03.jpg" width="240"> | (iron man mask)+++ | in the Sahara Desert | <img src="images/avatar_prompt_result_03.jpg" width="240"> | |
|
|
| ### Creating diverse product images |
|
|
| If you have a clothing image on a ghost mannequin (flat lay photo might work too), you can generate product images with |
| avatars and backgrounds of your choice: |
|
|
| | Clothing Image | Avatar Prompt | Background Image | Result Image | |
| |------------------------------------------------------|---------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------| |
| | <img src="images/clothing_image_05.jpg" width="240"> | a beautiful blond girl with long hair | <img src="images/background_image_02.jpg" width="240"> | <img src="images/clothing_avatar_prompt_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_06.jpg" width="240"> | a gentleman with beard and mustache | <img src="images/background_image_04.jpg" width="240"> | <img src="images/clothing_avatar_prompt_result_02.jpg" width="240"> | |
|
|
| ### Clothing from a text prompt |
|
|
| Similarly, you can specify clothing with a text prompt while providing an avatar image: |
|
|
| | Clothing Prompt | Avatar Image | Result Image | |
| |-------------------------------------|----------------------------------------------------|--------------------------------------------------------------| |
| | a sheer blue sleeveless mini dress | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/clothing_prompt_result_01.jpg" width="240"> | |
| | a colorful t-shirt and black shorts | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/clothing_prompt_result_02.jpg" width="240"> | |
|
|
| ### Modifying clothing |
|
|
| It's possible to modify clothing to some extent using a clothing image and a clothing prompt simultaneously: |
|
|
| | Clothing Image | Clothing prompt | Avatar Image | Result Image | |
| |------------------------------------------------------|-------------------|----------------------------------------------------|--------------------------------------------------------------------| |
| | <img src="images/clothing_image_06.jpg" width="240"> | (long sleeves)+++ | <img src="images/avatar_image_03.jpg" width="240"> | <img src="images/clothing_modification_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_03.jpg" width="240"> | shorts+++ | <img src="images/avatar_image_04.jpg" width="240"> | <img src="images/clothing_modification_result_02.jpg" width="240"> | |
|
|
| ### Modifying avatar's body |
|
|
| If you specify clothing and avatar images to be the same while providing an avatar prompt it's possible to change |
| avatar's body proportions. Note that it may require using additional term weighting to achieve stronger changes. |
|
|
| | Clothing Image | Avatar Image | Avatar Prompt | Result Image | |
| |------------------------------------------------------|------------------------------------------------------|-------------------------------|------------------------------------------------------------------| |
| | <img src="images/clothing_image_01.jpg" width="240"> | <img src="images/clothing_image_01.jpg" width="240"> | a (plus size)+ woman | <img src="images/avatar_modification_result_01.jpg" width="240"> | |
| | <img src="images/clothing_image_03.jpg" width="240"> | <img src="images/clothing_image_03.jpg" width="240"> | a (muscular bodybuilder)+++++ | <img src="images/avatar_modification_result_02.jpg" width="240"> | |
|
|
| ### Txt2Img |
|
|
| As our diffusion model was fine-tuned to produce people wearing various clothing, it can better follow a clothing prompt |
| and output realistic people and garments: |
|
|
| | Clothing Prompt | Avatar Prompt | Background Prompt | Result Image | |
| |-------------------------------------------------|--------------------------------|------------------------|------------------------------------------------------| |
| | a paisley pattern purple shirt and beige chinos | a fit man with dark beard | plain white background | <img src="images/txt2img_result_01.jpg" width="240"> | |
| | a white polka dot pattern dress | a beautiful petite blond woman | on a yacht | <img src="images/txt2img_result_02.jpg" width="240"> | |
|
|
| ### Other creative possibilities |
|
|
| If you specify the same image for clothing and avatar while providing a background prompt (or background image) you can |
| replace the background in a creative way: |
|
|
| | Clothing Image | Avatar Image | Background Prompt | Result Image | |
| |----------------------------------------------------|----------------------------------------------------|-------------------------|-------------------------------------------------------------| |
| | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/avatar_image_02.jpg" width="240"> | on a snowy mountain top | <img src="images/new_background_result_01.jpg" width="240"> | |
|
|
| It's also possible to use a combination of clothing image, clothing prompt, avatar image and a background to add some |
| accessories: |
|
|
| | Clothing Image | Clothing Prompt | Avatar Image | Background Image | Result Image | |
| |------------------------------------------------------|--------------------------|------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------| |
| | <img src="images/avatar_image_02.jpg" width="240"> | a (light brown purse)+++ | <img src="images/avatar_image_02.jpg" width="240"> | <img src="images/background_image_03.jpg" width="240"> | <img src="images/accessory_result_01.jpg" width="240"> | |
|
|
| ## Performance |
|
|
| Typically, one try-on request is processed in 5-10 seconds (depending on type of conditions) excluding network latency. |
| In order to reduce network overhead you might want to compress your images before feeding to the API (e.g. using JPEG). |
| Please note that in case of a high demand processing time might increase due to request being queued, though we |
| constantly monitor our GPU cluster capacity and perform scaling as needed. |
|
|
| ## Known Issues and Limitations |
|
|
| As any generative model, our models are not perfect (though we constantly work on improvements): |
| - Currently, we do not fully support flat lay clothing images. Some might work, but that's not guaranteed. |
| - Prompt following might not be perfect, especially in case of long and sophisticated prompts. Prefer simpler and more |
| straightforward prompts whenever possible. Also be pretty verbose (e.g. use the word "plain" if you need something of |
| solid color). Additionally, Compel weighting might be used to increase weight of certain tokens. |
| - As usual, generative models struggle with hands, fingers and toes, though we try to mitigate it to a certain extent. |
| - Currently, we do not support trying on a single garment, only the full look. |
| - Hats and sunglasses are not currently transferred, but we are working on it. |
| - Backgrounds might lack some clarity as currently we focus more on clothing. |
| - In case of a specified background a hairstyle might slightly change. |
| - Body shape of the avatar might change towards smaller sizes. |
|
|
| ## Changelog |
|
|
| The changelog below contains major API updates focusing on new features and other improvements. |
|
|
| - **2024-12-15**: New API release brings support for clothing on ghost mannequins and (partially) flat lay clothing |
| photos. |
|
|
| - **2024-11-07**: Initial public API release. |