Garment UV-Texture ControlNet (v3)

A ControlNet for Stable Diffusion XL that generates UV-space texture atlases for 3D garment meshes, conditioned on tangent-space normal maps baked into UV space.

Given a UV-space normal map of a garment mesh and a text prompt describing the material/pattern, this ControlNet produces a flat 2D texture atlas with the garment panels correctly placed for the mesh's UV layout. The atlas can then be applied as a texture to the 3D mesh.

Categories trained on

Category      Samples
long-shirt    ~383
long-dress    ~413
short-shirt   ~236
shorts        ~74
pants         ~38
Total         ~1144

Training details

  • Base: stabilityai/stable-diffusion-xl-base-1.0
  • VAE: madebyollin/sdxl-vae-fp16-fix
  • Resolution: 1024×1024
  • Steps: 20000 (warm-started from a 12000-step single-category checkpoint)
  • Batch size: 2
  • Learning rate: 1e-5, cosine schedule, 500 warmup steps
  • Mixed precision: fp16
  • Loss masking: per-pixel weighted MSE with UV-island mask (background weight 0.1)
  • Captions: per-sample, generated with Gemma 3 27B vision and trimmed to fit the 77-token CLIP limit
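The loss-masking bullet above can be read as a per-pixel weighted MSE in which pixels inside a UV island get weight 1.0 and background pixels get weight 0.1. The following is a minimal NumPy sketch under that reading, not the training code itself. The function name `masked_mse`, the array shapes, and the normalization by total weight are assumptions made for illustration. The source does not state where in the training loop the mask is applied.

```python
import numpy as np


def masked_mse(pred, target, island_mask, background_weight=0.1):
    """Weighted MSE: weight 1.0 inside UV islands, background_weight elsewhere.

    pred, target: float arrays of shape (H, W, C)
    island_mask:  bool array of shape (H, W), True inside a UV island
    """
    # Per-pixel weights, broadcast over the channel axis.
    weights = np.where(island_mask, 1.0, background_weight)[..., None]
    sq_err = (pred - target) ** 2
    # Assumption: normalize by the total weight (times channel count).
    # A plain mean over all pixels is an equally plausible reading.
    return float((weights * sq_err).sum() / (weights.sum() * pred.shape[-1]))


# Tiny example: 2x2 single-channel image, one island pixel, and an
# error of 1.0 on the three background pixels only.
mask = np.array([[True, False], [False, False]])
pred = np.zeros((2, 2, 1))
target = (~mask).astype(float)[..., None]
loss = masked_mse(pred, target, mask)
unweighted = float(((pred - target) ** 2).mean())
```

In this example the background-only error contributes far less to `loss` than to the unweighted mean, which is the intended effect of down-weighting the background.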

Usage

import torch
from diffusers import (
    AutoencoderKL, ControlNetModel,
    StableDiffusionXLControlNetPipeline, UniPCMultistepScheduler,
)
from PIL import Image

# Load the ControlNet weights and the fp16-fix SDXL VAE listed under training details.
controlnet = ControlNetModel.from_pretrained(
    "JorgeAskur/garment-uv-controlnet-v3", torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Assemble the SDXL ControlNet pipeline on top of the base model.
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, vae=vae,
    torch_dtype=torch.float16,
)

# Swap in the UniPC multistep scheduler and move the pipeline to the GPU.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Load the UV-space normal map and match the 1024×1024 training resolution.
normal_map = Image.open("normal.png").convert("RGB").resize((1024, 1024))

# Generate the texture atlas; the prompt describes the material and pattern.
atlas = pipe(
    prompt="long-sleeved plaid shirt, cotton, red and cream checkered pattern",
    image=normal_map,
    num_inference_steps=40,
    guidance_scale=7.5,
    controlnet_conditioning_scale=1.0,
    height=1024, width=1024,
).images[0]
atlas.save("atlas.png")

Conditioning input

The conditioning image is a UV-space tangent normal map. Render your mesh in UV space (using the UV coordinates as 2D positions) and encode the per-fragment surface normal as RGB: R = (N.x * 0.5 + 0.5) * 255, with G and B encoding N.y and N.z the same way. The background should be black (0, 0, 0).
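The encoding above can be sketched in NumPy as follows. This assumes you already have a per-texel array of unit normals and a boolean coverage mask from your own UV-space bake. The function name `encode_normals`, the mask argument, and the choice to round to the nearest integer (rather than truncate) are assumptions, not part of the original pipeline.

```python
import numpy as np


def encode_normals(normals, island_mask):
    """Encode unit normals of shape (H, W, 3) as an 8-bit RGB image.

    island_mask: bool array of shape (H, W), True where a texel is
    covered by the mesh in UV space (hypothetical input from your bake).
    """
    # Map each component from [-1, 1] to [0, 255]: c -> (c * 0.5 + 0.5) * 255.
    rgb = np.clip((normals * 0.5 + 0.5) * 255.0, 0.0, 255.0)
    # Assumption: round to nearest before casting to uint8.
    rgb = np.rint(rgb).astype(np.uint8)
    # Texels not covered by the mesh are pure black.
    rgb[~island_mask] = 0
    return rgb


# Example: a flat surface, whose tangent-space normal is (0, 0, 1),
# with one uncovered background texel.
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
mask = np.array([[True, True], [True, False]])
rgb = encode_normals(normals, mask)
```

The resulting array can be turned into the conditioning image with `Image.fromarray(rgb)` from Pillow and saved as a PNG.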

Limitations

  • Trained on registered/fitted garment meshes; works best on meshes with similar topology.
  • Five garment categories only; out-of-distribution garments (e.g. jackets, hats) will produce poor results.
  • Captions should follow the training distribution: a single comma-separated line describing material, pattern, color, and notable details. Avoid 3D-photo wording.
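As an illustration of the caption format described above, here is a small sketch that joins descriptive fields into a single comma-separated line. The helper `build_caption` and its field names are hypothetical. It does not check the 77-token CLIP limit mentioned under training details.

```python
def build_caption(garment, material, pattern, details=()):
    """Join caption fields into one comma-separated line (hypothetical helper)."""
    parts = [garment, material, pattern, *details]
    # Collapse internal whitespace and newlines, and drop empty fields.
    cleaned = [" ".join(p.split()) for p in parts if p and p.strip()]
    return ", ".join(cleaned)


caption = build_caption(
    "long-sleeved plaid shirt",
    "cotton",
    "red and cream checkered pattern",
)
```

Here `caption` matches the prompt used in the usage example and can be passed as `prompt` to the pipeline.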

License

OpenRAIL++ (inherits from SDXL base).
