opendiffusionAI sdxlONE (RAW version V0.0)

What is this?

This is the base SDXL model, but with the CLIP-L text encoder swapped out for "LongCLIP", and with the CLIP-G sub-model removed entirely.

This is not sdxl-longcliponly.

It is very similar to

https://huggingface.co/opendiffusionai/sdxl-longcliponly

However, this version has the text_encoder_2 model and its tokenizer REMOVED.

On the one hand, this makes the model a few gigabytes smaller. (Note that this model is currently stored at fp32 precision.)

On the other hand, it requires a modified diffusers module until my PR is accepted into the upstream diffusers code.

Why is this?

SDXL's largest limitations stem primarily from the poor text CLIP models it uses. Not only are they of low quality, but they have hidden token-count limits that bring the effective token count closer to 10. It is believed that one of the reasons CLIP-G was added was to work around the limits of the original CLIP-L. But that makes the model harder to train, and needlessly consumes more memory and time.

So, I created this version to experimentally demonstrate a better way.

This allows use of up to 248 tokens with SDXL natively, without the prompt-layering hacks that some diffusion programs use.
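To make the token budget concrete, here is a self-contained sketch. It is not the model's actual tokenizer, just illustrative arithmetic: a CLIP-style text encoder has a fixed context window (77 tokens for standard CLIP, 248 for LongCLIP), and two slots of that window go to the begin/end-of-text markers.

```python
# Illustrative only: shows how a fixed context window truncates a prompt.
# 77 and 248 are the standard-CLIP and LongCLIP context lengths; the
# list-of-words "prompt" below is a stand-in for real tokenizer output.
def usable_tokens(prompt_tokens, context_length):
    # Two slots are reserved for the begin/end-of-text markers.
    return min(len(prompt_tokens), context_length - 2)

prompt = ["word"] * 200            # a 200-token prompt
print(usable_tokens(prompt, 77))   # standard CLIP keeps only 75
print(usable_tokens(prompt, 248))  # LongCLIP keeps all 200
```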

To see the difference this can make, see the example given at

https://huggingface.co/opendiffusionai/sdxl-longcliponly

How to use

```
pip install torch
pip install git+https://github.com/ppbrown/diffusers-sdxlone@sdxl-fix
python test-one.py
```

This should write an image to "testimg.png", demonstrating that the model actually works.


How this was made

I took sdxl-longcliponly and manually removed the unneeded text models, then hand-edited the model_config.json file to match.
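As a sketch of that config-editing step, here is a hypothetical helper (not the actual script used). It assumes the config file is a flat JSON mapping of component names, as in diffusers' standard index files; the exact key names `text_encoder_2` and `tokenizer_2` are assumptions.

```python
import json

# Hypothetical helper illustrating the pruning step described above.
# Assumes the config is a flat JSON object whose keys include the
# sub-model component names, as in diffusers' standard index files.
def prune_components(config_path, removed=("text_encoder_2", "tokenizer_2")):
    with open(config_path) as f:
        config = json.load(f)
    for name in removed:
        config.pop(name, None)  # tolerate keys that are already gone
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```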
