Want to know the difference
What is the difference between the scaled and input_scaled models, Kijai? Thanks for your contributions to the open source community, love u.
It is mentioned in the readme.
Pro tip:
Simply ask an AI about the difference, with the HuggingFace link! This helps me so much when I want to know something simple without bothering the contributor.
The input_scaled one is said to be more than 50% faster on RTX 40 series or newer.
But the v2 version is said to have terrible quality, according to someone's test on Reddit.
Do you need to select 'fp8_e4m3fn_fast' in the 'Load Diffusion Model' node for these input_scaled models to work? In my own tests, input_scaled (v1) was slower than the normal fp8 model. V2 was around the same speed as the normal fp8, but the quality was pretty bad. Still, I'm aware Kijai mentioned these are experimental models and he's doing this for us completely free, so zero complaints here.
My weak 4060 Ti (16 GB) or my launch flags (--fast fp16_accumulation fp8_matrix_mult?) could quite possibly be to blame here. Haha.
You should leave it at default; these are mixed precision models that include many bf16 layers too. The fp8 layers are already marked to use fp8 matmuls (fp8_fast).
The input_scaled models in default mode are ~40% faster for me.
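For anyone curious about the conceptual difference, here is a rough numpy sketch. This is my own illustration, not Kijai's actual code: the `fake_fp8` helper and both matmul functions are made-up stand-ins, and real fp8 kernels work on the GPU rather than in numpy. The idea is that "scaled" models only store the weights in fp8 and dequantize before a high-precision matmul, while "input_scaled" models also scale the activations into fp8 range, so the matmul itself can run on fp8 tensor cores (the speed win on RTX 40 series and newer).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3

def fake_fp8(x):
    """Crude fp8-e4m3 stand-in: clamp to the fp8 range, keep ~4 mantissa bits."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mant, exp = np.frexp(x)               # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16.0) / 16.0   # coarsen the mantissa grid
    return np.ldexp(mant, exp)

def weight_scaled_matmul(x, w_fp8, w_scale):
    # "scaled" style: only the weights live in fp8; they are dequantized
    # back to high precision and the matmul runs in fp16/bf16.
    return x @ (w_fp8 * w_scale)

def input_scaled_matmul(x, w_fp8, w_scale):
    # "input_scaled" style: the activations are scaled into fp8 range too,
    # so the matmul itself could run on fp8 hardware; the two scales are
    # multiplied back afterwards to recover the original magnitude.
    x_scale = np.abs(x).max() / FP8_E4M3_MAX
    x_fp8 = fake_fp8(x / x_scale)
    return (x_fp8 @ w_fp8) * (x_scale * w_scale)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 32)).astype(np.float32)
w_scale = np.abs(w).max() / FP8_E4M3_MAX
w_fp8 = fake_fp8(w / w_scale)

ref = x @ w
for name, out in [("weight-scaled", weight_scaled_matmul(x, w_fp8, w_scale)),
                  ("input-scaled", input_scaled_matmul(x, w_fp8, w_scale))]:
    err = np.abs(out - ref).max() / np.abs(ref).max()
    print(f"{name}: max relative error {err:.4f}")
```

Both variants stay close to the full-precision result; the input_scaled path just trades a little extra rounding on the activations for the ability to do the multiply itself in fp8.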
Thank you, Kijai. I just tried without any launch parameter tweaks (only sage attention) and saw no change; the old fp8_scaled is still faster for me by around 30%. Weird, perhaps it's some weirdness with Arch Linux, or because I'm using all-nightly ComfyUI and repos, haha. Also I'm using Python 3.14.3 and the latest torch 2.12 dev from today.
Anyway, don't worry about it, you have enough on your plate. I'm very happy with everything. The speed is good enough for me, so no complaints from me.
I just wanted to say I was wrong in my earlier assumptions. The input_scaled models are indeed faster, as you said, Kijai. Also, when I mentioned that input_scaled gave bad quality, the reason was most likely that I had mistakenly used the distill lora with an already distilled model, making the quality really bad. The actual difference is not very large. So I gather everything that contradicted what you said was my own human error. Live and learn, I guess. Haha.