Benchmarks
I ran some benchmarks on your recent versions using an RTX Pro 6000, with the same prompt and the same 4 seeds for each checkpoint.
Three tests per checkpoint, 4 images each, euler sampler with the normal scheduler, 1024 px side, no upscaling or refiner:
- First test: 30 steps, CFG 3.5, no lightning LoRA
- Second test: 14 steps, CFG 1.3, using the Chroma-Flash-Heun_-_Rank_64 LoRA
- Third test: 14 steps, CFG 1, using the Chroma-Flash-Heun_-_Rank_256 LoRA
12.5 seconds, 5.5 seconds and 3 seconds (first, second and third test):
- Chroma1-HD-fp8_scaled_defaultloader_hybrid_large_rev2.safetensors
- Chroma1-HD-fp8matmulmixed_large_rev2.safetensors
- Chroma1-HD-fp8mixed.safetensors
18.5 seconds, 8.5 seconds and 5 seconds (first, second and third test):
- Chroma1-HD-fp8mixed_fullmm_large_rev2.safetensors
The images from the three checkpoints in the first group are identical to one another; the images from the checkpoint in the second group don't match any of the others.
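To put the two groups side by side, here is a small Python sketch that takes the timings listed above and computes how many times longer the fullmm checkpoint took than the first group in each test. The variable and function names are only illustrative. Because it is a ratio, the result holds whether the timings are per image or per batch of 4.

```python
# Wall-clock timings reported above, in seconds, for the first, second and third test.
group_one = [12.5, 5.5, 3.0]  # the three checkpoints that produced identical images
fullmm = [18.5, 8.5, 5.0]     # Chroma1-HD-fp8mixed_fullmm_large_rev2

def slowdown(baseline, other):
    """Return how many times longer `other` took than `baseline`, per test, rounded to 2 decimals."""
    return [round(o / b, 2) for b, o in zip(baseline, other)]

print(slowdown(group_one, fullmm))
```

It comes out to about 1.5x to 1.7x slower for fullmm, and the gap widens as the step count and CFG go down.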
So you didn't include the images, and you didn't run the tests in bf16 mode?