Update README.md
README.md
@@ -24,25 +24,18 @@ where N_N is the average number of bits per parameter.
 
 ## Good choices to start with
 ```
--
-- 6_9
--
+- 3_8 might work on an 8 GB card
+- 6_9 should be good for a 12 GB card
+- 8_2 is a good choice for 16 GB cards if you want to add LoRAs etc
+- 9_2 fits on a 16 GB card
 ```
 
 ## Speed?
 
-On an A40 (plenty of VRAM), everything except the model identical,
+On an A40 (plenty of VRAM), with everything except the model identical,
+the time taken to generate an image (30 steps, deis sampler) was about 65% longer than for the full model.
 
-
-- 5_9 => 55.4s
-- 6_9 => 52.1s
-- 7_4 => 49.7s
-- 7_6 => 43.6s
-- 8_4 => 46.8s
-- 9_2 => 42.8s
-- 9_6 => 48.2s
-
-for comparison, the unquantised models take about 27s.
+Quantised models will generally be slower because the weights have to be converted back into a native torch form when they are needed.
 
 ## How is this optimised?
 
@@ -63,5 +56,5 @@ The process for optimisation is as follows:
 
 - Tests on using bitsandbytes quantizations showed they did not perform as well as the equivalent sized GGUF quants
 - Different quantizations of different parts of a layer gave significantly worse results
-- Leaving bias in 16 bit made no relevant difference
-- Costs were evaluated for the original Flux.1-dev model. They are
+- Leaving bias in 16 bit made no relevant difference (the 'patched' models generally do)
+- Costs were evaluated for the original Flux.1-dev model. They are probably essentially the same for finetunes
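
The "Good choices to start with" list above maps average bits per parameter to card sizes. As a back-of-the-envelope check (not from the README itself: `FLUX_DEV_PARAMS`, `weight_gib` are illustrative names, and the ~12 billion parameter count for Flux.1-dev's transformer is an assumption based on its commonly cited size), the weight footprint is roughly parameters × bits / 8:

```python
# Rough weight sizes per quant. Assumption: Flux.1-dev's transformer
# has ~12e9 parameters; the N_N quant name gives the average number of
# bits per parameter (per the README). Activations, text encoders, the
# VAE and any LoRAs need memory on top of this.

FLUX_DEV_PARAMS = 12e9  # approximate parameter count (assumption)

def weight_gib(bits_per_param: float, n_params: float = FLUX_DEV_PARAMS) -> float:
    """Approximate size of the quantised weights in GiB."""
    return n_params * bits_per_param / 8 / 2**30

for name, bits in [("3_8", 3.8), ("6_9", 6.9), ("8_2", 8.2), ("9_2", 9.2)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB of weights alone")
```

This gives roughly 5.3 GiB for 3_8, 9.6 GiB for 6_9, 11.5 GiB for 8_2 and 12.9 GiB for 9_2, which is consistent with the 8 GB / 12 GB / 16 GB suggestions once the rest of the pipeline is accounted for.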
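The "Speed?" note attributes the slowdown to weights being converted back into a native torch form when they are needed. A minimal sketch of that mechanism, not this repo's actual code: `QuantisedLinear` is a hypothetical layer, and the simple per-channel int8 scheme stands in for GGUF's block-structured k-quants, which unpack more elaborate data in the same place.

```python
import torch
from typing import Optional

class QuantisedLinear(torch.nn.Module):
    """Sketch of dequantise-on-use: the weight stays in quantised form
    and is converted back to a native torch tensor inside forward()."""

    def __init__(self, qweight: torch.Tensor, scale: torch.Tensor,
                 bias: Optional[torch.Tensor] = None):
        super().__init__()
        self.register_buffer("qweight", qweight)  # e.g. int8 codes
        self.register_buffer("scale", scale)      # per-output-channel scales
        self.register_buffer("bias", bias)        # bias can stay in 16 bit (see notes above)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Just-in-time dequantisation: this extra work on every call is
        # where the slowdown relative to the unquantised model comes from.
        weight = self.qweight.to(x.dtype) * self.scale
        return torch.nn.functional.linear(x, weight, self.bias)

# Usage: quantise a 16-bit weight to int8, symmetric per output channel.
w = torch.randn(64, 128, dtype=torch.float16)
scale = w.abs().amax(dim=1, keepdim=True) / 127
layer = QuantisedLinear((w / scale).round().to(torch.int8), scale)
y = layer(torch.randn(2, 128, dtype=torch.float16))
```

Keeping the bias unquantised matches the observation in the optimisation notes that leaving bias in 16 bit made no relevant difference.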
|