bertbobson committed on
Commit 086eec4 · verified · 1 Parent(s): 313ac9d

Create README.md

Files changed (1)
  1. README.md +5 -0
README.md ADDED
@@ -0,0 +1,5 @@
+ These weights are unavoidably double-quantized because of the state in which Ideogram4 was released.
+
+ The FP8 weights were dequantized to FP32 using their FP8 scales, downcast to BF16, and then quantized to INT8.
+
+ For use with https://github.com/BobJohnson24/ComfyUI-INT8-Fast
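
The conversion chain described in the README (FP8 to FP32 via the FP8 scales, then BF16, then INT8) can be illustrated with a small standard-library sketch. This is not the script used to produce these weights. The function names `to_f32`, `to_bf16`, and `requantize` are hypothetical, the FP8 inputs are assumed to be already decoded to plain floats with a single per-tensor scale, and the INT8 step assumes a symmetric per-tensor absmax scheme, which the README does not specify.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite FP32 value to BF16 (round-to-nearest-even).

    BF16 keeps the top 16 bits of the FP32 bit pattern. NaN and infinity
    are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias; the extra +1 when the kept LSB is set gives ties-to-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def requantize(fp8_values, fp8_scale):
    """Illustrate FP8 -> FP32 -> BF16 -> INT8 on a list of decoded FP8 values."""
    # Step 1: dequantize to FP32 using the FP8 scale.
    w_f32 = [to_f32(v * fp8_scale) for v in fp8_values]
    # Step 2: downcast FP32 to BF16 (a first source of rounding error on top of FP8).
    w_bf16 = [to_bf16(w) for w in w_f32]
    # Step 3: quantize to INT8. ASSUMPTION: symmetric per-tensor absmax scaling.
    absmax = max(abs(w) for w in w_bf16)
    int8_scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / int8_scale))) for w in w_bf16]
    return q, int8_scale


# Hypothetical inputs: values chosen to be exactly representable in common FP8 formats.
fp8_values = [-2.0, -0.5, 0.0, 0.75, 2.0]
q, int8_scale = requantize(fp8_values, fp8_scale=0.01)
```

Multiplying each entry of `q` by `int8_scale` gives an approximation of the BF16 weights. The difference between that result and the original FP8-dequantized values is the extra error the second quantization adds.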