Space: limitless235/llm-pushback


Sahil Seemant committed on Mar 10

Commit 309cfde (parent: 402e3e2)

Fix TypeError by using explicit BitsAndBytesConfig for quantization

Files changed (1): chat_gui.py

@@ -19,6 +19,7 @@ except (ImportError, ModuleNotFoundError):
             AutoModelForImageTextToText,
             AutoTokenizer,
             AutoProcessor,
+            BitsAndBytesConfig,
             TextIteratorStreamer
         )
         from peft import PeftModel
@@ -310,12 +311,20 @@ if st.session_state.messages and st.session_state.messages[-1]["role"] == "user"
                         trust_remote_code=True,
                         use_fast=False
                     )
-                    # Use 4-bit quantization if on low-memory cloud
+                    # Use 4-bit quantization config (more stable than passing load_in_4bit directly)
+                    quantization_config = BitsAndBytesConfig(
+                        load_in_4bit=True,
+                        bnb_4bit_compute_dtype=torch.float16,
+                        bnb_4bit_quant_type="nf4",
+                        bnb_4bit_use_double_quant=True,
+                    )
+                    # Load model with explicit quantization config
                     model = model_class.from_pretrained(
                         conf["path"],
-                        torch_dtype=torch.float16,
                         device_map="auto",
-                        load_in_4bit=True,
+                        quantization_config=quantization_config,
                         token=hf_token,
                         trust_remote_code=True
                     )
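The removed comment mentions a low-memory cloud host, and the new config enables NF4 with double quantization. The sketch below gives a rough idea of what those settings save by estimating weight memory for a given parameter count. It assumes the overhead accounting from the QLoRA paper: one fp32 absmax scale per block of 64 weights (about 0.5 extra bits per parameter), or, with double quantization, 8-bit scales plus one fp32 constant per 256 scales (about 0.127 extra bits). The block sizes, the function names `bits_per_param`/`weight_gib`, and the 7B example are illustrative assumptions and were not read from this repository. The estimate also ignores activations, the KV cache, and any modules kept in higher precision.

```python
# Rough weight-memory estimate: fp16 vs. 4-bit NF4 (with/without double quant).
# ASSUMED figures (QLoRA-style blockwise quantization, not taken from chat_gui.py):
#   - 4-bit weights, one fp32 absmax scale per block of 64 weights
#       -> 32 / 64 = 0.5 extra bits per parameter
#   - double quantization: scales stored in 8 bits, plus one fp32 constant
#     per 256 scales -> 8 / 64 + 32 / (64 * 256) ~= 0.127 extra bits
# Activations, KV cache and higher-precision modules are ignored.

BLOCK_SIZE = 64          # weights per first-level block (assumption)
SECOND_BLOCK_SIZE = 256  # scales per second-level block (assumption)


def bits_per_param(quant: str, double_quant: bool = False) -> float:
    """Approximate storage bits per parameter, including scale overhead."""
    if quant == "fp16":
        return 16.0
    if quant != "nf4":
        raise ValueError(f"unsupported quant type: {quant!r}")
    if double_quant:
        # 8-bit scales per block + one fp32 constant per SECOND_BLOCK_SIZE scales
        overhead = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_BLOCK_SIZE)
    else:
        # one fp32 scale per block
        overhead = 32 / BLOCK_SIZE
    return 4.0 + overhead


def weight_gib(num_params: float, quant: str, double_quant: bool = False) -> float:
    """Estimated weight memory in GiB for num_params parameters."""
    total_bits = num_params * bits_per_param(quant, double_quant)
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model
    for label, quant, dq in [
        ("fp16", "fp16", False),
        ("nf4", "nf4", False),
        ("nf4 + double quant", "nf4", True),
    ]:
        print(f"{label:>20}: {weight_gib(n, quant, dq):5.2f} GiB")
```

Under these assumptions, 4-bit weights come out at a little over a quarter of the fp16 size. Double quantization trims roughly another 0.37 bits per parameter.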