Add quantization option
README.md
CHANGED
@@ -38,6 +38,8 @@ TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, pa
 ```
 
 ### Inferencing using Transformers Pipeline
+The code below was tested on a Google Colab (with the free T4 GPU).
+
 ``` python
 import transformers
 import torch
@@ -82,4 +84,16 @@ output = pipeline(input)
 
 print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
 # > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
 ```
+
+### Quantized model
+If you are facing GPU constraints, you can try loading the model with 8-bit quantization:
+
+``` python
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_8bit": True},  # Use 8-bit quantization
+    device_map="auto"
+)
+```