Add quantization option
README.md
CHANGED
@@ -38,6 +38,8 @@ TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, pa
 ```
 
 ### Inferencing using Transformers Pipeline
+The code below was tested on a Google Colab (with the free T4 GPU).
+
 ``` python
 import transformers
 import torch
@@ -82,4 +84,16 @@ output = pipeline(input)
 
 print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
 # > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
 ```
+
+### Quantized model
+If you are facing GPU constraints, you can try loading the model with 8-bit quantization:
+
+``` python
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_8bit": True},  # Use 8-bit quantization
+    device_map="auto"
+)
+```