Update README.md
README.md
CHANGED
@@ -215,4 +215,17 @@ for i, detail in enumerate(sentence_details, 1):
 
 print("="*80)
 
-```
+```
+
+If you want to quantize this model to reduce its memory use, you can use torchao.
+This is the config you would use if you wanted to run it on a laptop or small device.
+
+```python
+from torchao.quantization import quantize_, Int8WeightOnlyConfig
+
+model.eval().to("cpu")
+
+# In-place: converts Linear layers to int8 weights
+quantize_(model, Int8WeightOnlyConfig())
+```
+
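For readers wondering what "int8 weights" means in the added README text, here is a minimal, standard-library-only sketch of symmetric int8 weight quantization. It is not torchao's implementation: the per-row scale, the helper names (`quantize_row_int8`, `dequantize_row`), and the toy weight values are all assumptions chosen for illustration. The point is only that each weight is stored as one signed byte plus a shared floating-point scale, instead of four bytes per float32 value.

```python
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric int8 quantization of one weight row."""
    max_abs = max((abs(w) for w in row), default=0.0)
    # One scale per row (an assumption); fall back to 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Hypothetical helper: recover approximate float weights."""
    return [v * scale for v in q]


# Toy 2x3 "Linear" weight matrix (made-up values).
weights = [[0.50, -1.00, 0.25], [0.02, 0.01, -0.04]]

quantized = [quantize_row_int8(row) for row in weights]
restored = [dequantize_row(q, scale) for q, scale in quantized]

# Largest absolute difference between original and restored weights.
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(weights, restored)
    for a, b in zip(row_a, row_b)
)
print("quantized rows:", [q for q, _ in quantized])
print("max round-trip error:", max_error)
```

The round-trip error per weight is bounded by about half of that row's scale, which is the precision traded for the smaller storage. Activations are untouched in a weight-only scheme; only the stored weights change representation.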