# Note on Quantization
A quantized version of this model is not included because PyTorch quantization has limited support on Apple M-series (Arm) Macs.

To quantize the model yourself on a compatible system:
```python
import torch

from model.transformer import TransformerLM, ModelConfig

# Load the trained weights
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
config = ModelConfig(...)  # fill in the config the model was trained with

# Rebuild the model and restore the weights
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to all nn.Linear layers: weights are stored
# as int8, activations are quantized on the fly at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)

# Save the quantized weights
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
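One gotcha when reloading: a dynamically quantized state dict only loads into a model whose `nn.Linear` layers have *already* been swapped for their quantized counterparts, because the saved tensors are packed int8 weights rather than float ones. A minimal sketch of the save/reload round trip, using a toy `nn.Sequential` as a hypothetical stand-in for `TransformerLM`:

```python
import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Toy stand-in for TransformerLM; any module containing nn.Linear works.
    return nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))


# Quantize and save, mirroring the snippet above
quantized = torch.quantization.quantize_dynamic(
    make_model(), {nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "pytorch_model_quantized.bin")

# Reload: quantize the freshly built float model *before* load_state_dict,
# so the module tree matches the packed int8 parameters in the checkpoint.
reloaded = torch.quantization.quantize_dynamic(
    make_model(), {nn.Linear}, dtype=torch.qint8
)
state = torch.load("pytorch_model_quantized.bin", weights_only=False)
reloaded.load_state_dict(state)

out = reloaded(torch.randn(1, 16))
```

Calling `load_state_dict` on the plain float model instead would fail with unexpected/missing keys, since quantized modules store their weights under different names.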