# Note on Quantization

The quantized version of this model is not included because PyTorch's quantization backends have limited support on Apple Silicon (M-series) Macs.

To quantize this model on a compatible system:
```python
import torch
from model.transformer import TransformerLM, ModelConfig

# Load the model weights
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")

# Recreate the config used at training time; loading it from a JSON file
# here is an assumption -- adapt to however your config was saved
import json
with open("config.json") as f:
    config = ModelConfig(**json.load(f))

# Create model instance
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8
)

# Save the quantized weights. Note: to reload this state dict later, you must
# first build the model and re-apply quantize_dynamic, then call load_state_dict.
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
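
Before running the full conversion, it can be worth sanity-checking that dynamic quantization works on your system. The sketch below uses a tiny throwaway network as a stand-in for `TransformerLM` (the layer sizes are arbitrary); the same `quantize_dynamic` call applies unchanged to the real model:

```python
import torch
import torch.nn as nn

# Toy stand-in model; substitute TransformerLM(config) in practice
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()

# Replace Linear layers with dynamically quantized (int8) equivalents
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# A quantized forward pass should run and preserve output shapes
x = torch.randn(2, 16)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([2, 8])
```

If this raises a backend error (e.g. no quantized engine available), the full conversion will fail the same way, so run it on the target system first.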