# Note on Quantization

The quantized version of this model is not included because PyTorch quantization has limited support on Apple M-series (ARM) Macs.
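On systems where support is uncertain, you can first check which quantized engines your PyTorch build ships with; a minimal sketch using `torch.backends.quantized`:

```python
import torch

# List the quantization engines this PyTorch build supports,
# e.g. 'fbgemm'/'x86' on x86 CPUs and 'qnnpack' on ARM
print(torch.backends.quantized.supported_engines)
```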
To quantize this model on a compatible system:
```python
import torch
from model.transformer import TransformerLM, ModelConfig

# Load the saved weights
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
config = ModelConfig()  # placeholder: use the same config the checkpoint was trained with

# Create model instance and load the weights
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)

# Save quantized model
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
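To use the quantized weights later, the model must be re-quantized before loading so the module structure matches the saved state dict. A minimal sketch, assuming the same `TransformerLM` and `ModelConfig` as above:

```python
import torch
from model.transformer import TransformerLM, ModelConfig

config = ModelConfig()  # placeholder: same config as the original model
model = TransformerLM(config)
model.eval()

# Re-apply dynamic quantization so the modules match the quantized state dict
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)
quantized_model.load_state_dict(
    torch.load("pytorch_model_quantized.bin", map_location="cpu")
)
```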