# Note on Quantization

The quantized version of this model is not included because PyTorch quantization has limited support on Mac M-series chips. To quantize this model on a compatible system:

```python
import torch
from model.transformer import TransformerLM, ModelConfig

# Load the model weights
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
config = ModelConfig()  # fill in the same hyperparameters used for training

# Create the model instance and load the checkpoint
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to the linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized model
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
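To use the saved quantized weights later, the model has to be rebuilt and wrapped with `quantize_dynamic` again before the quantized state dict will load, since the saved weights only match the quantized module structure. A minimal sketch, assuming the same `TransformerLM` and `ModelConfig` classes and the file name produced above:

```python
import torch
from model.transformer import TransformerLM, ModelConfig

# Recreate the architecture and re-apply dynamic quantization so the
# module structure matches the saved quantized state dict
config = ModelConfig()  # same hyperparameters as the original model
model = TransformerLM(config)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load the quantized weights and switch to inference mode
state_dict = torch.load("pytorch_model_quantized.bin", map_location="cpu")
quantized_model.load_state_dict(state_dict)
quantized_model.eval()
```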