# Note on Quantization

The quantized version of this model is not included because PyTorch quantization has limited support on Apple M-series (ARM) Macs.
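On systems where support is uncertain, you can first check which quantized engines your PyTorch build ships with; a minimal sketch using `torch.backends.quantized`:

```python
import torch

# List the quantization engines this PyTorch build supports,
# e.g. 'fbgemm'/'x86' on x86 CPUs and 'qnnpack' on ARM
print(torch.backends.quantized.supported_engines)
```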
To quantize this model on a compatible system:
```python
import torch
from model.transformer import TransformerLM, ModelConfig

# Load the saved weights
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
config = ModelConfig()  # placeholder: use the same config the checkpoint was trained with

# Create model instance and load the weights
model = TransformerLM(config)
model.load_state_dict(checkpoint)
model.eval()

# Apply dynamic quantization to linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)

# Save quantized model
torch.save(quantized_model.state_dict(), "pytorch_model_quantized.bin")
```
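To use the quantized weights later, the model must be re-quantized before loading so the module structure matches the saved state dict. A minimal sketch, assuming the same `TransformerLM` and `ModelConfig` as above:

```python
import torch
from model.transformer import TransformerLM, ModelConfig

config = ModelConfig()  # placeholder: same config as the original model
model = TransformerLM(config)
model.eval()

# Re-apply dynamic quantization so the modules match the quantized state dict
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)
quantized_model.load_state_dict(
    torch.load("pytorch_model_quantized.bin", map_location="cpu")
)
```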