# Table of Contents

1. [Introduction](#1)
2. [Minimal Yaml Configs](#2)
3. [Tensor Override](#3)
4. [Various Operating Modes](#4)
    - [4.1 Operating_mode: inspection](#4-1)
    - [4.2 Operating mode: full_auto](#4-2)

## 1. Introduction

This library contains a collection of scripts that are useful for:
- improving the mAP of quantized models
- designing mixed-precision models
- investigating quantization issues

It is based on the ONNX quantizer only. For good support of 4-bit and 16-bit quantization, it is recommended to set 'quantization.target_opset' to 21, which implies ir_version 10.

The advanced quantization services 'inspection' and 'full_auto' are only enabled when 'operation_mode' is 'quantization'.

## 2. Minimal Yaml Configs

The legacy 'quantization' section is mandatory. This readme assumes that the ONNX quantizer is used rather than LiteRT, so 'quantization.quantizer' must be 'Onnx_quantizer'.

If 'quantization.operating_mode' is not defined or is set to 'default', the legacy quantization is simply run with the parameters specified in the yaml. Parameters left undefined take the following default values:
- WeightSymmetric: True
- ActivationSymmetric: False
- CalibMovingAverage: False
- QuantizeBias: True
- SmoothQuant: False
- SmoothQuantAlpha: 0.5
- SmoothQuantFolding: True
- TensorQuantOverrides: None
- op_types_to_quantize: None
- nodes_to_quantize: None
- nodes_to_exclude: None
- calibrate_method: CalibrationMethod.MinMax
- weight_type: QuantType.QInt8
- activ_type: QuantType.QInt8

Calibration moving average is still controlled through the configs.quantization.onnx_extra_options parameters.

'quantization.iterative_quant_parameters' is only taken into account for the 'inspection' and 'full_auto' modes. It defines two parameters:
- 'inspection_split': used in 'inspection' mode to define the subset of the quantization set on which SNR measurements are performed.
- 'accuracy_tolerance': only used in 'full_auto' mode. In object detection, it corresponds to the mAP margin (in percent) that is tolerated relative to the full 8-bit performance.

The purpose of these parameters is described in more detail in the operating modes section of this readme.

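The default-filling behavior described in this section can be pictured with a minimal Python sketch. The default values are the ones listed in this readme; the dict layout and the helper names `resolve_quant_params` and `uses_iterative_quant_parameters` are hypothetical and only serve the illustration, they are not taken from the library. The calibration method and quantization types are written as plain strings so that the sketch runs without onnxruntime installed.

```python
# Defaults copied from the list in this readme. The dict layout and the helper
# names below are illustrative assumptions, not the library's actual code.
DEFAULT_QUANT_PARAMS = {
    "WeightSymmetric": True,
    "ActivationSymmetric": False,
    "CalibMovingAverage": False,
    "QuantizeBias": True,
    "SmoothQuant": False,
    "SmoothQuantAlpha": 0.5,
    "SmoothQuantFolding": True,
    "TensorQuantOverrides": None,
    "op_types_to_quantize": None,
    "nodes_to_quantize": None,
    "nodes_to_exclude": None,
    "calibrate_method": "MinMax",  # stands in for CalibrationMethod.MinMax
    "weight_type": "QInt8",        # stands in for QuantType.QInt8
    "activ_type": "QInt8",         # stands in for QuantType.QInt8
}


def resolve_quant_params(user_params=None):
    """Return the defaults, overlaid with the values explicitly set by the user."""
    resolved = dict(DEFAULT_QUANT_PARAMS)
    resolved.update(user_params or {})
    return resolved


def uses_iterative_quant_parameters(quantization_cfg):
    """True when 'iterative_quant_parameters' is taken into account.

    A missing 'operating_mode' is treated like 'default', as stated above.
    """
    mode = quantization_cfg.get("operating_mode") or "default"
    return mode in ("inspection", "full_auto")
```

For instance, `resolve_quant_params({"SmoothQuant": True})` keeps every other entry at its default, and `uses_iterative_quant_parameters({"operating_mode": "default"})` returns `False`.
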
## 3. Tensor Override

Tensor override is managed by the 'quantization.onnx_extra_options.weights_tensor_override' and 'quantization.onnx_extra_options.activations_tensor_override' parameters.

Examples for setting the override parameters:

    weights_tensor_override: [['efficientnetv2-b0_1/block6a_se_reduce_1/convolution/ReadVariableOp:0', {'quant_type': Int16, 'scale': 0.0025, 'zero_point': -12}],
                              ['efficientnetv2-b0_1/stem_conv_1/convolution/merged_input:0', {'quant_type': Int16, 'axis': 0}]
                             ]
    activations_tensor_override: [['efficientnetv2-b0_1/block4a_se_reduce_1/mul_1:0', {'quant_type': Int16}],
                                  ['efficientnetv2-b0_1/block4a_se_expand_1/BiasAdd__79:0', {'quant_type': Int16, 'scale': 0.002, 'zero_point': 0}]
                                 ]

Common remarks:

'zero_point' must be:
- one integer scalar or a list of one integer value for per-tensor
- a list of several integer values for per-channel

'scale' must be:
- one float scalar or a list of one floating-point value for per-tensor
- a list of several floating-point values for per-channel

Remarks for the weights:
- When only 'quant_type' is specified, without 'axis', the tensor is quantized on a per-tensor basis whatever the value of 'quantization.granularity'. One 'scale' and one 'zero_point' can optionally be specified. Since the weights are quantized per-tensor, the bias is automatically quantized per-tensor too.
- If 'axis' is specified, it must be an integer: 0 if the tensor holds the weights of any type of Conv, 1 for a ConvTranspose, and 1 as well for a Gemm or a MatMul by default.
- When 'axis' is specified (meaning that the given tensor is to be quantized per-channel), 'scale' and 'zero_point', if specified too, must each be a list of values whose length matches the number of output channels of the tensor.

Remarks for activations:
- They are not quantized per-channel. So if 'scale' and 'zero_point' are defined, one value for each is enough.

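The remarks above can be turned into a small consistency check that is run on each override entry before launching quantization. The following is only a sketch under stated assumptions: the function `check_override` and its arguments are hypothetical and not part of the library, 'quant_type' is not inspected, and the number of output channels has to be supplied by the caller since it cannot be deduced from the yaml alone.

```python
def _as_list(value):
    """Wrap a scalar into a one-element list; copy lists and tuples as lists."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def check_override(params, is_weight, num_output_channels=None):
    """Return a list of problems found in one override dict (empty if none).

    Hypothetical helper: it encodes the remarks of this readme only.
    """
    problems = []
    axis = params.get("axis")
    per_channel = axis is not None

    if per_channel and not is_weight:
        problems.append("activations are not quantized per-channel: remove 'axis'")
    if per_channel and (isinstance(axis, bool) or not isinstance(axis, int)):
        problems.append("'axis' must be an integer")

    # 'scale' holds floats, 'zero_point' holds integers (bool is rejected).
    for key, kind in (("scale", float), ("zero_point", int)):
        if key not in params:
            continue
        values = _as_list(params[key])
        if any(isinstance(v, bool) or not isinstance(v, kind) for v in values):
            problems.append(f"'{key}' must contain {kind.__name__} values only")
        if per_channel and is_weight:
            # Per-channel weights: one value per output channel, given as a list.
            if not isinstance(params[key], (list, tuple)):
                problems.append(f"'{key}' must be a list when 'axis' is set")
            elif num_output_channels is not None and len(values) != num_output_channels:
                problems.append(
                    f"'{key}' has {len(values)} values, expected {num_output_channels}"
                )
        elif len(values) != 1:
            # Per-tensor: a scalar or a list holding exactly one value.
            problems.append(f"'{key}' must hold exactly one value for per-tensor")
    return problems
```

With the examples of this section, `check_override({'quant_type': 'Int16', 'scale': 0.0025, 'zero_point': -12}, is_weight=True)` reports no problem, whereas a per-channel weight entry whose 'scale' list is shorter than the number of output channels is flagged.
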
Potential parameter conflict:
- With the current onnx and onnxruntime versions, 'onnx_quant_parameters.nodes_to_quantize' and 'onnx_extra_options.weights_tensor_override' must not both be set. They appear to be mutually exclusive.

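A configuration can be screened for this conflict before quantization starts. The sketch below assumes that the 'quantization' section has been loaded as a Python dict and that both 'onnx_quant_parameters' and 'onnx_extra_options' sit directly under it; the function name `has_override_conflict` is hypothetical.

```python
def has_override_conflict(quantization_cfg):
    """True when both mutually exclusive parameters are set and non-empty.

    Assumes both sub-sections live directly under 'quantization'.
    """
    quant_params = quantization_cfg.get("onnx_quant_parameters") or {}
    extra_options = quantization_cfg.get("onnx_extra_options") or {}
    nodes = quant_params.get("nodes_to_quantize")
    overrides = extra_options.get("weights_tensor_override")
    return bool(nodes) and bool(overrides)
```

A caller could then stop early with an explicit error message rather than relying on the behavior of the underlying quantizer.
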
## 4. Various Operating Modes