Task: text-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: entire dataset

Fixed parameters:

  • dataset: [{'path': 'glue', 'eval_split': 'validation', 'data_keys': {'primary': 'sentence'}, 'ref_keys': ['label'], 'name': 'sst2', 'calibration_split': None}]
  • name_or_path: distilbert-base-uncased-finetuned-sst-2-english
  • from_transformers: True
  • quantization_approach: dynamic
  • node_exclusion: []

Benchmarked parameters:

  • framework: onnxruntime, pytorch
  • operators_to_quantize: ['Add', 'MatMul'], ['Add']
  • per_channel: False, True
  • framework_args: {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}, {}
  • apply_quantization: True, False
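
The result tables in this report list six runs rather than the full cross-product of the values above. The sketch below reproduces that set under an assumption inferred from the reported rows, not from the benchmarking tool's source: the quantization options are only varied for ONNX Runtime runs with quantization applied, and each framework gets one unquantized baseline. The `enumerate_runs` helper is hypothetical.

```python
from itertools import product

# Values copied from the "Benchmarked parameters" list above.
ORT_ARGS = {"opset": 13, "optimization_level": 1, "intra_op_num_threads": 4}
OPERATORS = (["Add", "MatMul"], ["Add"])
PER_CHANNEL = (False, True)


def enumerate_runs():
    """Hypothetical helper: list the runs in the order the result tables use."""
    # Unquantized ONNX Runtime baseline; quantization options do not apply.
    runs = [{
        "framework": "onnxruntime",
        "operators_to_quantize": None,
        "per_channel": None,
        "framework_args": ORT_ARGS,
        "apply_quantization": False,
    }]
    # Quantized ONNX Runtime runs: each operator set with each per_channel value.
    for operators, per_channel in product(OPERATORS, PER_CHANNEL):
        runs.append({
            "framework": "onnxruntime",
            "operators_to_quantize": operators,
            "per_channel": per_channel,
            "framework_args": ORT_ARGS,
            "apply_quantization": True,
        })
    # PyTorch baseline; reported with empty framework_args and no quantization fields.
    runs.append({
        "framework": "pytorch",
        "operators_to_quantize": None,
        "per_channel": None,
        "framework_args": {},
        "apply_quantization": None,
    })
    return runs


runs = enumerate_runs()
for run in runs:
    print(run["framework"], run["operators_to_quantize"], run["per_channel"])
```

Under this assumption the grid has 1 + 2 × 2 + 1 = 6 entries, which matches the number of rows reported in each table.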

Evaluation

Non-time metrics

framework | operators_to_quantize | per_channel | framework_args | apply_quantization | accuracy
onnxruntime | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | 0.911
onnxruntime | ['Add', 'MatMul'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 0.898
onnxruntime | ['Add', 'MatMul'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 0.490
onnxruntime | ['Add'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 0.911
onnxruntime | ['Add'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 0.911
pytorch | None | None | {} | None | 0.911
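
To make the accuracy cost of each quantized configuration easier to read, the sketch below computes the drop relative to the unquantized baseline of 0.911. The numbers are copied from the table above; the `accuracy_drop` helper and the row labels are illustrative names, not part of the benchmark output.

```python
# Accuracy of the unquantized runs (onnxruntime and pytorch both report 0.911).
BASELINE_ACCURACY = 0.911

# (operators_to_quantize, per_channel, accuracy), copied from the table above.
QUANTIZED_RUNS = [
    (("Add", "MatMul"), False, 0.898),
    (("Add", "MatMul"), True, 0.490),
    (("Add",), False, 0.911),
    (("Add",), True, 0.911),
]


def accuracy_drop(accuracy, baseline=BASELINE_ACCURACY):
    """Return the absolute accuracy lost relative to the baseline."""
    return round(baseline - accuracy, 3)


drops = {
    (operators, per_channel): accuracy_drop(accuracy)
    for operators, per_channel, accuracy in QUANTIZED_RUNS
}

for (operators, per_channel), drop in drops.items():
    print(f"{list(operators)} per_channel={per_channel}: drop = {drop:.3f}")
```

On these figures, quantizing only `Add` leaves accuracy unchanged, quantizing `Add` and `MatMul` without per-channel scaling costs about 1.3 points, and the per-channel `Add` + `MatMul` run falls to 0.490, which is close to chance level for the two-class SST-2 task.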

Time metrics

Time benchmarks were run for 15 seconds per config.

The time metrics below are for batch size = 1 and input length = 224.

framework | operators_to_quantize | per_channel | framework_args | apply_quantization | latency_mean (ms) | throughput (/s)
onnxruntime | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | 83.23 | 12.07
onnxruntime | ['Add', 'MatMul'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 64.31 | 15.60
onnxruntime | ['Add', 'MatMul'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 64.78 | 15.47
onnxruntime | ['Add'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 82.63 | 12.13
onnxruntime | ['Add'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | 83.82 | 11.93
pytorch | None | None | {} | None | 84.34 | 11.87
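
The latency figures are easier to compare as speedups over the unquantized ONNX Runtime run. The sketch below computes them from the values in the table above. It also checks, as an assumption about how the two columns relate at batch size 1, that the reported throughput is close to 1000 / latency_mean. The helper names are illustrative.

```python
# (label, latency_mean in ms, throughput per second), copied from the table above.
ROWS = [
    ("onnxruntime, unquantized", 83.23, 12.07),
    ("onnxruntime, Add+MatMul, per_channel=False", 64.31, 15.60),
    ("onnxruntime, Add+MatMul, per_channel=True", 64.78, 15.47),
    ("onnxruntime, Add, per_channel=False", 82.63, 12.13),
    ("onnxruntime, Add, per_channel=True", 83.82, 11.93),
    ("pytorch", 84.34, 11.87),
]

# Reference point: the unquantized ONNX Runtime run.
BASELINE_MS = ROWS[0][1]


def speedup(latency_ms, baseline_ms=BASELINE_MS):
    """Return how many times faster a run is than the baseline."""
    return baseline_ms / latency_ms


def throughput_gap(latency_ms, throughput):
    """Relative gap between reported throughput and 1000 / latency (batch size 1)."""
    implied = 1000.0 / latency_ms
    return abs(implied - throughput) / throughput


for label, latency_ms, throughput in ROWS:
    print(
        f"{label}: speedup {speedup(latency_ms):.2f}x, "
        f"throughput gap {throughput_gap(latency_ms, throughput):.2%}"
    )
```

On these numbers, the two runs that quantize `MatMul` are roughly 1.3x faster than unquantized ONNX Runtime, while the `Add`-only runs and the PyTorch baseline stay within about 1% of it.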