monai
medical
Commit e430b14 by katielink · 1 Parent(s): e66fcc1

update the README file with the ONNX-TensorRT conversion

Files changed (3)
  1. README.md +7 -6
  2. configs/metadata.json +2 -1
  3. docs/README.md +7 -6
README.md CHANGED
@@ -52,13 +52,12 @@ Dice score is used for evaluating the performance of the model. This model achie
  ![A graph showing the validation mean Dice over 1260 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_spleen_ct_segmentation_val.png)

  #### TensorRT speedup
- The `spleen_ct_segmentation` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.

  | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
- | model computation | 6.48 | 4.48 | 6.40 | 6.30 | 1.45 | 1.01 | 1.03 | 0.71 |
- | model computation(onnx) | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
- | end2end | 3900.73 | 3823.89 | 3887.37 | 3883.01 | 1.02 | 1.00 | 1.00 | 0.98 |

  Where:
  - `model computation` means the speedup ratio of the model's inference with a random input, without preprocessing and postprocessing
@@ -68,13 +67,15 @@ Where:
  - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
  - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16-based model.

  This result is benchmarked under:
  - TensorRT: 8.5.3+cuda11.8
  - Torch-TensorRT Version: 1.4.0
  - CPU Architecture: x86-64
  - OS: Ubuntu 20.04
  - Python version: 3.8.10
- - CUDA version: 12.0
  - GPU models and configuration: A100 80G

  ## MONAI Bundle Commands
@@ -117,7 +118,7 @@ python -m monai.bundle run --config_file configs/inference.json
  #### Export checkpoint to TensorRT-based models with fp32 or fp16 precision:

  ```
- python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]"
  ```

  #### Execute inference with the TensorRT model:
 
  ![A graph showing the validation mean Dice over 1260 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_spleen_ct_segmentation_val.png)

  #### TensorRT speedup
+ The `spleen_ct_segmentation` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU.

  | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | model computation | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
+ | end2end | 1268.03 | 1152.40 | 1137.40 | 1114.25 | 1.10 | 1.11 | 1.14 | 1.03 |

  Where:
  - `model computation` means the speedup ratio of the model's inference with a random input, without preprocessing and postprocessing

  - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
  - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16-based model.
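
As a quick check on how the ratio columns relate to the latency columns, the sketch below recomputes them from the two rows of the updated table. The function name `speedup_ratios` and the dictionary keys are illustrative and not part of the bundle. The only assumption is that each ratio is one latency divided by another, rounded to two decimals.

```python
def speedup_ratios(torch_fp32, torch_amp, trt_fp32, trt_fp16):
    """Recompute the ratio columns from latencies given in milliseconds."""
    return {
        # The three "speedup" columns use the PyTorch float32 latency as baseline.
        "speedup amp": round(torch_fp32 / torch_amp, 2),
        "speedup fp32": round(torch_fp32 / trt_fp32, 2),
        "speedup fp16": round(torch_fp32 / trt_fp16, 2),
        # "amp vs fp16" compares PyTorch amp against TensorRT float16.
        "amp vs fp16": round(torch_amp / trt_fp16, 2),
    }


# Latencies copied from the updated table (ONNX-TensorRT, A100 80G).
model_computation = speedup_ratios(6.46, 4.48, 2.52, 1.96)
end2end = speedup_ratios(1268.03, 1152.40, 1137.40, 1114.25)

print(model_computation)
print(end2end)
```

Both rows reproduce the ratios listed in the table: 1.44, 2.56, 3.30, 2.29 for `model computation` and 1.10, 1.11, 1.14, 1.03 for `end2end`.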

+ Currently, ONNX-TensorRT is the only available method to accelerate this model. The Torch-TensorRT method is under development and will be available in the near future.
+
  This result is benchmarked under:
  - TensorRT: 8.5.3+cuda11.8
  - Torch-TensorRT Version: 1.4.0
  - CPU Architecture: x86-64
  - OS: Ubuntu 20.04
  - Python version: 3.8.10
+ - CUDA version: 12.1
  - GPU models and configuration: A100 80G

  ## MONAI Bundle Commands

  #### Export checkpoint to TensorRT-based models with fp32 or fp16 precision:

  ```
+ python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]" --use_onnx "True" --use_trace "True"
  ```
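
The `<fp32/fp16>` placeholder in the export command has to be replaced with one concrete value before running it. The sketch below shows one way to assemble the argument list from Python. The helper name `build_trt_export_args` is hypothetical; the flags and paths are the ones from the updated command. It only builds and prints the command, so it can be tried without MONAI or TensorRT installed.

```python
import shlex


def build_trt_export_args(precision):
    """Return the trt_export command as an argument list for one precision."""
    if precision not in ("fp32", "fp16"):
        raise ValueError(f"precision must be 'fp32' or 'fp16', got {precision!r}")
    return [
        "python", "-m", "monai.bundle", "trt_export",
        "--net_id", "network_def",
        "--filepath", "models/model_trt.ts",
        "--ckpt_file", "models/model.pt",
        "--meta_file", "configs/metadata.json",
        "--config_file", "configs/inference.json",
        "--precision", precision,
        "--dynamic_batchsize", "[1, 4, 8]",
        # The two flags added by this commit.
        "--use_onnx", "True",
        "--use_trace", "True",
    ]


args = build_trt_export_args("fp16")
# shlex.join quotes arguments such as "[1, 4, 8]" so the line can be pasted into a shell.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` from the bundle root would run the export, assuming MONAI and TensorRT are installed in that environment.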

  #### Execute inference with the TensorRT model:
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
  {
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
- "version": "0.4.9",
  "changelog": {
  "0.4.9": "update TensorRT descriptions",
  "0.4.8": "update deterministic training results",
  "0.4.7": "update the TensorRT part in the README file",
 
  {
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
+ "version": "0.5.0",
  "changelog": {
+ "0.5.0": "update the README file with the ONNX-TensorRT conversion",
  "0.4.9": "update TensorRT descriptions",
  "0.4.8": "update deterministic training results",
  "0.4.7": "update the TensorRT part in the README file",
docs/README.md CHANGED
@@ -45,13 +45,12 @@ Dice score is used for evaluating the performance of the model. This model achie
  ![A graph showing the validation mean Dice over 1260 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_spleen_ct_segmentation_val.png)

  #### TensorRT speedup
- The `spleen_ct_segmentation` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.

  | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
- | model computation | 6.48 | 4.48 | 6.40 | 6.30 | 1.45 | 1.01 | 1.03 | 0.71 |
- | model computation(onnx) | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
- | end2end | 3900.73 | 3823.89 | 3887.37 | 3883.01 | 1.02 | 1.00 | 1.00 | 0.98 |

  Where:
  - `model computation` means the speedup ratio of the model's inference with a random input, without preprocessing and postprocessing
@@ -61,13 +60,15 @@ Where:
  - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
  - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16-based model.

  This result is benchmarked under:
  - TensorRT: 8.5.3+cuda11.8
  - Torch-TensorRT Version: 1.4.0
  - CPU Architecture: x86-64
  - OS: Ubuntu 20.04
  - Python version: 3.8.10
- - CUDA version: 12.0
  - GPU models and configuration: A100 80G

  ## MONAI Bundle Commands
@@ -110,7 +111,7 @@ python -m monai.bundle run --config_file configs/inference.json
  #### Export checkpoint to TensorRT-based models with fp32 or fp16 precision:

  ```
- python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]"
  ```

  #### Execute inference with the TensorRT model:
 
  ![A graph showing the validation mean Dice over 1260 epochs.](https://developer.download.nvidia.com/assets/Clara/Images/monai_spleen_ct_segmentation_val.png)

  #### TensorRT speedup
+ The `spleen_ct_segmentation` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU.

  | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | model computation | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
+ | end2end | 1268.03 | 1152.40 | 1137.40 | 1114.25 | 1.10 | 1.11 | 1.14 | 1.03 |

  Where:
  - `model computation` means the speedup ratio of the model's inference with a random input, without preprocessing and postprocessing

  - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
  - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16-based model.

+ Currently, ONNX-TensorRT is the only available method to accelerate this model. The Torch-TensorRT method is under development and will be available in the near future.
+
  This result is benchmarked under:
  - TensorRT: 8.5.3+cuda11.8
  - Torch-TensorRT Version: 1.4.0
  - CPU Architecture: x86-64
  - OS: Ubuntu 20.04
  - Python version: 3.8.10
+ - CUDA version: 12.1
  - GPU models and configuration: A100 80G

  ## MONAI Bundle Commands

  #### Export checkpoint to TensorRT-based models with fp32 or fp16 precision:

  ```
+ python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]" --use_onnx "True" --use_trace "True"
  ```

  #### Execute inference with the TensorRT model: