enable deterministic training
- README.md +12 -9
- configs/metadata.json +2 -1
- configs/multi_gpu_evaluate.json +0 -1
- configs/multi_gpu_train.json +0 -1
- configs/train.json +1 -2
- docs/README.md +12 -9
README.md
CHANGED
@@ -105,7 +105,7 @@ Example `dataset.json` in output folder:
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_dataset_json.png)
 
-##
+## Performance
 This model achieves the following F1 score on the validation data provided as part of the dataset:
 
 - Train F1 score = 0.941
@@ -132,26 +132,29 @@ Confusion Metrics for <b>Training</b> for individual classes are (at epoch 50):
 
 
 
-
+#### Training Performance
 A graph showing the training Loss and F1-score over 50 epochs.
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v3.png) <br>
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v3.png) <br>
 
-
+#### Validation Performance
 A graph showing the validation F1-score over 50 epochs.
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v3.png) <br>
 
+## MONAI Bundle Commands
+In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
-
-
+For more detailed usage instructions, visit the [MONAI Bundle Configuration Page](https://docs.monai.io/en/latest/config_syntax.html).
+
+#### Execute training:
 
 ```
 python -m monai.bundle run --config_file configs/train.json
 ```
 
-Override the `train` config to execute multi-GPU training:
+#### Override the `train` config to execute multi-GPU training:
 
 ```
 torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
@@ -160,19 +163,19 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
 Please refer to [PyTorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
 
-Override the `train` config to execute evaluation with the trained model:
+#### Override the `train` config to execute evaluation with the trained model:
 
 ```
 python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
 ```
 
-Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
+#### Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
 
 ```
 torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
 ```
 
-Execute inference:
+#### Execute inference:
 
 ```
 python -m monai.bundle run --config_file configs/inference.json
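
The commands in this README pass several files to `--config_file`, with later files overriding entries from earlier ones. A minimal sketch of that idea in plain Python follows. It assumes, as in the MONAI config syntax, that `#` separates nested keys. The helper `apply_overrides` and the sample values are hypothetical, only nested dictionaries are handled, and this is not MONAI's own implementation.

```python
import copy


def apply_overrides(base: dict, overrides: dict, sep: str = "#") -> dict:
    """Return a copy of `base` with every `a#b#c` key in `overrides` applied."""
    merged = copy.deepcopy(base)  # leave the base config untouched
    for key, value in overrides.items():
        node = merged
        *parents, leaf = key.split(sep)
        for part in parents:
            # walk down (or create) the nested dictionaries
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# Hypothetical stand-ins for train.json and an override file.
train = {"train": {"trainer": {"max_epochs": 50}}, "device": "cuda:0"}
multi_gpu = {"device": "cuda:1", "train#trainer#max_epochs": 10}

merged = apply_overrides(train, multi_gpu)
print(merged)
```

The base dictionary is deep-copied so that the same `train` config can be combined with different override files.
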
configs/metadata.json
CHANGED
@@ -1,7 +1,8 @@
 {
 "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-"version": "0.0.
+"version": "0.0.8",
 "changelog": {
+"0.0.8": "enable deterministic training",
 "0.0.7": "update benchmark on A100",
 "0.0.6": "adapt to BundleWorkflow interface",
 "0.0.5": "add name tag",
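
This hunk bumps `version` and adds a matching `changelog` entry in the same change. A small check along the following lines could confirm the two stay in sync. The function name `version_has_changelog_entry` is hypothetical, and the inline JSON is only an excerpt of the fields visible in the diff.

```python
import json


def version_has_changelog_entry(metadata: dict) -> bool:
    """True if the current version string is also a key of the changelog."""
    return metadata.get("version") in metadata.get("changelog", {})


# Excerpt of the fields shown in the diff, not the full metadata file.
metadata = json.loads("""
{
  "version": "0.0.8",
  "changelog": {
    "0.0.8": "enable deterministic training",
    "0.0.7": "update benchmark on A100"
  }
}
""")

print(version_has_changelog_entry(metadata))
```
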
configs/multi_gpu_evaluate.json
CHANGED
@@ -21,7 +21,6 @@
 "$import torch.distributed as dist",
 "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
 "$torch.cuda.set_device(@device)",
-"$setattr(torch.backends.cudnn, 'benchmark', True)",
 "$import logging",
 "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
 "$import scripts",
configs/multi_gpu_train.json
CHANGED
@@ -31,7 +31,6 @@
 "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
 "$torch.cuda.set_device(@device)",
 "$monai.utils.set_determinism(seed=123)",
-"$setattr(torch.backends.cudnn, 'benchmark', True)",
 "$import logging",
 "$@train#trainer.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
 "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)"
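
The two `setLevel` expressions kept in this hunk quiet every process except rank 0, so multi-GPU runs do not print duplicate progress logs. The sketch below isolates that expression in plain Python. `level_for_rank` is a hypothetical helper, and the rank is passed in directly rather than read from `dist.get_rank()`, so no process group is needed to run it.

```python
import logging


def level_for_rank(rank: int) -> int:
    """Rank 0 logs at INFO; all other ranks only log warnings and above."""
    return logging.WARNING if rank > 0 else logging.INFO


# Simulate two processes; in the config the rank comes from dist.get_rank().
for rank in range(2):
    logger = logging.getLogger(f"trainer.rank{rank}")
    logger.setLevel(level_for_rank(rank))
    print(rank, logging.getLevelName(logger.level))
```
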
configs/train.json
CHANGED
@@ -343,8 +343,7 @@
 "initialize": [
 "$import sys",
 "$sys.path.append(@bundle_root)",
-"$monai.utils.set_determinism(seed=123)"
-"$setattr(torch.backends.cudnn, 'benchmark', True)"
+"$monai.utils.set_determinism(seed=123)"
 ],
 "run": [
 "$@train#trainer.run()"
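
This hunk drops the `cudnn.benchmark` override and keeps only `monai.utils.set_determinism(seed=123)`. Assuming `set_determinism` behaves as documented (seeding the Python, NumPy and PyTorch generators and switching cuDNN to deterministic mode with benchmarking off), re-enabling `benchmark` right afterwards would work against it. The underlying idea, that a fixed seed makes a pseudo-random sequence repeatable, can be sketched with the standard library alone. `sample_run` is a hypothetical stand-in for a training run and does not call MONAI or PyTorch.

```python
import random
from typing import List


def sample_run(seed: int, n: int = 5) -> List[float]:
    """Draw `n` pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(n)]


first = sample_run(123)
second = sample_run(123)
print(first == second)  # same seed, same sequence
```
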
docs/README.md
CHANGED
@@ -98,7 +98,7 @@ Example `dataset.json` in output folder:
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_dataset_json.png)
 
-##
+## Performance
 This model achieves the following F1 score on the validation data provided as part of the dataset:
 
 - Train F1 score = 0.941
@@ -125,26 +125,29 @@ Confusion Metrics for <b>Training</b> for individual classes are (at epoch 50):
 
 
 
-
+#### Training Performance
 A graph showing the training Loss and F1-score over 50 epochs.
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v3.png) <br>
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v3.png) <br>
 
-
+#### Validation Performance
 A graph showing the validation F1-score over 50 epochs.
 
 ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v3.png) <br>
 
+## MONAI Bundle Commands
+In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
-
-
+For more detailed usage instructions, visit the [MONAI Bundle Configuration Page](https://docs.monai.io/en/latest/config_syntax.html).
+
+#### Execute training:
 
 ```
 python -m monai.bundle run --config_file configs/train.json
 ```
 
-Override the `train` config to execute multi-GPU training:
+#### Override the `train` config to execute multi-GPU training:
 
 ```
 torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
@@ -153,19 +156,19 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
 Please refer to [PyTorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
 
-Override the `train` config to execute evaluation with the trained model:
+#### Override the `train` config to execute evaluation with the trained model:
 
 ```
 python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
 ```
 
-Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
+#### Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
 
 ```
 torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
 ```
 
-Execute inference:
+#### Execute inference:
 
 ```
 python -m monai.bundle run --config_file configs/inference.json