# Text detection
This sample training script trains a text detection model with docTR.
## Setup
First, you need to install `doctr` (with pip, for instance):
```shell
pip install -e . --upgrade
pip install -r references/requirements.txt
```
## Usage
You can start your training in PyTorch:
```shell
python references/detection/train.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```
### Multi-GPU support
We now use the built-in [`torchrun`](https://pytorch.org/docs/stable/elastic/run.html) launcher to spawn your DDP workers. `torchrun` sets all the necessary environment variables (`LOCAL_RANK`, `RANK`, etc.) for you. The arguments are the same as for single-GPU training, except:
- `--backend`: you can specify another `backend` for `DistributedDataParallel` if the default one is not available on
your operating system. The fastest is `nccl`, according to the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).
#### Key `torchrun` parameters
- `--nproc_per_node=<N>`
Spawn `<N>` processes on the local machine (typically equal to the number of GPUs you want to use).
- `--nnodes=<M>`
(Optional) Total number of nodes in your job. Default is 1.
- `--rdzv_backend`, `--rdzv_endpoint`, `--rdzv_id`
(Optional) Rendezvous settings for multi-node jobs. See the [torchrun docs](https://pytorch.org/docs/stable/elastic/run.html) for details.
#### GPU selection
By default all visible GPUs will be used. To limit which GPUs participate, set the `CUDA_VISIBLE_DEVICES` environment variable **before** running `torchrun`. For example, to use only CUDA devices 0 and 2:
```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/detection/train.py \
db_resnet50 \
--train_path path/to/train \
--val_path path/to/val \
--epochs 5 \
--backend nccl
```
## Data format
You need to provide both `train_path` and `val_path` arguments to start training.
Each path must lead to a folder containing 1 subfolder and 1 file:
```shell
├── images
│   ├── sample_img_01.png
│   ├── sample_img_02.png
│   ├── sample_img_03.png
│   └── ...
└── labels.json
```
Each JSON file must be a dictionary where the keys are the image file names and each value is a dictionary with 3 entries: `img_dimensions` (spatial shape of the image), `img_hash` (SHA256 of the image file), and `polygons` (the set of 2D points forming the localization polygon).
The order of the points inside a polygon does not matter. Points are (x, y) absolute coordinates.
labels.json
```shell
{
    "sample_img_01.png": {
'img_dimensions': (900, 600),
'img_hash': "theimagedumpmyhash",
'polygons': [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
},
    "sample_img_02.png": {
'img_dimensions': (900, 600),
'img_hash': "thisisahash",
'polygons': [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
}
...
}
```
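As a sketch of how such an entry could be produced, the hypothetical helper below (not part of docTR) computes the SHA256 of the image file; the dimensions and polygons are passed in so the example needs no image library:

```python
import hashlib


def make_label_entry(image_path, img_dimensions, polygons):
    """Build one labels.json entry for a single image.

    `img_dimensions` is the (height, width) of the image and `polygons`
    the list of 4-point boxes in absolute (x, y) coordinates.
    """
    # img_hash is the SHA256 digest of the raw image file bytes
    with open(image_path, "rb") as f:
        img_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "img_dimensions": img_dimensions,
        "img_hash": img_hash,
        "polygons": polygons,
    }
```

Collecting such entries into a dictionary keyed by file name and dumping it with `json.dump` yields a `labels.json` in the format above.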
If you want to train a model with multiple classes, you can use the following format, where `polygons` is a dictionary whose keys are the class names and whose values are the polygons belonging to that class.
labels.json
```shell
{
"sample_img_01.png": {
'img_dimensions': (900, 600),
'img_hash': "theimagedumpmyhash",
'polygons': {
"class_name_1": [[[x10, y10], [x20, y20], [x30, y30], [x40, y40]], ...],
"class_name_2": [[[x11, y11], [x21, y21], [x31, y31], [x41, y41]], ...]
}
},
"sample_img_02.png": {
'img_dimensions': (900, 600),
'img_hash': "thisisahash",
'polygons': {
"class_name_1": [[[x12, y12], [x22, y22], [x32, y32], [x42, y42]], ...],
"class_name_2": [[[x13, y13], [x23, y23], [x33, y33], [x43, y43]], ...]
}
},
...
}
```
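Since `polygons` is a list in the single-class format but a dictionary in the multi-class one, any code consuming `labels.json` has to handle both shapes. A minimal, hypothetical normalization sketch (the default class name `"words"` is an arbitrary choice for this example):

```python
def normalize_polygons(polygons):
    """Map either label format to a {class_name: polygon_list} dict.

    Single-class labels store polygons as a plain list; we file them
    under a default class name so downstream code sees one shape.
    """
    if isinstance(polygons, dict):
        # Multi-class format: already keyed by class name
        return polygons
    # Single-class format: wrap the list under a default class
    return {"words": polygons}
```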
## Slack Logging with tqdm
To enable Slack logging using `tqdm`, you need to set the following environment variables:
- `TQDM_SLACK_TOKEN`: the Slack Bot Token
- `TQDM_SLACK_CHANNEL`: you can retrieve it using `Right Click on Channel > Copy > Copy link`. You should get something like `https://xxxxxx.slack.com/archives/yyyyyyyy`. Keep only the `yyyyyyyy` part.
You can follow this page on [how to create a Slack App](https://api.slack.com/quickstart).
## Advanced options
Feel free to inspect the script's many options to customize the training to your own needs!
```shell
python references/detection/train.py --help
```