Character classification
The sample training scripts were made to train a character classification model or an orientation classifier with docTR.
Setup
First, you need to install doctr (with pip, for instance):
pip install -e . --upgrade
pip install -r references/requirements.txt
Usage character classification
You can start your training in PyTorch:
python references/classification/train_character.py mobilenet_v3_large --epochs 5 --device 0
Usage orientation classification
You can start your training in PyTorch:
python references/classification/train_orientation.py resnet18 --type page --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
The --type argument can be either page (for document images) or crop (for word crops).
Data format
You need to provide both train_path and val_path arguments to start training.
Each path must lead to a folder where the images are stored. For example:
images
├── sample_img_01.png
├── sample_img_02.png
├── sample_img_03.png
└── ...
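Before launching a long run, it can help to check that both folders exist and actually contain images. The following is a minimal sketch, not part of docTR: the helper names list_images and check_dataset are hypothetical, and the set of accepted extensions is an assumption (the training scripts may accept other formats).

```python
from pathlib import Path

# Assumption: illustrative list of extensions, not taken from the training scripts.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return the sorted image files stored directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a folder")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def check_dataset(train_path, val_path):
    """Count the images in both folders and fail early if one is empty."""
    counts = {}
    for name, folder in (("train", train_path), ("val", val_path)):
        images = list_images(folder)
        if not images:
            raise ValueError(f"no images found in {folder}")
        counts[name] = len(images)
    return counts
```

Calling check_dataset("path/to/your/train_set", "path/to/your/val_set") returns a dictionary with the number of images found in each folder, or raises an error if a folder is missing or empty.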
Slack Logging with tqdm
To enable Slack logging using tqdm, you need to set the following environment variables:
TQDM_SLACK_TOKEN: the Slack Bot Token
TQDM_SLACK_CHANNEL: you can retrieve it using Right Click on Channel > Copy > Copy link. You should get something like https://xxxxxx.slack.com/archives/yyyyyyyy. Keep only the yyyyyyyy part.
You can follow this page on how to create a Slack App.
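Extracting the channel ID from the copied link can be scripted. The sketch below is only an illustration: the helper names channel_id_from_link and configure_slack_logging are hypothetical, and it assumes the link has the https://xxxxxx.slack.com/archives/yyyyyyyy shape described above.

```python
import os
from urllib.parse import urlparse


def channel_id_from_link(link):
    """Return the part of a Slack channel link that follows /archives/."""
    # For the link shape above, the URL path is "/archives/yyyyyyyy".
    parts = [p for p in urlparse(link).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "archives":
        raise ValueError(f"unexpected Slack channel link: {link}")
    return parts[1]


def configure_slack_logging(token, channel_link, env=os.environ):
    """Store the two variables read by tqdm's Slack logging in `env`."""
    env["TQDM_SLACK_TOKEN"] = token
    env["TQDM_SLACK_CHANNEL"] = channel_id_from_link(channel_link)
```

Variables set this way are only visible to the current Python process and the processes it starts. When launching the training scripts from a terminal, export the two variables in your shell instead.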
Advanced options
Feel free to inspect the scripts' many options to customize your training to your own needs!
Character classification:
python references/classification/train_character.py --help
Orientation classification:
python references/classification/train_orientation.py --help