Update README.md

README.md CHANGED

@@ -1,138 +1,13 @@
```
$ git clone git@github.com:FangShancheng/ABINet.git
$ docker run --gpus all --rm -ti --ipc=host -v $(pwd)/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
```

- (Untested) Or install the dependencies directly:

```
pip install -r requirements.txt
```

## Datasets

- Training datasets

    1. [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ):
        - Use `tools/create_lmdb_dataset.py` to convert the images into an LMDB dataset
        - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
    2. [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST):
        - Use `tools/crop_by_word_bb.py` to crop images from the original [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) dataset, then convert the cropped images into an LMDB dataset with `tools/create_lmdb_dataset.py`
        - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
    3. [WikiText103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip), which is only used for pre-training language models:
        - Use `notebooks/prepare_wikitext103.ipynb` to convert the text into CSV format.
        - [CSV dataset BaiduNetdisk(passwd:dk01)](https://pan.baidu.com/s/1yabtnPYDKqhBb_Ie9PGFXA)

- Evaluation datasets; the LMDB datasets can be downloaded from [BaiduNetdisk(passwd:1dbv)](https://pan.baidu.com/s/1RUg3Akwp7n8kZYJ55rU5LQ) or [GoogleDrive](https://drive.google.com/file/d/1dTI0ipu14Q1uuK4s4z32DqbqF3dJPdkk/view?usp=sharing):

    1. ICDAR 2013 (IC13)
    2. ICDAR 2015 (IC15)
    3. IIIT5K Words (IIIT)
    4. Street View Text (SVT)
    5. Street View Text-Perspective (SVTP)
    6. CUTE80 (CUTE)

- The structure of the `data` directory is

```
data
├── charset_36.txt
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
├── WikiText-103.csv
└── WikiText-103_eval_d1.csv
```
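To sanity-check a download, the layout above can be verified with a small stdlib script. This is a hypothetical helper sketched for illustration, not part of the repo:

```python
from pathlib import Path

# Expected entries from the directory tree above.
EXPECTED = [
    "charset_36.txt",
    "evaluation/CUTE80",
    "evaluation/IC13_857",
    "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000",
    "evaluation/SVT",
    "evaluation/SVTP",
    "training/MJ/MJ_test",
    "training/MJ/MJ_train",
    "training/MJ/MJ_valid",
    "training/ST",
    "WikiText-103.csv",
    "WikiText-103_eval_d1.csv",
]

def missing_entries(data_root="data"):
    """Return the expected files/directories that are absent under data_root."""
    root = Path(data_root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]
```

Running `missing_entries()` on a freshly prepared `data` directory should return an empty list.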

### Pretrained Models

Get the pretrained models from [BaiduNetdisk(passwd:kwck)](https://pan.baidu.com/s/1b3vyvPwvh_75FkPlp87czQ) or [GoogleDrive](https://drive.google.com/file/d/1mYM_26qHUom_5NU7iutHneB_KHlLjL5y/view?usp=sharing). The performance of the pretrained models is summarized as follows:

|Model|IC13|SVT|IIIT|IC15|SVTP|CUTE|AVG|
|-|-|-|-|-|-|-|-|
|ABINet-SV|97.1|92.7|95.2|84.0|86.7|88.5|91.4|
|ABINet-LV|97.0|93.4|96.4|85.9|89.5|89.2|92.7|
## Training

1. Pre-train the vision model
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml
```
2. Pre-train the language model
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
```
3. Train ABINet
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml
```

Note:
- You can set the `checkpoint` path for the vision and language models separately to start from specific pretrained models, or set it to `None` to train from scratch
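The three training stages above can be chained from Python. `stage_command` and `run_all_stages` are hypothetical helpers sketched for illustration, not part of the repo:

```python
import os
import subprocess

# The three stages from the steps above; config paths are taken from this README.
STAGES = [
    "configs/pretrain_vision_model.yaml",
    "configs/pretrain_language_model.yaml",
    "configs/train_abinet.yaml",
]

def stage_command(config, gpus="0,1,2,3"):
    """Build the environment and argv for one training stage."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    return env, ["python", "main.py", f"--config={config}"]

def run_all_stages(gpus="0,1,2,3"):
    """Run the stages in order, stopping as soon as one fails."""
    for config in STAGES:
        env, argv = stage_command(config, gpus)
        subprocess.run(argv, env=env, check=True)
```

Using `check=True` makes a failed pre-training stage abort the pipeline instead of silently continuing into the next stage.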

## Evaluation

```
CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test --image_only
```
Additional flags:
- `--checkpoint /path/to/checkpoint` set the path of the model to evaluate
- `--test_root /path/to/dataset` set the path of the evaluation dataset
- `--model_eval [alignment|vision]` choose which sub-model to evaluate
- `--image_only` disable dumping visualizations of attention masks

## Run Demo

```
python demo.py --config=configs/train_abinet.yaml --input=figs/test
```
Additional flags:
- `--config /path/to/config` set the path of the configuration file
- `--input /path/to/image-directory` set the path of an image directory or a wildcard path, e.g. `--input='figs/test/*.png'`
- `--checkpoint /path/to/checkpoint` set the path of the trained model
- `--cuda [-1|0|1|2|3...]` set the CUDA device id; the default of -1 runs on CPU
- `--model_eval [alignment|vision]` choose which sub-model to use
- `--image_only` disable dumping visualizations of attention masks
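For reference, a directory-or-wildcard `--input` argument like the one above can be expanded with the stdlib. `collect_images` is a hypothetical helper sketched for illustration, not demo.py's actual code:

```python
from glob import glob
from pathlib import Path

def collect_images(input_path, exts=(".png", ".jpg", ".jpeg")):
    """Expand a directory or wildcard --input argument into a sorted file list."""
    p = Path(input_path)
    if p.is_dir():
        # Directory form: take every file with a known image extension.
        return sorted(str(f) for f in p.iterdir() if f.suffix.lower() in exts)
    # Wildcard form, e.g. 'figs/test/*.png'.
    return sorted(glob(input_path))
```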

## Visualization

Successful and failure cases on low-quality images:



## Citation

If you find our method useful for your research, please cite

```bibtex
@inproceedings{fang2021read,
  title={Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
  author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
```

## License

This project is free only for academic research purposes, and is licensed under the 2-clause BSD License - see the LICENSE file for details.

Feel free to contact fangsc@ustc.edu.cn if you have any questions.

---
title: ABINet OCR
emoji: π
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 2.8.12
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
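The block between the `---` markers is the Space's YAML front matter. As an illustration, a minimal stdlib parser for such simple `key: value` front matter (a sketch; real Spaces tooling uses a full YAML parser):

```python
def parse_front_matter(text):
    """Parse simple key: value pairs between the leading '---' markers."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing marker: everything after is the README body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

readme = """---
title: ABINet OCR
sdk: gradio
sdk_version: 2.8.12
app_file: app.py
---
Check out the configuration reference.
"""
config = parse_front_matter(readme)
print(config["sdk"], config["sdk_version"])  # gradio 2.8.12
```

Here `sdk` and `sdk_version` tell the Hub to launch `app_file` with that Gradio release.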