|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
# Handwriting-Removal-DIS |
|
|
My effort into improving handwriting removal throught the new [DIS (Dichotomous Image Segmentation)](https://github.com/xuebinqin/DIS) |
|
|
|
|
|
## Inference |
|
|
|
|
|
1. Clone the DIS github: |
|
|
|
|
|
```cmd |
|
|
git clone https://github.com/xuebinqin/DIS |
|
|
``` |
|
|
|
|
|
2. Download the isnet.pth file from this huggingface model repository and move it into the cloned DIS folder. |
|
|
|
|
|
3. Replace Inference.py in the cloned DIS folder to the Inference.py of [this repository](https://github.com/ivanhe123/Handwriting-Removal-DIS). |
|
|
|
|
|
4. Change the paths according to your own application (Evaluation data path may be different). |
|
|
|
|
|
## Related Research |
|
|
AndSonder has also done research and experimentaion on the same subject but using deeplabv3+ to segment the handwriting. |
|
|
|
|
|
This is a link to his repo: [https://github.com/AndSonder/HandWritingEraser-Pytorch](https://github.com/AndSonder/HandWritingEraser-Pytorch) |
|
|
|
|
|
HUGE THANKS to them for providing the segmentation datasets labeled with background blue, printed characters green, and handwriting in red. |
|
|
|
|
|
## Dataset |
|
|
The original dataset is in Baidu Web Storage and is a segmentation dataset, unlike a background removal dataset. |
|
|
|
|
|
Therefore, after some processing, I generated a background-removal dataset. It is available in Huggingface: [https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset](https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset). |
|
|
|
|
|
The relavent contents of the repo is listed: |
|
|
|
|
|
``` |
|
|
|- train.zip |
|
|
|- val.zip |
|
|
``` |
|
|
|
|
|
After unzipping train.zip and val.zip, the file tree should look like: |
|
|
|
|
|
``` |
|
|
|-train |
|
|
| |-gt |
|
|
| | |- dehw_train_00714.png |
|
|
| | |- dehw_train_00715.png |
|
|
| | ... |
|
|
| |-im |
|
|
| | |- dehw_train_00714.jpg |
|
|
| | |- dehw_train_00715.jpg |
|
|
|-val |
|
|
| |-gt |
|
|
| | |- dehw_train_00000.png |
|
|
| | |- dehw_train_00001.png |
|
|
| | ... |
|
|
| |-im |
|
|
| | |- dehw_train_00000.png |
|
|
| | |- dehw_train_00001.png |
|
|
``` |
|
|
|
|
|
the ```gt``` folder is masks. With the background masked in black, and the handwriting masked as white (a.k.a ground truth data). |
|
|
|
|
|
the ```im``` folder is the normal image of the handwriting dataset. |
|
|
|
|
|
The code that was used to generate the dataset in the Huggingface Repo is ```create_masks.py``` |
|
|
|
|
|
## Training |
|
|
|
|
|
I used the ```train_valid_inference_main.py``` from [DIS](https://github.com/xuebinqin/DIS) with my own dataset and training batch size. |
|
|
|