Inoob committed on
Commit a6b7887 · verified · 1 Parent(s): d447552

Update README.md

Files changed (1)
  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # Handwriting-Removal-DIS
5
+ My attempt to improve handwriting removal using the new DIS (Dichotomous Image Segmentation).
6
+
7
+ ## Related Research
8
+ AndSonder has also done research and experimentation on the same subject, but using DeepLabV3+ to segment the handwriting.
9
+
10
+ This is a link to their repo: [https://github.com/AndSonder/HandWritingEraser-Pytorch](https://github.com/AndSonder/HandWritingEraser-Pytorch)
11
+
12
+ HUGE THANKS to them for providing the segmentation datasets, labeled with the background in blue, printed characters in green, and handwriting in red.
13
+
14
+ ## Dataset
15
+ The original dataset is hosted on Baidu Web Storage and is a segmentation dataset rather than a background-removal dataset.
16
+
17
+ Therefore, after some processing, I generated a background-removal dataset. It is available on Hugging Face: [https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset](https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset).
18
+
19
+ The relevant contents of the repo are listed below:
20
+
21
+ ```
22
+ |- train.zip
23
+ |- val.zip
24
+ ```
25
+
26
+ After unzipping train.zip and val.zip, the file tree should look like:
27
+
28
+ ```
29
+ |-train
30
+ | |-gt
31
+ | | |- dehw_train_00714.png
32
+ | | |- dehw_train_00715.png
33
+ | | ...
34
+ | |-im
35
+ | | |- dehw_train_00714.jpg
36
+ | | |- dehw_train_00715.jpg
37
+ |-val
38
+ | |-gt
39
+ | | |- dehw_train_00000.png
40
+ | | |- dehw_train_00001.png
41
+ | | ...
42
+ | |-im
43
+ | | |- dehw_train_00000.png
44
+ | | |- dehw_train_00001.png
45
+ ```
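To get from the two archives to the tree above, something like the following could be used. It is a minimal sketch that assumes each archive already contains its own top-level `train` or `val` folder; `extract_archives` and its parameters are hypothetical names, not part of the dataset repo.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, out_dir, names=("train.zip", "val.zip")):
    """Extract each archive listed in `names` from `archive_dir` into `out_dir`."""
    archive_dir, out_dir = Path(archive_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in names:
        archive = archive_dir / name
        if not archive.is_file():
            raise FileNotFoundError(f"missing archive: {archive}")
        # Assumption: the zip holds a top-level train/ or val/ folder.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives(".", "data")` would leave `data/train` and `data/val` if that assumption about the archive layout holds.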
46
+
47
+ The ```gt``` folder contains the ground-truth masks, with the background in black and the handwriting in white.
48
+
49
+ The ```im``` folder contains the original images from the handwriting dataset.
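Since every image in `im` should have a mask in `gt` with the same file stem, a quick sanity check after unzipping can be sketched as follows. `pair_images_and_masks` is a hypothetical helper, and matching by stem is an assumption based on the file names in the tree above.

```python
from pathlib import Path


def pair_images_and_masks(split_dir):
    """Return (pairs, unmatched) for one split folder such as train or val.

    Files are matched by stem, so im/x.jpg pairs with gt/x.png.
    """
    split_dir = Path(split_dir)
    images = {p.stem: p for p in (split_dir / "im").iterdir() if p.is_file()}
    masks = {p.stem: p for p in (split_dir / "gt").iterdir() if p.is_file()}
    # Stems present in both folders form valid training pairs.
    pairs = [(images[s], masks[s]) for s in sorted(images.keys() & masks.keys())]
    # Stems present in only one folder point to a missing image or mask.
    unmatched = sorted(images.keys() ^ masks.keys())
    return pairs, unmatched
```

An empty `unmatched` list for both `train` and `val` suggests the split is complete.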
50
+
51
+ The script used to generate the dataset in the Hugging Face repo is ```create_masks.py```.
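The core idea of turning the color-coded labels (blue background, green printed characters, red handwriting) into black-and-white masks can be sketched like this. This is not the contents of `create_masks.py`; it is a dependency-free illustration that assumes the labels use nearly pure red for handwriting, and `handwriting_mask` and `threshold` are hypothetical names. A real script would read and write the pixels with an image library.

```python
def handwriting_mask(label_rows, threshold=128):
    """Convert color-coded label pixels to a binary handwriting mask.

    label_rows is a list of rows, each a list of (R, G, B) tuples.
    A pixel counts as handwriting when it is predominantly red.
    Returns rows of 255 (handwriting) or 0 (everything else).
    """
    mask = []
    for row in label_rows:
        mask.append([
            255 if (r >= threshold and g < threshold and b < threshold) else 0
            for r, g, b in row
        ])
    return mask
```

Blue and green pixels both map to black, so printed characters end up as part of the background in the resulting mask.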
52
+
53
+ ## Training
54
+
55
+ I used ```train_valid_inference_main.py``` from [DIS](https://github.com/xuebinqin/DIS) with my own dataset and a custom training batch size.
56
+
57
+ You can scale the batch size up if you have enough memory.
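As a rough illustration, pointing the script at this dataset mostly means describing the `im` and `gt` folders and picking a batch size. The key names below (`im_dir`, `gt_dir`, `im_ext`, `gt_ext`, `cache_dir`, `batch_size_train`, `batch_size_valid`) are an assumption about what the DIS script expects and should be checked against the actual file; the paths and the batch size of 4 are placeholders, not the values used for this model.

```python
# Hypothetical paths: adjust to wherever train/ and val/ were unzipped.
dataset_train = {
    "name": "handwriting-train",
    "im_dir": "train/im",
    "gt_dir": "train/gt",
    "im_ext": ".jpg",   # training images are JPEGs in the tree above
    "gt_ext": ".png",
    "cache_dir": "cache/train",
}
dataset_valid = {
    "name": "handwriting-val",
    "im_dir": "val/im",
    "gt_dir": "val/gt",
    "im_ext": ".png",   # validation images are PNGs in the tree above
    "gt_ext": ".png",
    "cache_dir": "cache/val",
}

# Placeholder values: raise batch_size_train if memory allows.
hypers = {"batch_size_train": 4, "batch_size_valid": 1}
```

Note that the image extension differs between the two splits in the tree above, so each split needs its own setting.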