
	---
	license: apache-2.0
	---
	# Handwriting-Removal-DIS
	My effort to improve handwriting removal using the new [DIS (Dichotomous Image Segmentation)](https://github.com/xuebinqin/DIS) model.

	## Inference

	1. Clone the DIS GitHub repository:

	```cmd
	git clone https://github.com/xuebinqin/DIS
	```

	2. Download the isnet.pth file from this Hugging Face model repository and move it into the cloned DIS folder.

	3. Replace Inference.py in the cloned DIS folder with the Inference.py from [this repository](https://github.com/ivanhe123/Handwriting-Removal-DIS).

	4. Adjust the paths in Inference.py to match your own setup (your evaluation data path may differ).
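
Step 2 can also be scripted. The following is a minimal sketch, not part of the original workflow: it assumes the `huggingface_hub` package is installed, that the model repository id is `Inoob/DIS-Handwriting-Remover` (taken from this model page's name), and that the DIS repo was cloned into a folder called `DIS`. The function name `fetch_checkpoint` is hypothetical.

```python
from pathlib import Path

REPO_ID = "Inoob/DIS-Handwriting-Remover"  # assumption: id of this model repository
CHECKPOINT = "isnet.pth"                   # checkpoint file named in step 2


def fetch_checkpoint(dis_dir: str = "DIS") -> Path:
    """Download isnet.pth into the cloned DIS folder and return its local path."""
    target = Path(dis_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"{target} not found; clone the DIS repo first (step 1)")

    # Imported here so the check above works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # local_dir places the file directly inside the DIS folder.
    return Path(hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINT, local_dir=target))
```

Calling `fetch_checkpoint("DIS")` after step 1 should leave `isnet.pth` inside the cloned folder; steps 3 and 4 still need to be done by hand.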

	## Related Research
	AndSonder has also done research and experimentation on the same subject, using DeepLabv3+ to segment the handwriting.

	Their repo is available here: [https://github.com/AndSonder/HandWritingEraser-Pytorch](https://github.com/AndSonder/HandWritingEraser-Pytorch)

	HUGE THANKS to them for providing the segmentation datasets, labeled with the background in blue, printed characters in green, and handwriting in red.

	## Dataset
	The original dataset is hosted on Baidu Web Storage and is a segmentation dataset rather than a background-removal dataset.

	Therefore, after some processing, I generated a background-removal dataset. It is available on Hugging Face: [https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset](https://huggingface.co/datasets/Inoob/HandwritingSegmentationDataset).

	The relevant contents of the repo are listed below:

	```
	|- train.zip
	|- val.zip
	```

	After unzipping train.zip and val.zip, the file tree should look like:

	```
	|-train
	| |-gt
	| | |- dehw_train_00714.png
	| | |- dehw_train_00715.png
	| | ...
	| |-im
	| | |- dehw_train_00714.jpg
	| | |- dehw_train_00715.jpg
	|-val
	| |-gt
	| | |- dehw_train_00000.png
	| | |- dehw_train_00001.png
	| | ...
	| |-im
	| | |- dehw_train_00000.png
	| | |- dehw_train_00001.png
	```

	The ```gt``` folder contains the masks (a.k.a. ground truth data), with the background in black and the handwriting in white.

	The ```im``` folder contains the original images from the handwriting dataset.
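
To sanity-check an unzipped split, each image in ```im``` can be matched to the mask in ```gt``` with the same file stem. This is a minimal sketch using only the standard library; the helper name `pair_images_with_masks` is hypothetical, and it assumes masks are always `.png` and images are `.jpg` or `.png`, as in the tree above.

```python
from pathlib import Path


def pair_images_with_masks(split_dir):
    """Return (image, mask) path pairs for one split, matched by file stem."""
    split = Path(split_dir)
    # Index masks by stem, e.g. "dehw_train_00714" -> gt/dehw_train_00714.png
    masks = {p.stem: p for p in (split / "gt").glob("*.png")}

    pairs = []
    for image in sorted((split / "im").iterdir()):
        if image.suffix.lower() not in {".jpg", ".png"}:
            continue  # skip anything that is not an image file
        mask = masks.get(image.stem)
        if mask is None:
            raise FileNotFoundError(f"no mask in gt/ for {image.name}")
        pairs.append((image, mask))
    return pairs
```

Running it on both `train` and `val` is a quick way to confirm that every image has a ground-truth mask before training.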

	The script used to generate the dataset in the Hugging Face repo is ```create_masks.py```.
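
The conversion from the color-coded labels to black-and-white masks can be illustrated as follows. This is not the contents of ```create_masks.py```, only a sketch of the idea. It assumes NumPy is available, that labels are RGB arrays with handwriting close to pure red, and that a margin of 128 is a reasonable threshold; the function name `handwriting_mask` is hypothetical.

```python
import numpy as np


def handwriting_mask(label_rgb: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) RGB label image to a mask: handwriting = 255, rest = 0."""
    # Use a signed type so the subtraction below cannot wrap around.
    r = label_rgb[..., 0].astype(np.int16)
    g = label_rgb[..., 1].astype(np.int16)
    b = label_rgb[..., 2].astype(np.int16)

    # Assumption: a pixel is handwriting when red clearly dominates green and blue.
    is_red = (r - np.maximum(g, b)) > 128
    return np.where(is_red, 255, 0).astype(np.uint8)
```

Blue (background) and green (printed characters) pixels both end up black, so only the handwriting remains as the foreground.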

	## Training

	I trained the model with ```train_valid_inference_main.py``` from [DIS](https://github.com/xuebinqin/DIS), using my own dataset and training batch size.