# Developer Guide
## Project design
Xournal++ HTR is designed to serve two goals at once: delivering a production-ready product and allowing contributors to experiment with new algorithms.
The project consists of a Lua plugin and a Python backend, as shown in the figure below. The production-ready product is delivered as an Xournal++ plugin. The plugin is fully integrated into Xournal++ and calls a Python backend that performs the actual transcription. The Python backend supports selecting among various recognition models and can therefore be extended with new models.
<!--
DOESN'T WORK SOMEHOW:
<div align="center">
<img src="images/system_design.jpg" width="50%">
<p><i>Design of xournalpp_htr.</i></p>
</div>
-->
<!-- An alternative figure is shown below: -->
```mermaid
sequenceDiagram
User in Xpp-->>Xpp HTR Plugin: starts transcription process using currently open file
Xpp HTR Plugin -->> Xpp HTR Lua Plugin: calls
Xpp HTR Lua Plugin -->>Xpp HTR Python Backend: constructs command using CLI
Xpp HTR Python Backend -->> Xpp HTR Python Backend: Does OCR & stores PDF
Xpp HTR Python Backend-->>User in Xpp: Gives back control to UI
```
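To illustrate how a backend with selectable recognition models can stay extendable, here is a minimal sketch of a model registry behind a command-line interface. It is not the project's actual implementation: the flag names (`--input-file`, `--output-file`, `--model`), the `MODEL_REGISTRY` dictionary, and the `dummy` model are all hypothetical names chosen for this example.

```python
import argparse
from typing import Callable, Dict, List

# Hypothetical registry: maps a model name to a transcription function.
MODEL_REGISTRY: Dict[str, Callable[[str], List[str]]] = {}


def register_model(name: str):
    """Decorator that adds a transcription function to the registry."""
    def decorator(func: Callable[[str], List[str]]):
        MODEL_REGISTRY[name] = func
        return func
    return decorator


@register_model("dummy")
def dummy_model(input_path: str) -> List[str]:
    # Placeholder: a real model would load the file and run recognition.
    return [f"transcription of {input_path}"]


def build_parser() -> argparse.ArgumentParser:
    # All flag names are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Illustrative HTR backend CLI")
    parser.add_argument("--input-file", required=True)
    parser.add_argument("--output-file", required=True)
    parser.add_argument("--model", default="dummy", choices=sorted(MODEL_REGISTRY))
    return parser


def run(argv: List[str]) -> List[str]:
    args = build_parser().parse_args(argv)
    model = MODEL_REGISTRY[args.model]
    # A real backend would also write the result to args.output_file.
    return model(args.input_file)
```

With this layout, adding a new model only requires writing one decorated function. The plugin side then picks it by name, for example `run(["--input-file", "notes.xopp", "--output-file", "notes.pdf", "--model", "dummy"])`.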
Developing a usable HTR system requires experimentation. The project structure is set up to accommodate this need. *Note that ideas on improved project structures are appreciated.*
Experimentation is organised into "concepts". Each concept explores a different approach to HTR. A concept may improve on previous concepts, but it does not have to, which leaves room for risky experiments. Concept 1 is already implemented and uses a computer vision approach that is explained below.
Future concepts might explore:
- Retrain the computer vision models from concept 1 using the native online data representation of [Xournal++](https://github.com/xournalpp/xournalpp)
- Use sequence-to-sequence models to take advantage of the native online data representation of [Xournal++](https://github.com/xournalpp/xournalpp); e.g. use [OnlineHTR](https://github.com/PellelNitram/OnlineHTR)
- Use data augmentation to increase the effective size of the training data
- Use language models to correct spelling mistakes
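The "online data representation" mentioned above refers to pen strokes stored as sequences of coordinates rather than as pixels. The sketch below shows one way to read such strokes. It assumes that an `.xopp` file is gzip-compressed XML whose `<stroke>` elements contain alternating x and y values. The helper names `parse_strokes` and `load_xopp` and the `SAMPLE` document are made up for this example and are not part of the project.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A stroke is an ordered list of (x, y) pen positions.
Stroke = List[Tuple[float, float]]


def parse_strokes(xml_text: str) -> List[Stroke]:
    """Extract every stroke from an XML string as a list of (x, y) points."""
    root = ET.fromstring(xml_text)
    strokes: List[Stroke] = []
    for element in root.iter("stroke"):
        values = [float(v) for v in (element.text or "").split()]
        # Pair up alternating x and y values.
        strokes.append(list(zip(values[0::2], values[1::2])))
    return strokes


def load_xopp(path: str) -> List[Stroke]:
    """Read a gzip-compressed file and parse its strokes (assumed format)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_strokes(handle.read())


# Hand-written sample in the assumed layout, used instead of a real file.
SAMPLE = """<xournal><page width="100" height="100"><layer>
<stroke tool="pen" color="#000000ff" width="1.41">10.0 20.0 11.5 21.0 13.0 22.5</stroke>
</layer></page></xournal>"""

strokes = parse_strokes(SAMPLE)
```

Stroke sequences like these are the kind of input a sequence-to-sequence model could consume directly, without first rendering the page to an image.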
### Concept 1
This concept uses computer vision based algorithms to first detect words on a page and then to read those words.
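The detect-then-read structure can be sketched as a two-stage pipeline. This is only an outline under stated assumptions, not the algorithm used in concept 1: `WordBox`, `transcribe_page`, and the stand-in detector and reader below are hypothetical, and the reading order is approximated by a naive top-to-bottom, left-to-right sort.

```python
from dataclasses import dataclass
from typing import Callable, List

# A grayscale page image as rows of pixel values (assumption for this sketch).
Image = List[List[int]]


@dataclass
class WordBox:
    """Axis-aligned bounding box of one detected word, in pixels."""
    x: int
    y: int
    width: int
    height: int


def transcribe_page(
    image: Image,
    detect_words: Callable[[Image], List[WordBox]],
    read_word: Callable[[Image], str],
) -> List[str]:
    # Stage 1: locate the words on the page.
    boxes = detect_words(image)
    # Naive reading order: top to bottom, then left to right.
    boxes = sorted(boxes, key=lambda box: (box.y, box.x))
    # Stage 2: crop each word and pass it to the reader.
    words = []
    for box in boxes:
        rows = image[box.y:box.y + box.height]
        crop = [row[box.x:box.x + box.width] for row in rows]
        words.append(read_word(crop))
    return words


# Stand-ins for real models, used only to exercise the pipeline.
def fake_detector(image: Image) -> List[WordBox]:
    return [WordBox(5, 0, 3, 2), WordBox(0, 0, 4, 2)]


def fake_reader(crop: Image) -> str:
    return f"w{len(crop[0])}x{len(crop)}"


page = [[0] * 20 for _ in range(10)]
result = transcribe_page(page, fake_detector, fake_reader)
```

Keeping the detector and the reader as separate callables means either stage can be swapped or retrained independently.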
The following video demo on YouTube uses real-life handwriting data from a Xournal file:
[Watch the concept 1 demo on YouTube](https://www.youtube.com/watch?v=FGD_O8brGNY)
Although the results are not perfect, the main takeaway is that the performance is surprisingly good given that the underlying algorithm has not been optimised for Xournal++ data at all.
**The performance is sufficiently good to be useful for the Xournal++ user base.**
Feel free to play around with the demo yourself using [this code](https://github.com/PellelNitram/xournalpp_htr/blob/master/scripts/demo_concept_1.sh) after [installing this project](installation_user.md). Concept 1 is also what the plugin currently uses and what is shown in the [90-second demo](https://www.youtube.com/watch?v=boXm7lPFSRQ).
Next steps to improve the performance of the handwritten text recognition even further could be:
- Retrain the algorithm on Xournal++-specific data, potentially using data augmentation.
- Use a language model to improve text encoding.
- Use a sequence-to-sequence algorithm that makes use of [Xournal++](https://github.com/xournalpp/xournalpp)'s data format. This translates into using online HTR algorithms.
I would like to acknowledge [Harald Scheidl](https://github.com/githubharald) for this concept, as he wrote the underlying algorithms and made them easily usable through [his HTRPipeline repository](https://github.com/githubharald/HTRPipeline). After all, in concept 1 I just feed [Xournal++](https://github.com/xournalpp/xournalpp) data to his algorithm. [Go check out his great content](https://githubharald.github.io/)!