Spaces:
Running
Running
| # Developer Guide | |
| ## Project design | |
| The design of Xournal++ HTR tries to bridge the gap between both delivering a production ready product and allowing contributors to experiment with new algorithms. | |
| The project design involves a Lua plugin and a Python backend, see the following figure. First, the production ready product is delivered by means of an Xournal++ plugin. The plugin is fully integrated in Xournal++ and calls a Python backend that performs the actual transcription. The Python backend allows selection of various recognition models and is thereby fully extendable with new models. | |
| <!-- | |
| DOESN'T WORK SOMEHOW: | |
| <div align="center"> | |
| <img src="images/system_design.jpg" width="50%"> | |
| <p><i>Design of xournalpp_htr.</i></p> | |
| </div> | |
| --> | |
| <!-- An alternative figure is shown below: --> | |
| ```mermaid | |
| sequenceDiagram | |
| User in Xpp-->>Xpp HTR Plugin: starts transcription process using currently open file | |
| Xpp HTR Plugin -->> Xpp HTR Lua Plugin: calls | |
| Xpp HTR Lua Plugin -->>Xpp HTR Python Backend: constructs command using CLI | |
| Xpp HTR Python Backend -->> Xpp HTR Python Backend: Does OCR & stores PDF | |
| Xpp HTR Python Backend-->>User in Xpp: Gives back control to UI | |
| ``` | |
| Developing a usable HTR systems requires experimentation. The project structure is set up to accommodate this need. *Note that ideas on improved project structures are appreciated.* | |
| The experimentation is carried out in terms of "concepts". Each concept explores a different approach to HTR and possibly improves over previous concepts, but not necessarily to allow for freedom in risky experiments. Concept 1 is already implemented and uses a computer vision approach that is explained below. | |
| Future concepts might explore: | |
| - Retrain computer vision models from concept 1 using native online data representation of [Xournal++](https://github.com/xournalpp/xournalpp) | |
| - Use sequence-to-sequence models to take advantage of native online data representation of [Xournal++](https://github.com/xournalpp/xournalpp); e.g. use [OnlineHTR](https://github.com/PellelNitram/OnlineHTR) | |
| - Use data augmentation to increase effective size of training data | |
| - Use of language models to correct for spelling mistakes | |
| ### Concept 1 | |
| This concept uses computer vision based algorithms to first detect words on a page and then to read those words. | |
| The following shows a video demo on YouTube using real-life handwriting data from a Xournal file: | |
| [](https://www.youtube.com/watch?v=FGD_O8brGNY) | |
| Despite not being perfect, the main take away is that the performance is surprisingly good given that the underlying algorithm has not been optimised for Xournal++ data at all. | |
| **The performance is sufficiently good to be useful for the Xournal++ user base.** | |
| Feel free to play around with the demo yourself using [this code](https://github.com/PellelNitram/xournalpp_htr/blob/master/scripts/demo_concept_1.sh) after [installing this project](installation_user.md). The "concept 1" is also what is currently used in the plugin and shown in the [90 seconds demo](https://www.youtube.com/watch?v=boXm7lPFSRQ). | |
| Next steps to improve the performance of the handwritten text recognition even further could be: | |
| - Re-train the algorithm on Xournal++ specific data, while potentially using data augmentation. | |
| - Use language model to improve text encoding. | |
| - Use sequence-to-sequence algorithm that makes use of [Xournal++](https://github.com/xournalpp/xournalpp)'s data format. This translates into using online HTR algorithms. | |
| I would like to acknowledge [Harald Scheidl](https://github.com/githubharald) in this concept as he wrote the underlying algorithms and made them easily usable through [his HTRPipeline repository](https://github.com/githubharald/HTRPipeline) - after all I just feed his algorithm [Xournal++](https://github.com/xournalpp/xournalpp) data in concept 1. [Go check out his great content](https://githubharald.github.io/)! |