LLM4SCIENCE/locr_alpha


Tianning committed on Sep 11, 2024

Commit 63f47b2 · verified · 1 Parent(s): 53a006b

Create README.md

Files changed (1)

README.md +23 -0

README.md ADDED

	@@ -0,0 +1,23 @@

+---
+# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+# Doc / guide: https://huggingface.co/docs/hub/model-cards
+{}
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+This project aims to create a text scanner that converts images of papers into machine-readable formats (e.g., Markdown, JSON). It is the son of Nougat and thus the grandson of Donut.
+The key idea is to combine the bounding-box modality with text, achieving a pixel-scan behavior in which the model predicts not only the next token but also the next position.
+![Example Image](https://raw.githubusercontent.com/veya2ztn/Lougat/main/images/image.png)
+The name "Lougat" is a combination of LLama and Nougat. In this repo, you'll also find other combinations like:
+- Florence2 + LLama → Flougat
+- Sam2 + LLama → Slougat
+- Nougat + Relative Position Embedding LLama → Rlougat
+The key idea is a natural continuation of the paper [LOCR: Location-Guided Transformer for Optical Character Recognition](https://arxiv.org/abs/2403.02127).
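
The README describes predicting the next token together with the next position. As a rough illustration only, the toy sketch below shows one way such a decoding loop could be organized: each step emits a token and a bounding box, and both are fed back into the next step. This is not the repository's implementation. `ToyDualHead`, `embed`, `greedy_decode`, the random weights, and the normalized `(x0, y0, x1, y1)` box format are all hypothetical stand-ins chosen for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Step:
    token: int   # predicted token id
    box: tuple   # predicted (x0, y0, x1, y1), normalized to [0, 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyDualHead:
    """Hypothetical stand-in for a decoder with a token head and a box head."""

    def __init__(self, hidden, vocab, seed=0):
        rng = random.Random(seed)
        # Random weights: illustrative only, nothing here is trained.
        self.w_tok = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]
        self.w_box = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(4)]

    def predict(self, h):
        probs = softmax([dot(row, h) for row in self.w_tok])
        raw = [sigmoid(dot(row, h)) for row in self.w_box]
        # Order the coordinates so that x0 <= x1 and y0 <= y1.
        x0, x1 = sorted((raw[0], raw[2]))
        y0, y1 = sorted((raw[1], raw[3]))
        return probs, (x0, y0, x1, y1)


def embed(token, box, hidden):
    """Hypothetical featurizer mixing the previous token id and previous box."""
    feats = [math.sin((token + 1) * (i + 1)) for i in range(hidden - 4)]
    return feats + list(box)


def greedy_decode(model, hidden, start_token, eos_token, max_steps):
    token, box = start_token, (0.0, 0.0, 0.0, 0.0)
    steps = []
    for _ in range(max_steps):
        # Both the previous token and the previous position condition the step.
        probs, box = model.predict(embed(token, box, hidden))
        token = max(range(len(probs)), key=probs.__getitem__)
        steps.append(Step(token, box))
        if token == eos_token:
            break
    return steps


if __name__ == "__main__":
    model = ToyDualHead(hidden=8, vocab=16, seed=0)
    for step in greedy_decode(model, hidden=8, start_token=0, eos_token=15, max_steps=5):
        print(step.token, [round(v, 3) for v in step.box])
```

In a real model the two heads would presumably share a trained transformer decoder state rather than a hand-built feature vector; the sketch only shows the shape of the loop, where the output at each step is a (token, box) pair.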