Update README.md
README.md
CHANGED
@@ -6,6 +6,8 @@ language:
 metrics:
 - cer
 pipeline_tag: image-to-text
+base_model:
+- microsoft/trocr-large-handwritten
 ---
 # Model description
 
@@ -21,7 +23,7 @@ pipeline_tag: image-to-text
 
 **License:** Apache 2.0
 
-This model is a fine-tuned version of the microsoft/trocr-large-handwritten model, specialized for recognizing handwritten text. It has been trained on various dataset from
+This model is a fine-tuned version of the microsoft/trocr-large-handwritten model, specialized for recognizing handwritten text. It has been trained on various dataset from 16th to 20th centuries and can be used for applications such as document digitization, form recognition, or any task involving handwritten text extraction.
 
 # Model Architecture
 
@@ -39,15 +41,15 @@ This model is designed for handwritten text recognition and is intended for use
 
 # Training data
 
-The training
+The training dataset includes more than 913 000 samples of handwritten and typewritten text rows, covering a wide variety of handwriting styles and text samples.
 
 # Evaluation
 
 The model was evaluated on test dataset. Below are key metrics:
 
-**Character Error Rate (CER):**
+**Character Error Rate (CER):** 2.8
 
-**Test Dataset Description:** size ~
+**Test Dataset Description:** size ~111 800 text rows
 
 # Used Hyperparameters
 
@@ -55,11 +57,9 @@ The model was evaluated on test dataset. Below are key metrics:
 
 **Train batch size per device:** 16
 
-**Learning rate:**
+**Learning rate:** 12.2e-5
 
-**Scheduler:**
-
-**Warmup steps:** 500
+**Scheduler:** polynomial
 
 **Optimizer:** AdamW
 
@@ -69,6 +69,8 @@ The model was evaluated on test dataset. Below are key metrics:
 
 **Half precision backend:** cuda_amp
 
+**Input image size:** 192 x 1024
+
 
 # How to Use the Model
 
@@ -110,13 +112,13 @@ Potential improvements for this model include:
 
 If you use this model in your work, please cite it as:
 
-@misc{
+@misc{multicentury_htr_model_202509,
 
 author = {Kansallisarkisto},
 
 title = {Multicentury HTR Model: Handwritten Text Recognition},
 
-year = {
+year = {2025},
 
 publisher = {Hugging Face},
 
@@ -127,4 +129,4 @@ If you use this model in your work, please cite it as:
 ## Model Card Authors
 
 Author: Kansallisarkisto
-Contact Information:
+Contact Information: mikko.lipsanen@kansallisarkisto.fi, ilkka.jokipii@kansallisarkisto.fi
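
Note on the CER value added in this change: the card reports a Character Error Rate of 2.8 but states neither the unit nor the formula. The sketch below shows one common way the metric is computed, assuming the conventional definition (character-level Levenshtein distance summed over all rows, divided by the total number of reference characters). It is not taken from the authors' evaluation code; the function names `levenshtein` and `cer` are illustrative, and whether 2.8 is a percentage is an assumption the card does not confirm.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic-programming form)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER as a fraction: total edits / total reference characters.

    Assumption: rows are pooled before dividing. Averaging per-row CER
    instead would give a different number on the same data.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["kitten", "abc"]
    hyps = ["sitting", "abc"]
    print(f"CER = {100 * cer(refs, hyps):.1f} %")
```

Under this definition, a value of 2.8 read as a percentage would mean roughly 28 character edits per 1,000 reference characters. Stating the unit and the pooling method in the card would remove the ambiguity.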