TrOCR Kurrent-Model 19th century

Handwritten Text Recognition (HTR) model for 19th-century German handwriting in Kurrent script.

Developed at Digital Humanities, University of Bern, by Jonas Widmer and Tobias Hodel in collaboration with the researchers and institutions listed below.

Base model: microsoft/trocr-base-handwritten

Train lines: 292,997
Eval lines: 7,513
Test lines: 15,817

Epochs: 19.66 / 20
Eval CER: 0.02827 (about 2.8% character error rate)
Test CER: 0.02655 (about 2.7% character error rate)
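
The card does not state how CER was computed. As a minimal sketch, assuming the common definition (total character-level edit distance divided by the total number of reference characters), it could be calculated as follows. The function names `levenshtein` and `cer` are illustrative and are not taken from the authors' evaluation code.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character error rate over a set of lines (assumed definition, see lead-in)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a CER of 0.02655 corresponds to roughly 2 to 3 character errors per 100 reference characters.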

Fine-tuned on the Kurrent dataset, which contains:

"Regierungsratsprotokolle" (minutes of the Government Council), provided by the State Archives of Zurich
Lecture notes from the Humboldt lectures, provided by the Berlin-Brandenburg Academy of Sciences and Humanities
Diary of Eugen Huber, provided by the University of Zurich
Handwritten documents by Gottfried Semper and copies of them, provided by the respective research project at ETH Zürich and USI Mendrisio
Konzilsprotokolle, University of Greifswald (19th century)
Many other smaller collections and examples

The model has not been extensively tested. Potential biases are still to be identified.
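
A minimal inference sketch using the Hugging Face `transformers` library is given here for orientation. It assumes the checkpoint is published as `dh-unibe/trocr-kurrent` and that the processor of the base model `microsoft/trocr-base-handwritten` is compatible with the fine-tuned weights; neither the processor choice nor the generation settings are confirmed by the card. The function name `transcribe_line` and the `max_new_tokens` default are illustrative. TrOCR models operate on images of single text lines, so page images need to be segmented into lines first.

```python
MODEL_ID = "dh-unibe/trocr-kurrent"
# Assumption: the base model's processor matches the fine-tuned checkpoint.
PROCESSOR_ID = "microsoft/trocr-base-handwritten"


def transcribe_line(image_path: str, max_new_tokens: int = 128) -> str:
    """Transcribe one cropped text-line image and return the predicted string."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(PROCESSOR_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)

    # The processor expects RGB input; scanned lines are often grayscale.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With a hypothetical line image `line.png`, the call would be `print(transcribe_line("line.png"))`. For batches of lines, loading the processor and model once outside the function avoids repeated initialization.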

Format: Safetensors
Model size: 0.3B params
Tensor type: F32

Model ID: dh-unibe/trocr-kurrent
