FinDoctype for document type classification

Introduction

FinDoctype is a document type classification model trained on digitized, freely available archival data collected from the National Archives of Finland (Astia). The model is a Vision Transformer fine-tuned from the pretrained vit_base_patch16_384 checkpoint.

Classes

The model classifies images into one of the following seven classes:

  1. Cover page (kansilehti)
  2. Card index (kortti)
  3. Map (kartta)
  4. Picture (kuva)
  5. Running text, e.g., contracts, reports, minutes, or letters (juokseva teksti, kuten sopimukset, raportit, pöytäkirjat tai kirjeet)
  6. Newspaper clipping (lehtileike)
  7. Table or form (taulukko tai lomake)
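For illustration, the seven classes can be written out as an index-to-label mapping. Note that this mapping is hypothetical: the authoritative label names and their order come from the model's own configuration (model.pretrained_cfg['label_names'], as used in the usage example below), so treat this only as a readable summary of the class list.

```python
# Hypothetical index-to-label mapping of the seven FinDoctype classes.
# The authoritative order is stored in model.pretrained_cfg['label_names'].
FINDOCTYPE_LABELS = {
    0: "cover_page",          # kansilehti
    1: "card_index",          # kortti
    2: "map",                 # kartta
    3: "picture",             # kuva
    4: "running_text",        # juokseva teksti
    5: "newspaper_clipping",  # lehtileike
    6: "table_or_form",       # taulukko tai lomake
}

def label_for(index: int) -> str:
    """Return the human-readable label for a predicted class index."""
    return FINDOCTYPE_LABELS[index]
```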

About the training data

We used 14,800 images for training, 3,700 for validation, and 4,624 for testing. The full list of datasets used for training will be released later.

Limitations

Note that this is a work in progress and the model may not work well with all types of data. On a held-out test set (n=4,624) drawn from the same distribution as the training data, the model achieves a top-1 accuracy of 93.5%. On a second test set (n=44) containing data dissimilar to the training data, top-1 accuracy drops to 45.5% and top-2 accuracy to 65.9%.
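Top-1 and top-2 accuracy of the kind reported above can be computed as follows. This is a minimal sketch using toy logits in place of real model outputs; the function itself is general.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    # Indices of the k largest logits per sample, shape (n, k)
    topk = logits.topk(k, dim=1).indices
    # A sample counts as correct if its target appears anywhere in its top k
    hits = (topk == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Toy logits for 4 samples over 7 classes (placeholders for model outputs)
logits = torch.tensor([
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],  # top-1 = class 0
    [0.2, 0.7, 0.1, 0.0, 0.0, 0.0, 0.0],  # top-1 = class 1
    [0.0, 0.3, 0.6, 0.1, 0.0, 0.0, 0.0],  # top-1 = class 2, top-2 adds class 1
    [0.5, 0.4, 0.0, 0.0, 0.0, 0.0, 0.1],  # top-1 = class 0, top-2 adds class 1
])
targets = torch.tensor([0, 1, 1, 1])

print(topk_accuracy(logits, targets, k=1))  # 0.5: samples 0 and 1 are correct
print(topk_accuracy(logits, targets, k=2))  # 1.0: every target is in the top 2
```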

We are working on incorporating other modalities (text) into the modelling work and researching ways to make content classification more accurate.

Using the model

import timm
import torch
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

# Load from Hub 🔥
model = timm.create_model(
    'hf-hub:jyu-digihum/findoctype',
    pretrained=True
)

# Set model to eval mode for inference
model.eval()

# Create Transform
transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

# Get the labels from the model config
labels = model.pretrained_cfg['label_names']
top_k = min(len(labels), 5)

# Use your own image file here...
image = Image.open('image.jpg').convert('RGB')

# Process PIL image with transforms and add a batch dimension
x = transform(image).unsqueeze(0)

# Pass inputs through the model without tracking gradients (inference only)
with torch.no_grad():
    out = model(x)

# Apply softmax to get predicted probabilities for each class
probabilities = torch.nn.functional.softmax(out[0], dim=0)

# Grab the values and indices of the top k predicted classes
values, indices = torch.topk(probabilities, top_k)

# Prepare a nice dict of top k predictions
predictions = [
    {"label": labels[i], "score": v.item()}
    for i, v in zip(indices, values)
]
print(predictions)
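For larger collections, images can be classified in batches rather than one at a time. The sketch below illustrates only the batching mechanics: a small stand-in module and random tensors take the place of the real FinDoctype model and the transformed images from the snippet above.

```python
import torch

# Stand-in for the FinDoctype model: any module mapping a batch of images
# to 7 class logits illustrates the batching pattern.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 384 * 384, 7),
)
model.eval()

# Stack several preprocessed images into one batch. Here random tensors
# with the ViT input shape (3, 384, 384) stand in for transform(image).
batch = torch.stack([torch.rand(3, 384, 384) for _ in range(4)])

with torch.no_grad():
    out = model(batch)

# For batched input, apply softmax over the class dimension (dim=1)
probs = torch.nn.functional.softmax(out, dim=1)
top1 = probs.argmax(dim=1)  # one predicted class index per image
```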

Acknowledgements

This repository has been produced as part of the FIN-CLARIAH infrastructure project and in cooperation with the National Archives of Finland.
