kaixkhazaki
/

vit_doclaynet_base

Image Classification

document-layout-analysis

document-classification


kaixkhazaki committed on Jan 4, 2025

Commit f92c1d8 · verified · 1 Parent(s): f3e9c37

Create README.md

Files changed (1)

README.md +37 -0

README.md ADDED

	@@ -0,0 +1,37 @@

+# ViT Model for Document Layout Classification
+This model is a fine-tuned Vision Transformer (ViT) for document layout classification based on the DocLayNet dataset.
+## Model description
+This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
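As a rough illustration of what the `patch16-224` part of the base checkpoint name implies, the sketch below works out the token sequence a ViT encoder sees for one image. It assumes a square 224x224 RGB input (read off the checkpoint name) and a single prepended [CLS] token; the function name is made up for this example and is not part of any library.

```python
def vit_sequence_shape(image_size: int = 224, patch_size: int = 16, channels: int = 3):
    """Return (num_patches, sequence_length, values_per_patch) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Assumption: exactly one [CLS] token is prepended to the patch tokens.
    sequence_length = num_patches + 1
    # Raw pixel values in one flattened patch, before the linear projection.
    values_per_patch = patch_size * patch_size * channels
    return num_patches, sequence_length, values_per_patch


print(vit_sequence_shape())  # (196, 197, 768)
```

With the defaults, each image becomes a 14x14 grid of patches, so the encoder processes 197 tokens per page image.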
+## Training data
+The model was trained on the DocLayNet-base dataset, which is available on the Hugging Face Hub: [pierreguillou/DocLayNet-base](https://huggingface.co/datasets/pierreguillou/DocLayNet-base)
+DocLayNet is a comprehensive dataset for document layout analysis, containing various document types and their corresponding layout annotations.
+## Training procedure
+The model was trained with the following hyperparameters:
+```python
+{
+    'batch_size': 64,
+    'num_epochs': 20,
+    'learning_rate': 1e-4,
+    'weight_decay': 0.05,
+    'warmup_ratio': 0.2,
+    'gradient_clip': 0.1,
+    'dropout_rate': 0.1,
+    'label_smoothing': 0.1,
+    'optimizer': 'AdamW'
+}
+```
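To make the `label_smoothing` entry concrete, here is a minimal pure-Python sketch of label-smoothed cross-entropy for a single example. It assumes the common formulation in which the one-hot target is mixed with a uniform distribution over the classes; the card does not say which implementation was used during training, and the function name and example logits are illustrative only.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a one-hot target mixed with a uniform distribution."""
    num_classes = len(logits)
    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum for x in logits]
    # Target weights: smoothing / K on every class, plus (1 - smoothing) on the true class.
    loss = 0.0
    for k, log_prob in enumerate(log_probs):
        weight = smoothing / num_classes
        if k == target:
            weight += 1.0 - smoothing
        loss -= weight * log_prob
    return loss


logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
print(smoothed_cross_entropy(logits, target=0, smoothing=0.0))  # plain cross-entropy
print(smoothed_cross_entropy(logits, target=0, smoothing=0.1))  # smoothed, larger for this input
```

Under this formulation a confident, correct prediction is penalized slightly, which discourages the classifier from driving its output probabilities all the way to 0 or 1.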
+## Evaluation results
+The model achieved the following performance metrics on the test set:
+- Test Loss: 0.8622
+- Test Accuracy: 81.36%