# ViT Model for Document Layout Classification

This model is a fine-tuned Vision Transformer (ViT) for document layout classification based on the DocLayNet dataset.

## Model description

This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
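As a quick sanity check on the geometry described above, the token sequence length the backbone operates on follows directly from the image and patch sizes (a minimal sketch; the extra token is ViT's `[CLS]` classification token):

```python
# Patch grid for the google/vit-base-patch16-224-in21k backbone:
# a 224x224 image is split into non-overlapping 16x16 patches.
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size   # 14 patches along each side
num_patches = patches_per_side ** 2           # 196 patches per image
sequence_length = num_patches + 1             # 197 tokens, including [CLS]

print(patches_per_side, num_patches, sequence_length)
```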

## Training data

The model was trained on the DocLayNet-base dataset, available on the Hugging Face Hub: [pierreguillou/DocLayNet-base](https://huggingface.co/datasets/pierreguillou/DocLayNet-base)

DocLayNet is a comprehensive dataset for document layout analysis, containing various document types and their corresponding layout annotations.

## Training procedure

Training used the following hyperparameters:

```python
{
    'batch_size': 64,
    'num_epochs': 20,
    'learning_rate': 1e-4,
    'weight_decay': 0.05,
    'warmup_ratio': 0.2,
    'gradient_clip': 0.1,
    'dropout_rate': 0.1,
    'label_smoothing': 0.1,
    'optimizer': 'AdamW'
}
```
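The `warmup_ratio` above is defined relative to the total number of optimizer steps, so the concrete warmup length depends on the dataset and batch size. A minimal sketch of that arithmetic, using a hypothetical training-set size (the actual DocLayNet-base split size may differ):

```python
import math

# Hyperparameters from the configuration above.
batch_size = 64
num_epochs = 20
warmup_ratio = 0.2

# Hypothetical training-set size; substitute the real DocLayNet-base split size.
num_train_examples = 6400

steps_per_epoch = math.ceil(num_train_examples / batch_size)  # 100
total_steps = steps_per_epoch * num_epochs                    # 2000
warmup_steps = int(warmup_ratio * total_steps)                # 400

print(steps_per_epoch, total_steps, warmup_steps)
```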

## Evaluation results

The model achieved the following performance metrics on the test set:

- Test loss: 0.8622
- Test accuracy: 81.36%
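When interpreting the test loss, note that training used `label_smoothing = 0.1`, so the loss includes a smoothing term and cannot reach zero even for near-perfect predictions. A minimal plain-Python sketch of label-smoothed cross-entropy under one common convention (a uniform mixture over the K classes, not necessarily the exact loss used in training):

```python
import math

def smoothed_cross_entropy(probs, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The smoothed target gives 1 - epsilon + epsilon/K to the true class
    and epsilon/K to every other class (uniform-mixture convention).
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1.0 - epsilon) * (1.0 if i == target else 0.0) + epsilon / k
        loss -= q * math.log(p)
    return loss

# Even a highly confident correct prediction has a nonzero smoothed loss.
near_perfect = [0.97, 0.01, 0.01, 0.01]
print(smoothed_cross_entropy(near_perfect, target=0))
```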