	---
	datasets:
	- pierreguillou/DocLayNet-base
	metrics:
	- accuracy
	base_model:
	- google/vit-base-patch16-224-in21k
	library_name: transformers
	tags:
	- vision
	- document-layout-analysis
	- document-classification
	- vit
	- doclaynet
	---
	# Vision Transformer (ViT) for Document Classification (DocLayNet)

	This model is a fine-tuned Vision Transformer (ViT) for document layout classification based on the DocLayNet dataset.

	It was trained on document page images from the DocLayNet dataset. The document categories and their class indices are:

	```python
	{'financial_reports': 0,
	'government_tenders': 1,
	'laws_and_regulations': 2,
	'manuals': 3,
	'patents': 4,
	'scientific_articles': 5}

	```
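
When post-processing predictions, it is often handy to go from a class index back to its category name. A minimal sketch, assuming the mapping listed above; the names `label2id`, `id2label`, and `index_to_category` are illustrative and are not read from the model's configuration:

```python
# Category-to-index mapping as listed in this model card.
label2id = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Inverse mapping: class index -> category name.
id2label = {index: name for name, index in label2id.items()}


def index_to_category(index: int) -> str:
    """Return the category name for a class index, or raise on an unknown index."""
    if index not in id2label:
        raise ValueError(f"unknown class index: {index}")
    return id2label[index]


print(index_to_category(4))  # patents
```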

	## Model description

	This model is built upon the `google/vit-base-patch16-224-in21k` Vision Transformer architecture and fine-tuned specifically for document layout classification. The base ViT model uses a patch size of 16x16 pixels and was pre-trained on ImageNet-21k. The model has been optimized to recognize and classify different types of document layouts from the DocLayNet dataset.
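
As a worked example of what the patch size implies: the base checkpoint name indicates 224x224 inputs, so each image is split into a 14x14 grid of 16x16 patches. The sketch below does that arithmetic; the extra classification token is standard for ViT, and the function name `vit_sequence_length` is illustrative.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens the ViT encoder sees: one per patch plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    return num_patches + 1                       # +1 for the [CLS] token


print(vit_sequence_length())  # 197
```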

	## Training data

	The model was trained on the DocLayNet-base dataset, which is available on the Hugging Face Hub: [pierreguillou/DocLayNet-base](https://huggingface.co/datasets/pierreguillou/DocLayNet-base)

	DocLayNet is a comprehensive dataset for document layout analysis, containing various document types and their corresponding layout annotations.

	## Training procedure

	The model was trained for 10 epochs on a single GPU, which took about 10 minutes.

	The training hyperparameters:

	```python
	{
	'batch_size': 64,
	'num_epochs': 20,
	'learning_rate': 1e-4,
	'weight_decay': 0.05,
	'warmup_ratio': 0.2,
	'gradient_clip': 0.1,
	'dropout_rate': 0.1,
	'label_smoothing': 0.1,
	'optimizer': 'AdamW'
	}

	```
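
To make two of these settings concrete, here is a rough sketch of what `warmup_ratio` and `label_smoothing` typically mean. It is not the training script used for this model: the linear warmup shape, the constant rate after warmup, the rounding of the warmup step count, the PyTorch-style smoothing convention, and the `total_steps` value are all assumptions made for illustration.

```python
import math

# Values copied from the hyperparameters above.
LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.2
LABEL_SMOOTHING = 0.1
NUM_CLASSES = 6


def warmup_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr over the first warmup_ratio of all steps.

    Assumption: after warmup the rate is held at base_lr, since the card
    does not state which decay schedule (if any) was used.
    """
    warmup_steps = int(round(total_steps * warmup_ratio))
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def smoothed_targets(true_index: int,
                     num_classes: int = NUM_CLASSES,
                     smoothing: float = LABEL_SMOOTHING) -> list:
    """Soft target distribution, assuming the convention that the smoothing
    mass is spread uniformly over all classes (including the true one)."""
    off_value = smoothing / num_classes
    targets = [off_value] * num_classes
    targets[true_index] = 1.0 - smoothing + off_value
    return targets


total_steps = 2000  # hypothetical; depends on dataset size, batch size, epochs
print(warmup_lr(200, total_steps))  # halfway through the 400 warmup steps
print(smoothed_targets(4))          # soft targets for the 'patents' class
```

With these values the first 20% of optimizer steps ramp the learning rate up, and the true class receives a target slightly above 0.9 rather than 1.0.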

	## Evaluation results

	The model achieved the following performance metrics on the test set:

	- Test Loss: 0.8622
	- Test Accuracy: 81.36%
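
For reference, accuracy here is the fraction of test images whose predicted category matches the label. A minimal sketch with made-up predictions follows; the example lists are illustrative and are not the model's actual outputs.

```python
def accuracy(predictions, labels) -> float:
    """Fraction of positions where the predicted class index equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical class indices for eight test pages.
example_predictions = [0, 5, 3, 2, 4, 1, 5, 0]
example_labels = [0, 5, 3, 2, 4, 1, 2, 3]

print(f"{accuracy(example_predictions, example_labels):.2%}")  # 75.00%
```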



	## Usage


	```python
	from transformers import pipeline

	# Load the model using the image-classification pipeline
	pipe = pipeline("image-classification", model="kaixkhazaki/vit_doclaynet_base")

	# Test it with an image
	result = pipe("path_to_image.jpg")
	print(result)

	```
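
The pipeline returns a list of dictionaries, each with a `label` and a `score`. A small sketch of picking the top prediction and rejecting low-confidence results follows. The sample output, the 0.5 threshold, and the helper name `top_prediction` are illustrative assumptions, and the exact label strings depend on the model's configuration.

```python
from typing import Optional


def top_prediction(results, min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it falls below min_score."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Hypothetical pipeline output, shaped like the result of pipe("path_to_image.jpg").
sample_results = [
    {"label": "patents", "score": 0.91},
    {"label": "scientific_articles", "score": 0.05},
    {"label": "manuals", "score": 0.02},
]

print(top_prediction(sample_results))  # patents
```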