
	---
	language: multilingual
	tags:
	- document-classification
	- text-classification
	- multilingual
	- doclaynet
	- e5
	pipeline_tag: text-classification
	base_model: intfloat/multilingual-e5-large
	datasets:
	- pierreguillou/DocLayNet-base
	metrics:
	- accuracy
	model-index:
	- name: multilingual-e5-doclaynet
	  results:
	  - task:
	      type: text-classification
	      name: Document Classification
	    dataset:
	      name: DocLayNet
	      type: pierreguillou/DocLayNet-base
	    metrics:
	    - type: accuracy
	      value: 0.9719
	      name: Test Accuracy
	    - type: loss
	      value: 0.5192
	      name: Test Loss
	library_name: transformers
	---
	# Multilingual E5 for Document Classification (DocLayNet)
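	This model is a fine-tuned version of intfloat/multilingual-e5-large for document text classification, trained on the DocLayNet dataset. Given a passage of text, it predicts which of the six DocLayNet document categories the text belongs to.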

	## Evaluation results

	- Test Loss: 0.5192, Test Accuracy: 0.9719

	## Usage

	```python

	# Use a pipeline as a high-level helper
	from transformers import pipeline

	pipe = pipeline("text-classification", model="kaixkhazaki/multilingual-e5-doclaynet")

	prediction = pipe("This is some text from a financial report")
	print(prediction)
	```

	## Model description
	- Base model: intfloat/multilingual-e5-large
	- Task: Document text classification
	- Languages: Multilingual

	## Training data
	- Dataset: DocLayNet-base
	- Source: https://huggingface.co/datasets/pierreguillou/DocLayNet-base
	- Categories:
	```python
	{
	'financial_reports': 0,
	'government_tenders': 1,
	'laws_and_regulations': 2,
	'manuals': 3,
	'patents': 4,
	'scientific_articles': 5
	}
	```
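To turn a predicted class index back into a category name, the mapping above can be inverted. The following is a minimal sketch, assuming the model's output indices follow that mapping. The names `LABEL2ID`, `ID2LABEL`, and `decode_prediction` are illustrative and not part of the model repository. The `LABEL_<n>` string form is the transformers default when a config carries no custom label names; whether this model's config uses it is not stated here.

```python
# Category mapping copied from the model card.
LABEL2ID = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Inverse mapping: class index -> category name.
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def decode_prediction(label):
    """Map a class index, a 'LABEL_<n>' string, or a category name to a category name."""
    if isinstance(label, str):
        if label in LABEL2ID:
            # Already a human-readable category name.
            return label
        # Assumed default-style label such as 'LABEL_4': take the trailing integer.
        label = int(label.rsplit("_", 1)[-1])
    return ID2LABEL[label]


print(decode_prediction(4))          # patents
print(decode_prediction("LABEL_0"))  # financial_reports
```

With the pipeline from the usage section, each prediction is a dict with `label` and `score` keys, so `decode_prediction(prediction[0]["label"])` would give the category name under the assumptions above.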
	## Training procedure

	Trained on a single GPU for 2 epochs, which took approximately 20 minutes.

	Hyperparameters:
	```python
	{
	'batch_size': 8,
	'num_epochs': 10,
	'learning_rate': 2e-5,
	'weight_decay': 0.01,
	'warmup_ratio': 0.1,
	'gradient_clip': 1.0,
	'label_smoothing': 0.1,
	'optimizer': 'AdamW',
	'scheduler': 'cosine_with_warmup'
	}
	```
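To make the `warmup_ratio`, `scheduler`, and `label_smoothing` settings more concrete, here is a small pure-Python sketch. It assumes the common definitions: linear warmup followed by cosine decay to zero, and smoothed targets of the form (1 - eps) * one-hot + eps / K. The card does not say which implementation was used for training, so this illustrates the general technique rather than the author's training code. The function names and the total step count are hypothetical.

```python
import math

BASE_LR = 2e-5       # 'learning_rate' from the hyperparameters above
WARMUP_RATIO = 0.1   # 'warmup_ratio' from the hyperparameters above


def lr_at_step(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def smoothed_targets(true_class, num_classes=6, smoothing=0.1):
    """Label-smoothed target distribution over the classes (assumed convention)."""
    uniform = smoothing / num_classes
    return [uniform + (1.0 - smoothing) * (i == true_class) for i in range(num_classes)]


total_steps = 1000  # hypothetical; the real value depends on dataset size and batch size
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))

print(smoothed_targets(true_class=4))
```

Under these assumptions the learning rate peaks at `2e-5` once warmup ends (10% of the way through training) and then falls smoothly towards zero. With six classes and smoothing of 0.1, the true class receives a target probability of about 0.917 and each other class about 0.017.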