---
license: eupl-1.2
language:
- en
metrics:
  - type: f1         
    value: 0.8345 
    name: micro F1
    args:
      threshold: 0.46
  - type: NDCG@3
    value: 0.8819
    name: NDCG@3
  - type: NDCG@5         
    value: 0.8689 
    name: NDCG@5
  - type: NDCG@10         
    value: 0.8780 
    name: NDCG@10
tags:
- eurovoc
pipeline_tag: text-classification

widget:
- text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
 
---

# Eurovoc Multilabel Classifier 🇪🇺

[EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large, multidisciplinary, multilingual (the 24 official languages of the 🇪🇺) hierarchical thesaurus of more than 7,000 classes covering the activities of the EU institutions.
Given the number of legal documents produced every day and the huge mass of pre-existing documents to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.

This model, based on a BERT deep neural network, was trained on more than 3,200,000 documents for this task and is used in production via a Hugging Face Inference Endpoint.
It supports the 24 official languages of the European Union.

## Architecture

![architecture](architecture.png)

This classification model is built on top of [EUBERT](https://huggingface.co/EuropeanParliament/EUBERT) and predicts 7,331 EuroVoc labels.

With fewer than 100 million parameters, it can be deployed on commodity hardware without GPU acceleration (around 200 ms per inference for an input of 2,000 characters).
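
The card's metadata reports micro F1 at a decision threshold of 0.46. A multi-label head over thousands of labels is commonly decoded by applying an independent sigmoid to each label's logit and keeping every label whose probability reaches the threshold. The sketch below assumes exactly that decoding scheme; it is not taken from this model's source code, and `decode_labels`, the logits and the placeholder label names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_labels(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.46,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical decoder: one independent sigmoid per label, then a threshold.

    Returns (label, probability) pairs sorted by descending probability.
    """
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep every label whose probability reaches the decision threshold.
    kept = [(label, p) for label, p in scored if p >= threshold]
    # Optionally cap the number of returned labels.
    return kept if top_k is None else kept[:top_k]


# Toy example with made-up logits for four placeholder labels.
print(decode_labels([2.0, -1.0, 0.0, -0.2], ["label_a", "label_b", "label_c", "label_d"]))
```

With these toy logits, `label_a` (about 0.88) and `label_c` (0.5) pass, while `label_d` (about 0.45) falls just below 0.46. Because each label gets its own sigmoid rather than a shared softmax, any number of labels can be assigned to one document, which is what multi-label thesaurus indexing requires.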

Training parameters:
- Number of epochs: 16
- Batch size: 10
- Max length: 512
- Learning rate: 5e-05
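
The metadata also lists micro F1 and NDCG@3/5/10. The card does not state exactly how these were computed, so the following sketch assumes the usual definitions: micro F1 pools true positives, false positives and false negatives over all documents, and NDCG@k uses binary relevance (a ranked label counts if it is among the document's gold EuroVoc labels) with a log2 position discount, averaged over documents. The function names are illustrative only.

```python
import math
from typing import Sequence, Set


def micro_f1(predicted: Sequence[Set[str]], gold: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: counts are pooled across all documents."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        tp += len(pred & true)   # correctly predicted labels
        fp += len(pred - true)   # predicted but not in the gold set
        fn += len(true - pred)   # gold labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k for one document's ranked label list."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 has discount log2(2) = 1
        for rank, label in enumerate(ranked[:k])
        if label in gold
    )
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(rankings: Sequence[Sequence[str]], golds: Sequence[Set[str]], k: int) -> float:
    """Average per-document NDCG@k over a collection."""
    scores = [ndcg_at_k(r, g, k) for r, g in zip(rankings, golds)]
    return sum(scores) / len(scores) if scores else 0.0


print(micro_f1([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}]))
print(ndcg_at_k(["a", "x", "b"], {"a", "b"}, k=3))
```

Micro F1 depends on the decision threshold, whereas NDCG@k only looks at the ranking of the top k scores, which is why the metadata attaches a threshold argument to the F1 entry alone.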

## Usage


```python
# The eurovoc module is provided with this model's source code
from eurovoc import EurovocTagger

model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
```
See also the source code.

## Author(s)

Sébastien Campion <sebastien.campion@europarl.europa.eu> 

Andreas Papagiannis <andreas.papagiannis@europarl.europa.eu>