---
license: eupl-1.2
language:
- en
metrics:
- type: f1
  value: 0.8345
  name: micro F1
  args:
    threshold: 0.46
- type: NDCG@3
  value: 0.8819
  name: NDCG@3
- type: NDCG@5
  value: 0.8689
  name: NDCG@5
- type: NDCG@10
  value: 0.8780
  name: NDCG@10
tags:
- eurovoc
pipeline_tag: text-classification
widget:
- text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
---

# Eurovoc Multilabel Classifier 🇪🇺

[EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large, multidisciplinary, multilingual (24 languages of the 🇪🇺) hierarchical thesaurus of more than 7000 classes covering the activities of the EU institutions.
Given the number of legal documents produced every day and the huge mass of pre-existing documents still to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.

This model, based on the BERT deep neural network architecture, was trained on more than 3,200,000 documents for this task and is used in a production environment via a Hugging Face inference endpoint.
It supports the 24 languages of the European Union.

## Architecture



This classification model is built on top of [EUBERT](https://huggingface.co/EuropeanParliament/EUBERT) and predicts 7331 Eurovoc labels.

With fewer than 100 million parameters, it can be deployed on commodity hardware without GPU acceleration (around 200 ms per inference for 2000 characters).

Training parameters:
- Number of epochs: 16
- Batch size: 10
- Max length: 512
- Learning rate: 5e-05

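To give a sense of scale for the training parameters above, the following minimal sketch computes how many optimizer steps such a run would take. It assumes that all 3,200,000 documents are used for training, on a single device and without gradient accumulation; none of this is confirmed by the model card. The `TrainingConfig` class and `steps_per_epoch` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from the list above; the field names are illustrative."""
    num_epochs: int = 16
    batch_size: int = 10
    max_length: int = 512        # maximum sequence length, in tokens
    learning_rate: float = 5e-5


def steps_per_epoch(num_documents: int, batch_size: int) -> int:
    """Batches needed to see every document once (ceiling division)."""
    return -(-num_documents // batch_size)


config = TrainingConfig()

# Assumption: the whole corpus is the training set (single device,
# no gradient accumulation). The card only says "more than 3,200,000".
num_documents = 3_200_000

per_epoch = steps_per_epoch(num_documents, config.batch_size)
total_steps = per_epoch * config.num_epochs
print(f"{per_epoch:,} steps per epoch, {total_steps:,} steps in total")
```

Under these assumptions the run amounts to 320,000 steps per epoch and 5,120,000 steps overall; a different train/validation split or a multi-device setup would change these figures.
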
## Usage

```python
# Load the tagger class from the eurovoc package
from eurovoc import EurovocTagger

# Download the pretrained weights from the Hugging Face Hub
model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
```

See also the source code.

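A multilabel classifier typically scores every label independently and keeps those whose probability exceeds a threshold. The sketch below illustrates that decision step in plain Python. It is not the `EurovocTagger` implementation: the `select_labels` helper, the label names, and the logit values are hypothetical, and the use of a sigmoid with the 0.46 threshold (taken from the micro F1 entry in the card metadata) is an assumption about how predictions are turned into labels.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_labels(logits: dict[str, float],
                  threshold: float = 0.46) -> list[tuple[str, float]]:
    """Keep labels whose probability reaches the threshold, best first.

    Hypothetical helper: each label is scored independently, so any
    number of labels (including none) can be returned.
    """
    scored = [(label, sigmoid(logit)) for label, logit in logits.items()]
    kept = [(label, prob) for label, prob in scored if prob >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up logits for three illustrative labels.
example_logits = {"human rights": 2.3, "Myanmar": 1.1, "fisheries": -3.0}

selected = select_labels(example_logits)
for label, prob in selected:
    print(f"{label}: {prob:.3f}")
```

Because each label is thresholded on its own, lowering the threshold trades precision for recall; ranking metrics such as NDCG@k depend only on the order of the scores, not on the threshold.
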
## Author(s)

Sébastien Campion <sebastien.campion@europarl.europa.eu>

Andreas Papagiannis <andreas.papagiannis@europarl.europa.eu>