Holako / NER_model_holako

Token Classification, Transformers, PyTorch, xlm-roberta


sami committed on Feb 23, 2022

Commit fe703e7

2 Parent(s): ac49e62 4c72cc5

updated


Files changed (1)

README.md +4 -45
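The model-card text removed in this commit describes its tag set in what is commonly called IOB1 style: tokens inside an entity get `I-` tags, and `B-` appears only when an entity starts immediately after another entity of the same type. A minimal sketch of grouping such word-level tags back into entity spans, assuming the tags arrive as a plain Python list of strings; `iob1_to_spans` is a hypothetical helper written for illustration, not part of Transformers or of this repository.

```python
from typing import List, Optional, Tuple


def iob1_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB1 tags into (entity_type, start, end) spans, end exclusive.

    Hypothetical helper: in IOB1 an entity normally begins with I-XXX, and
    B-XXX is used only when it directly follows another XXX entity.
    """
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)
        # A new entity starts on B-, or on an I- tag whose type differs
        # from the entity currently open (including "no entity open").
        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)
        # Close the open entity when we leave it (O) or a new one begins.
        if current_type is not None and (prefix == "O" or starts_new):
            spans.append((current_type, start, i))
            current_type = None
        if starts_new:
            current_type = etype
            start = i
    # Flush an entity that runs to the end of the sequence.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Two back-to-back person names: B-PER marks where the second one begins.
words = ["Angela", "Merkel", "Barack", "Obama", "visited", "Paris"]
tags = ["I-PER", "I-PER", "B-PER", "I-PER", "O", "I-LOC"]
for etype, s, e in iob1_to_spans(tags):
    print(etype, " ".join(words[s:e]))
# PER Angela Merkel
# PER Barack Obama
# LOC Paris
```

Because `B-` only marks a boundary between same-type neighbours, this sketch also treats a change of type after an `I-` tag (for example `I-ORG` followed by `I-LOC`) as the start of a new span.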

README.md CHANGED

@@ -1,24 +1,4 @@
-Hugging Face's logo
----
-language:
-- ar
-- de
-- en
-- es
-- fr
-- it
-- lv
-- nl
-- pt
-- zh
-- multilingual
----
-# xlm-roberta-large-ner-hrl
-## Model description
-**xlm-roberta-large-ner-hrl** is a **Named Entity Recognition** model for 10 high resourced languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned XLM-RoBERTa large model. It has been trained to recognize three types of entities: location (LOC), organizations (ORG), and person (PER).
-Specifically, this model is a *xlm-roberta-large* model that was fine-tuned on an aggregation of 10 high-resourced languages
-## Intended uses & limitations
 #### How to use
 You can use this model with Transformers *pipeline* for NER.
 ```python
@@ -33,33 +13,12 @@ print(ner_results)
 ```
 #### Limitations and bias
 This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
 ## Training data
-The training data for the 10 languages are from:
 Language|Dataset
 -|-
 Arabic | [ANERcorp](https://camel.abudhabi.nyu.edu/anercorp/)
-German | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)
-English | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)
-Spanish | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)
-French | [Europeana Newspapers](https://github.com/EuropeanaNewspapers/ner-corpora/tree/master/enp_FR.bnf.bio)
-Italian | [Italian I-CAB](https://ontotext.fbk.eu/icab.html)
-Latvian | [Latvian NER](https://github.com/LUMII-AILab/FullStack/tree/master/NamedEntities)
-Dutch | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)
-Portuguese |[Paramopama + Second Harem](https://github.com/davidsbatista/NER-datasets/tree/master/Portuguese)
-Chinese | [MSRA](https://huggingface.co/datasets/msra_ner)
-The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:
-Abbreviation|Description
--|-
-O|Outside of a named entity
-B-PER |Beginning of a person’s name right after another person’s name
-I-PER |Person’s name
-B-ORG |Beginning of an organisation right after another organisation
-I-ORG |Organisation
-B-LOC |Beginning of a location right after another location
-I-LOC |Location
-## Training procedure
-This model was trained on NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.

 #### How to use
 You can use this model with Transformers *pipeline* for NER.
 ```python
 ```
 #### Limitations and bias
 This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
+=======
+#### Limitations and bias
+This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
 ## Training data
 Language|Dataset
 -|-
 Arabic | [ANERcorp](https://camel.abudhabi.nyu.edu/anercorp/)