|
|
"description": "--- annotations_creators: - machine-generated language_creators: - machine-generated widget: - text: My name is Wolfgang and I live in Berlin. - text: George Washington went to Washington. - text: Mi nombre es Sarah y vivo en Londres. - text: Меня зовут Симона, и я живу в Риме. tags: - named-entity-recognition - sequence-tagger-model datasets: - Babelscape/wikineural language: - de - en - es - fr - it - nl - pl - pt - ru - multilingual license: - cc-by-nc-sa-4.0 pretty_name: wikineural-dataset source_datasets: - original task_categories: - structure-prediction task_ids: - named-entity-recognition --- # WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER This is the model card for the EMNLP 2021 paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. We fine-tuned a multilingual language model (mBERT) for 3 epochs on our WikiNEuRal dataset for Named Entity Recognition (NER). The resulting multilingual NER model supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru), and it was trained on all 9 languages jointly. **If you use the model, please reference this work in your paper**: The original repository for the paper can be found at ## How to use You can use this model with Transformers *pipeline* for NER. ## Limitations and bias This model is trained on WikiNEuRal, a state-of-the-art dataset for Multilingual NER automatically derived from Wikipedia. Therefore, it might not generalize well to all textual genres (e.g. news). On the other hand, models trained only on news articles (e.g. only on CoNLL03) have been proven to obtain much lower scores on encyclopedic articles. To obtain more robust systems, we encourage you to train a system on the combination of WikiNEuRal with other datasets (e.g. WikiNEuRal + CoNLL). 
## Licensing Information Contents of this repository are restricted to non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents and models belongs to the original copyright holders.", |