---
license: apache-2.0
datasets:
- stockmark/ner-wikipedia-dataset
language:
- ja
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: token-classification
tags:
- ner
- named entity recognition
- stockmark ner
- bert
- japanese named entity recognition
- japanese ner
- transformers
---
### Model Description

This model is a fine-tuned version of `tohoku-nlp/bert-base-japanese-v3`, optimized for Named Entity Recognition (NER) tasks.
It was fine-tuned on a Japanese named entity extraction dataset derived from Wikipedia, developed and made publicly available by Stockmark Inc. ([NER Wikipedia Dataset](https://github.com/stockmarkteam/ner-wikipedia-dataset)).

### Intended Use

This model is intended for tasks that require identifying and categorizing named entities in Japanese text.
It is suitable for a range of natural language processing applications where recognizing the names of people, organizations, locations, and similar entities is crucial.

### How to Use

You can load the model and run NER with the following snippet. Note that the tokenizer of `tohoku-nlp/bert-base-japanese-v3` requires the `fugashi` and `unidic-lite` packages.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "knosing/japanese_ner_model"
# Use the tokenizer of the base model the checkpoint was fine-tuned from
tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-v3")
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Merge sub-word tokens into whole entity spans
ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("株式会社ストックマークは東京に本社があります。"))
```
|
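Conceptually, `aggregation_strategy="simple"` merges consecutive sub-word tokens that share a BIO tag into one entity span. The following is a minimal pure-Python sketch of that grouping logic, using hypothetical tokens and labels (the model's actual label set is available in `model.config.id2label`):

```python
def group_entities(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    entities, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A "B-" tag starts a new entity, closing any open one
            if current:
                entities.append(current)
            current = (lab[2:], tok)
        elif lab.startswith("I-") and current and current[0] == lab[2:]:
            # An "I-" tag of the same type continues the open entity
            current = (current[0], current[1] + tok)
        else:
            # "O" (or an inconsistent tag) closes the open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = ["株式", "会社", "ストック", "マーク", "は", "東京", "に"]
labels = ["B-法人名", "I-法人名", "I-法人名", "I-法人名", "O", "B-地名", "O"]
print(group_entities(tokens, labels))
# [('法人名', '株式会社ストックマーク'), ('地名', '東京')]
```

The real pipeline additionally handles character offsets and averages the per-token scores, but the span-merging idea is the same.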
### Model Performance

The model has been evaluated on each entity type to assess its precision, recall, F1 score, and overall accuracy. Below is the detailed performance breakdown by entity type:
|
#### Overall Metrics

- **Overall Precision:** 0.8379
- **Overall Recall:** 0.8477
- **Overall F1 Score:** 0.8428
- **Overall Accuracy:** 0.9684
|
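As a quick sanity check, the reported F1 score is the harmonic mean of the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall precision and recall reported above
print(round(f1_score(0.8379, 0.8477), 4))  # 0.8428
```
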
#### Performance by Entity Type

- **Other Organization Names (`その他の組織名`):**
  - **Precision:** 0.71875
  - **Recall:** 0.69
  - **F1 Score:** 0.7041
  - **Sample Count:** 100
|
- **Event Names (`イベント名`):**
  - **Precision:** 0.85
  - **Recall:** 0.8586
  - **F1 Score:** 0.8543
  - **Sample Count:** 99
|
- **Personal Names (`人名`):**
  - **Precision:** 0.8171
  - **Recall:** 0.8664
  - **F1 Score:** 0.8410
  - **Sample Count:** 232
|
- **Place Names (`地名`):**
  - **Precision:** 0.8986
  - **Recall:** 0.9376
  - **F1 Score:** 0.9177
  - **Sample Count:** 529
|
- **Product Names (`製品名`):**
  - **Precision:** 0.6522
  - **Recall:** 0.5906
  - **F1 Score:** 0.6198
  - **Sample Count:** 127
|
- **Political Organization Names (`政治的組織名`):**
  - **Precision:** 0.9160
  - **Recall:** 0.8276
  - **F1 Score:** 0.8696
  - **Sample Count:** 145
|
- **Facility Names (`施設名`):**
  - **Precision:** 0.7905
  - **Recall:** 0.8357
  - **F1 Score:** 0.8125
  - **Sample Count:** 140
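
For reference, a support-weighted average of the per-type F1 scores can be computed directly from the figures above. It is not expected to match the overall F1 exactly, since the overall score is micro-averaged over all entities rather than averaged over types:

```python
# (F1 score, sample count) pairs taken from the per-type breakdown above
per_type = [(0.7041, 100), (0.8543, 99), (0.8410, 232), (0.9177, 529),
            (0.6198, 127), (0.8696, 145), (0.8125, 140)]

total = sum(n for _, n in per_type)
weighted_f1 = sum(f * n for f, n in per_type) / total
print(round(weighted_f1, 4))  # 0.8412
```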
|
### Note

You might not be able to use this model with the Hugging Face Inference API.
The intended usage of the model is demonstrated in the following repository: [KeshavSingh29/fa_ner_japanese](https://github.com/KeshavSingh29/fa_ner_japanese).
If you have any questions, please feel free to contact me or raise an issue in that repository.