---
license: apache-2.0
datasets:
- stockmark/ner-wikipedia-dataset
language:
- ja
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: token-classification
tags:
- ner
- named entity recognition
- stockmark ner
- bert
- japanese named entity recognition
- japanese ner
- transformers
---
### Model Description

This model is a fine-tuned version of `tohoku-nlp/bert-base-japanese-v3`, optimized for Named Entity Recognition (NER) tasks.
It was fine-tuned on a Japanese named entity extraction dataset derived from Wikipedia, developed and made publicly available by Stockmark Inc. ([NER Wikipedia Dataset](https://github.com/stockmarkteam/ner-wikipedia-dataset)).

### Intended Use

This model is intended for tasks that require identifying and categorizing named entities in Japanese text.
It is suitable for a range of natural language processing applications where recognizing the names of people, organizations, locations, and similar entities is crucial.

### How to Use

You can load the model and run NER with the following snippet. Note that the tokenizer of `tohoku-nlp/bert-base-japanese-v3` requires the `fugashi` and `unidic-lite` packages.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "knosing/japanese_ner_model"
# Use the tokenizer of the base model the checkpoint was fine-tuned from
tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-v3")
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Merge sub-word tokens into whole entity spans
ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("株式会社ストックマークは東京に本社があります。"))
```
|
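Conceptually, `aggregation_strategy="simple"` merges consecutive sub-word tokens that share a BIO tag into one entity span. The following is a minimal pure-Python sketch of that grouping logic, using hypothetical tokens and labels (the model's actual label set is available in `model.config.id2label`):

```python
def group_entities(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    entities, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A "B-" tag starts a new entity, closing any open one
            if current:
                entities.append(current)
            current = (lab[2:], tok)
        elif lab.startswith("I-") and current and current[0] == lab[2:]:
            # An "I-" tag of the same type continues the open entity
            current = (current[0], current[1] + tok)
        else:
            # "O" (or an inconsistent tag) closes the open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = ["株式", "会社", "ストック", "マーク", "は", "東京", "に"]
labels = ["B-法人名", "I-法人名", "I-法人名", "I-法人名", "O", "B-地名", "O"]
print(group_entities(tokens, labels))
# [('法人名', '株式会社ストックマーク'), ('地名', '東京')]
```

The real pipeline additionally handles character offsets and averages the per-token scores, but the span-merging idea is the same.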
### Model Performance

The model has been evaluated on each entity type to assess its precision, recall, F1 score, and overall accuracy. Below is the detailed performance breakdown by entity type:
|
#### Overall Metrics

- **Overall Precision:** 0.8379
- **Overall Recall:** 0.8477
- **Overall F1 Score:** 0.8428
- **Overall Accuracy:** 0.9684
|
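As a quick sanity check, the reported F1 score is the harmonic mean of the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall precision and recall reported above
print(round(f1_score(0.8379, 0.8477), 4))  # 0.8428
```
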
#### Performance by Entity Type

- **Other Organization Names (`その他の組織名`):**
  - **Precision:** 0.71875
  - **Recall:** 0.69
  - **F1 Score:** 0.7041
  - **Sample Count:** 100
|
- **Event Names (`イベント名`):**
  - **Precision:** 0.85
  - **Recall:** 0.8586
  - **F1 Score:** 0.8543
  - **Sample Count:** 99
|
- **Personal Names (`人名`):**
  - **Precision:** 0.8171
  - **Recall:** 0.8664
  - **F1 Score:** 0.8410
  - **Sample Count:** 232
|
- **Place Names (`地名`):**
  - **Precision:** 0.8986
  - **Recall:** 0.9376
  - **F1 Score:** 0.9177
  - **Sample Count:** 529
|
- **Product Names (`製品名`):**
  - **Precision:** 0.6522
  - **Recall:** 0.5906
  - **F1 Score:** 0.6198
  - **Sample Count:** 127
|
- **Political Organization Names (`政治的組織名`):**
  - **Precision:** 0.9160
  - **Recall:** 0.8276
  - **F1 Score:** 0.8696
  - **Sample Count:** 145
|
- **Facility Names (`施設名`):**
  - **Precision:** 0.7905
  - **Recall:** 0.8357
  - **F1 Score:** 0.8125
  - **Sample Count:** 140
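
For reference, a support-weighted average of the per-type F1 scores can be computed directly from the figures above. It is not expected to match the overall F1 exactly, since the overall score is micro-averaged over all entities rather than averaged over types:

```python
# (F1 score, sample count) pairs taken from the per-type breakdown above
per_type = [(0.7041, 100), (0.8543, 99), (0.8410, 232), (0.9177, 529),
            (0.6198, 127), (0.8696, 145), (0.8125, 140)]

total = sum(n for _, n in per_type)
weighted_f1 = sum(f * n for f, n in per_type) / total
print(round(weighted_f1, 4))  # 0.8412
```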
|
### Note

You might not be able to use this model with the Hugging Face Inference API.
The intended usage of the model is demonstrated in the following repository: [KeshavSingh29/fa_ner_japanese](https://github.com/KeshavSingh29/fa_ner_japanese).
If you have any questions, please feel free to contact me or raise an issue in that repository.