yseop/SMM4H2024_Task2a_ja

Token Classification


vahbuna committed on Aug 8, 2024

Commit 23e9948 · verified · 1 Parent(s): b444606

init: model card

Files changed (1)

README.md +51 -3

README.md CHANGED

@@ -1,3 +1,51 @@
----
-license: afl-3.0
----

+---
+license: afl-3.0
+language:
+- ja
+metrics:
+- seqeval
+library_name: transformers
+pipeline_tag: token-classification
+---
+# SMM4H-2024 Task 2 Japanese NER
+## Overview
+This is a named entity extraction model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co/daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.
+Tag set (IOB2 format):
+* DRUG
+* DISORDER
+* FUNCTION
+## Usage
+```python
+from transformers import BertForTokenClassification, AutoTokenizer
+import torch
+
+text = "サンプルテキスト"  # "sample text"
+model_name = "yseop/SMM4H2024_Task2a_ja"
+
+# Load the fine-tuned model and its tokenizer once, in evaluation mode.
+model = BertForTokenClassification.from_pretrained(model_name).eval()
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+idx2tag = model.config.id2label
+
+# Run the forward pass without gradient tracking.
+with torch.inference_mode():
+    vecs = tokenizer(text,
+                     padding=True,
+                     truncation=True,
+                     return_tensors="pt")
+    ner_logits = model(input_ids=vecs["input_ids"],
+                       attention_mask=vecs["attention_mask"])
+
+# Most likely tag id at each position of the single input sentence.
+idx = torch.argmax(ner_logits.logits, dim=2)[0].tolist()
+# Drop the first and last positions so that tokens and tags stay aligned.
+token = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0].tolist())[1:-1]
+pred_tag = [idx2tag[x] for x in idx][1:-1]
+```
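The usage snippet stops at parallel `token` and `pred_tag` lists. A minimal sketch of grouping them into entity spans follows, assuming the labels in `id2label` look like `B-DRUG`, `I-DRUG` and `O` (the card only states "IOB2 format", so the exact label strings are an assumption). The helper name `iob2_to_spans` and the example tokens are hypothetical, and joining tokens with an empty string ignores any subword markers the tokenizer may add.

```python
def iob2_to_spans(tokens, tags):
    """Group parallel token/tag lists into (entity_type, text) spans.

    Assumes IOB2 labels of the form "B-TYPE", "I-TYPE" and "O".
    An "I-" tag that does not continue an open span of the same
    type is dropped here; other decoders may treat it as a new span.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, "".join(current_tokens)))

    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [tok]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(tok)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hypothetical tokens and tags, standing in for `token` and `pred_tag`.
example_tokens = ["頭痛", "が", "ひどい", "ので", "ロキ", "ソニン", "を", "飲ん", "だ"]
example_tags = ["B-DISORDER", "O", "O", "O", "B-DRUG", "I-DRUG", "O", "O", "O"]
example_spans = iob2_to_spans(example_tokens, example_tags)
print(example_spans)
```

With the real model output, the call would be `iob2_to_spans(token, pred_tag)`.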
+## Results
+| Entity type | TP | FP | FN | Precision | Recall | F1 |
+|---|---:|---:|---:|---:|---:|---:|
+| DISORDER | 588 | 409 | 330 | 0.5898 | 0.6405 | 0.6141 |
+| DRUG | 307 | 143 | 169 | 0.6822 | 0.6450 | 0.6631 |
+| FUNCTION | 69 | 160 | 170 | 0.3013 | 0.2887 | 0.2949 |
+| all | 964 | 712 | 669 | 0.5752 | 0.5903 | 0.5827 |
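
The precision, recall and F1 columns follow from the tp/fp/fn counts by the usual definitions, and the `all` row is consistent with pooling the counts across entity types (a micro-average). The sketch below only re-derives the table's arithmetic. It does not reproduce the evaluation itself, and the card does not say how the counts were obtained beyond listing `seqeval` as the metric. The helper name `prf` is hypothetical.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (tp, fp, fn) per entity type, copied from the Results table.
counts = {
    "DISORDER": (588, 409, 330),
    "DRUG": (307, 143, 169),
    "FUNCTION": (69, 160, 170),
}

# Pool the counts over all entity types for the "all" row.
pooled = tuple(sum(c[i] for c in counts.values()) for i in range(3))

for name, (tp, fp, fn) in {**counts, "all": pooled}.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{name:9s} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Pooling the counts weights each type by its frequency, which is why the low FUNCTION scores pull the overall F1 down less than an unweighted mean of the three rows would.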