---
library_name: transformers
tags:
  - japanese
  - ner
  - medical
---

# Model Card for `sociocom/MedTXTNER`

**This model is `cl-tohoku/bert-base-japanese-v3` fine-tuned for named entity recognition (NER) on Japanese medical text.**

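## Model Details

### Description
- Base model: `cl-tohoku/bert-base-japanese-v3`
- Fine-tuned on annotated Japanese medical text (case reports, radiology reports, and nursing records) created at NAIST (Nara Institute of Science and Technology)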

| Field                   | Details                                |
|-------------------------|----------------------------------------|
| **Developed by**        | NAIST Social Computing Lab             |
| **Model type**          | Token classification                   |
| **Language(s)**         | Japanese                               |
| **Finetuned from**      | cl-tohoku/bert-base-japanese-v3        |

### Model Sources
- **Hub repository**: https://huggingface.co/sociocom/MedTXTNER

## Tags and Attributes
| Tag      | Description                               | Attributes                                      |
|----------|-------------------------------------------|-------------------------------------------------|
| a        | Anatomical parts (organs and body sites)  | None                                            |
| c        | Change                                    | None                                            |
| cc       | Clinical context                          | executed, negated, other, scheduled             |
| d        | Diseases and symptoms (including lesions) | general, negative, positive, suspicious         |
| f        | Features and measurements                 | None                                            |
| m-key    | Medicine name                             | executed, negated, other, scheduled             |
| m-val    | Medicine value                            | executed, negated, other, scheduled             |
| r        | Remedy (treatment)                        | executed, negated, other, scheduled             |
| t-key    | Test item                                 | executed, negated, other, scheduled             |
| t-test   | Test name                                 | executed, negated, other, scheduled             |
| t-val    | Test value                                | None                                            |
| timex3   | Time expressions                          | age, date, duration, med, misc, set, time       |

For details on each tag and attribute, see the [Real-MedNLP Annotation Guidelines](https://sociocom.naist.jp/real-mednlp/wp-content/uploads/sites/3/2021/12/Real-MedNLP_Annotation_Guidelines.pdf).
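Judging from the example output later in this card (e.g. `B-timex3_age`, `I-m-key_negated`), each predicted label appears to combine a BIO prefix, a tag from the table above, and an optional attribute joined with `_`. The sketch below splits a label under that assumption. The helper name `parse_label` is hypothetical and the label format is inferred, not documented, so check `model.config.id2label` for the actual label set.

```python
from typing import Optional, Tuple

def parse_label(label: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a label such as "B-m-key_negated" into (prefix, tag, attribute).

    Assumed format (inferred from example output): "O", or "{B|I}-{tag}"
    optionally followed by "_{attribute}". Tags may contain "-" (e.g. "m-key")
    but are assumed not to contain "_".
    """
    if label == "O":
        return ("O", None, None)
    # Split off the BIO prefix at the first hyphen only, so "m-key" stays intact.
    prefix, _, rest = label.partition("-")
    # Split the remainder at the first underscore into tag and attribute.
    tag, sep, attribute = rest.partition("_")
    return (prefix, tag, attribute if sep else None)

for lab in ["O", "B-a", "I-d_positive", "B-m-key_negated", "B-timex3_age"]:
    print(lab, "->", parse_label(lab))
```

Splitting on the first `-` and the first `_` is what keeps hyphenated tags such as `m-key` and `t-test` whole.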

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# The base tokenizer (cl-tohoku/bert-base-japanese-v3) segments words with MeCab;
# if loading fails, the fugashi and unidic-lite packages are likely required.
model_dir = "sociocom/MedTXTNER"
model = AutoModelForTokenClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

# Run on GPU when available, otherwise CPU; eval() disables dropout.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def predict_text(text: str):
    """Return a list of (token, label) pairs for a single input string."""
    # Inputs longer than the model's maximum length are truncated.
    enc = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
    ).to(device)

    with torch.no_grad():
        logits = model(**enc).logits

    # Highest-scoring label id for each token of the first (only) sequence.
    pred_ids = torch.argmax(logits, dim=-1)[0].cpu().tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    id2label = model.config.id2label

    result = []
    for tok, pid in zip(tokens, pred_ids):
        # Skip special tokens such as [CLS] and [SEP].
        if tok in tokenizer.all_special_tokens:
            continue
        result.append((tok, id2label[pid]))
    return result

sample = "症例】５３歳女性。発熱と嘔気を認め、プレドニゾロンを中断しました。"
for tok, lab in predict_text(sample):
    print(f"{tok}\t{lab}")
```

## Example Output
```
症例    O
】      O
53      B-timex3_age
歳      I-timex3_age
女性    O
。      O
発熱    B-d_positive
と      I-d_positive
嘔      I-d_positive
##気    I-d_positive
を      O
認め    O
、      O
プレ    B-m-key_negated
##ド    I-m-key_negated
##ニ    I-m-key_negated
##ゾ    I-m-key_negated
##ロン  I-m-key_negated
を      O
中断    O
し      O
まし    O
た      O
。      O
```
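The output above is per token, so a single entity is spread across WordPiece subwords (marked with `##`). A minimal post-processing sketch follows: the hypothetical helper `merge_entities` groups the `(token, label)` pairs returned by `predict_text` into entity spans. It assumes `##` marks continuation subwords and that tokens can be concatenated without spaces, which suits Japanese but would drop spaces inside embedded Latin text. Because the tokenizer normalizes characters (`５３` in the input appears as `53`), the merged text may not match the input string character for character.

```python
from typing import List, Optional, Tuple

def merge_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, BIO label) pairs into (entity text, entity type) spans."""
    entities: List[Tuple[str, str]] = []
    span_text: str = ""
    span_type: Optional[str] = None  # type of the currently open entity, if any

    def close_span() -> None:
        # Emit the open entity (if any) and reset the state.
        nonlocal span_text, span_type
        if span_type is not None:
            entities.append((span_text, span_type))
        span_text, span_type = "", None

    for token, label in pairs:
        # Assumption: "##" marks a WordPiece continuation; Japanese needs no spaces.
        piece = token[2:] if token.startswith("##") else token
        if label.startswith("I-") and label[2:] == span_type:
            span_text += piece  # continue the open entity
        elif label.startswith(("B-", "I-")):
            # B- (or an I- that does not match the open entity) starts a new span.
            close_span()
            span_text, span_type = piece, label[2:]
        else:
            close_span()  # "O" ends any open entity
    close_span()
    return entities

# Tokens and labels copied from the example output above.
example = [
    ("症例", "O"), ("53", "B-timex3_age"), ("歳", "I-timex3_age"), ("女性", "O"),
    ("発熱", "B-d_positive"), ("と", "I-d_positive"), ("嘔", "I-d_positive"),
    ("##気", "I-d_positive"), ("を", "O"), ("プレ", "B-m-key_negated"),
    ("##ド", "I-m-key_negated"), ("##ニ", "I-m-key_negated"),
    ("##ゾ", "I-m-key_negated"), ("##ロン", "I-m-key_negated"), ("を", "O"),
]
print(merge_entities(example))
# [('53歳', 'timex3_age'), ('発熱と嘔気', 'd_positive'), ('プレドニゾロン', 'm-key_negated')]
```

With the model loaded as in the usage example, `merge_entities(predict_text(sample))` applies the same grouping to live predictions.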

## Evaluation

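Scores are reported overall and for the MedTxt-CR (case reports), MedTxt-RR (radiology reports), and MedTxt-NR (nursing records) subsets.

Without attributes (only the entity type is evaluated):
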
| Dataset        | Micro-F1 | Macro-F1 | Weighted-F1 |
| -------------- | --------:| --------:| -----------:|
| **Overall**    |    0.699 |    0.673 |       0.700 |
| **MedTxt-CR**  |    0.608 |    0.575 |       0.612 |
| **MedTxt-RR**  |    0.903 |    0.930 |       0.903 |
| **MedTxt-NR**  |    0.800 |    0.788 |       0.800 |

With attributes (entity type and attribute must both match):

| Dataset        | Micro-F1 | Macro-F1 | Weighted-F1 |
| -------------- | --------:| --------:| -----------:|
| **Overall**    |    0.638 |    0.480 |       0.641 |
| **MedTxt-CR**  |    0.551 |    0.396 |       0.559 |
| **MedTxt-RR**  |    0.887 |    0.708 |       0.888 |
| **MedTxt-NR**  |    0.730 |    0.552 |       0.731 |
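The card does not state how these scores were computed (for example, whether matching is at the entity or token level). For orientation only, here is a minimal sketch of the standard micro, macro, and weighted F1 definitions, assuming exact-match entity spans represented as hypothetical `(doc_id, start, end, type)` tuples. It also shows one plausible reading of the two settings: with attributes, `d_positive` and `d_negative` are different types; without attributes, both collapse to `d`. This is not the authors' evaluation code.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[str, int, int, str]  # hypothetical: (doc_id, start, end, type)

def strip_attribute(span: Span) -> Span:
    # Drop the attribute suffix, e.g. "d_positive" -> "d", "m-key_negated" -> "m-key".
    doc_id, start, end, label = span
    return (doc_id, start, end, label.split("_", 1)[0])

def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def entity_f1(gold: Iterable[Span], pred: Iterable[Span],
              use_attributes: bool = True) -> Dict[str, float]:
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    if not use_attributes:
        gold_set = {strip_attribute(s) for s in gold_set}
        pred_set = {strip_attribute(s) for s in pred_set}
    types = sorted({s[3] for s in gold_set | pred_set})
    tp = Counter(s[3] for s in gold_set & pred_set)   # exact span and type match
    fp = Counter(s[3] for s in pred_set - gold_set)   # predicted but not gold
    fn = Counter(s[3] for s in gold_set - pred_set)   # gold but not predicted
    support = Counter(s[3] for s in gold_set)          # gold spans per type
    per_type = {t: f1(tp[t], fp[t], fn[t]) for t in types}
    total = sum(support.values())
    return {
        # Micro: pool counts over all types, then compute a single F1.
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Macro: unweighted mean of per-type F1.
        "macro": sum(per_type.values()) / len(types) if types else 0.0,
        # Weighted: per-type F1 weighted by the number of gold spans of each type.
        "weighted": sum(per_type[t] * support[t] for t in types) / total if total else 0.0,
    }

# Toy data: the first prediction has the right span and tag but the wrong attribute.
gold = [("doc1", 0, 2, "d_positive"), ("doc1", 5, 7, "m-key_negated")]
pred = [("doc1", 0, 2, "d_negative"), ("doc1", 5, 7, "m-key_negated")]
print(entity_f1(gold, pred, use_attributes=True))
print(entity_f1(gold, pred, use_attributes=False))
```

On this toy input, the attribute error gives micro-F1 0.5, macro-F1 about 0.333, and weighted-F1 0.5 with attributes, but 1.0 on all three without them, which illustrates why the attribute-aware setting is the stricter of the two.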


## Publication

This model can be cited as:

```
@misc{social_computing_lab_2025,
	author       = { Social Computing Lab },
	title        = { MedTXTNER (Revision 6788187) },
	year         = 2025,
	url          = { https://huggingface.co/sociocom/MedTXTNER },
	doi          = { 10.57967/hf/5732 },
	publisher    = { Hugging Face }
}
```