---
library_name: transformers
tags:
- japanese
- ner
- medical
---

# Model Card for `sociocom/MedTXTNER`

This model is `cl-tohoku/bert-base-japanese-v3` fine-tuned for named entity recognition (NER) on Japanese medical text.

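## Model Details

### Description
- Base model: `cl-tohoku/bert-base-japanese-v3`
- Fine-tuned on annotated Japanese medical text (case reports, radiology reports, and nursing records) created at the Nara Institute of Science and Technology (NAIST)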

| Item | Details |
|---|---|
| Developed by | NAIST Social Computing Lab |
| Model type | Token classification |
| Language(s) | Japanese |
| Finetuned from | cl-tohoku/bert-base-japanese-v3 |

### Model Sources
- Hub repository: https://huggingface.co/sociocom/MedTXTNER

## Tags and Attributes

| Tag | Description | Attributes |
|---|---|---|
| a | Anatomical parts (organs and body sites) | none |
| c | Change | none |
| cc | Clinical context | executed, negated, other, scheduled |
| d | Diseases and symptoms | general, negative, positive, suspicious |
| f | Features and measurements | none |
| m-key | Medicine name | executed, negated, other, scheduled |
| m-val | Medicine value | executed, negated, other, scheduled |
| r | Remedy | executed, negated, other, scheduled |
| t-key | Test item | executed, negated, other, scheduled |
| t-test | Test name | executed, negated, other, scheduled |
| t-val | Test value | none |
| timex3 | Time expressions | age, date, duration, med, misc, set, time |

For details on each tag and attribute, see the [Real-MedNLP Annotation Guidelines](https://sociocom.naist.jp/real-mednlp/wp-content/uploads/sites/3/2021/12/Real-MedNLP_Annotation_Guidelines.pdf).
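The model's output labels combine these tags and attributes with a BIO prefix, as in `B-d_positive` or `I-m-key_negated` in the example output. A minimal sketch for splitting such a label back into its parts follows. `parse_label` is a hypothetical helper, and it assumes, based only on the labels visible in the example output, that the attribute follows an underscore and that attribute-less tags such as `a` or `t-val` appear without one.

```python
from typing import NamedTuple, Optional


class ParsedLabel(NamedTuple):
    prefix: Optional[str]     # "B", "I", or None for the outside label "O"
    tag: Optional[str]        # e.g. "d", "m-key", "timex3"
    attribute: Optional[str]  # e.g. "positive", "negated"; None if absent


def parse_label(label: str) -> ParsedLabel:
    """Split a label such as "B-m-key_negated" into (prefix, tag, attribute)."""
    if label == "O":
        return ParsedLabel(None, None, None)
    # The BIO prefix ends at the first hyphen; tag names like "m-key"
    # contain hyphens themselves, so split only once.
    prefix, rest = label.split("-", 1)
    # Assumption: the attribute follows the last underscore, and tags
    # without attributes (a, c, f, t-val) carry no underscore at all.
    if "_" in rest:
        tag, attribute = rest.rsplit("_", 1)
    else:
        tag, attribute = rest, None
    return ParsedLabel(prefix, tag, attribute)
```

For example, `parse_label("B-m-key_negated")` returns `ParsedLabel(prefix='B', tag='m-key', attribute='negated')`. Running it over `model.config.id2label.values()` is one way to list which tag/attribute combinations the model actually predicts.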

## Usage

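```python
# Requires torch and transformers. The Japanese tokenizer inherited from the
# base model is MeCab-based and typically also needs fugashi and unidic-lite.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_dir = "sociocom/MedTXTNER"
model = AutoModelForTokenClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference mode (disables dropout)

def predict_text(text: str):
    """Return a list of (token, label) pairs for one input text."""
    # Tokenize into subwords; inputs longer than the model's maximum length are truncated.
    enc = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding="longest",
        is_split_into_words=False,
    ).to(device)

    # Forward pass without gradient tracking.
    with torch.no_grad():
        outputs = model(**enc)
    logits = outputs.logits

    # Highest-scoring label id for each token of the first (and only) sequence.
    pred_ids = torch.argmax(logits, dim=-1)[0].cpu().tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    id2label = model.config.id2label

    result = []
    for tok, pid in zip(tokens, pred_ids):
        # Skip special tokens such as [CLS] and [SEP].
        if tok in tokenizer.all_special_tokens:
            continue
        result.append((tok, id2label[pid]))
    return result

# "Case] 53-year-old woman. Fever and nausea were observed; prednisolone was discontinued."
sample = "症例】５３歳女性。発熱と嘔気を認め、プレドニゾロンを中断しました。"
for tok, lab in predict_text(sample):
    print(f"{tok}\t{lab}")
```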

## Example Output
```
症例 O
】 O
53 B-timex3_age
歳 I-timex3_age
女性 O
。 O
発熱 B-d_positive
と I-d_positive
嘔 I-d_positive
##気 I-d_positive
を O
認め O
、 O
プレ B-m-key_negated
##ド I-m-key_negated
##ニ I-m-key_negated
##ゾ I-m-key_negated
##ロン I-m-key_negated
を O
中断 O
し O
まし O
た O
。 O
```
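The token-level output splits words into WordPiece fragments (`##`) and spreads each entity over several `B-`/`I-` labels. A minimal sketch for merging them into entity strings follows. It assumes the `(token, label)` list returned by `predict_text` in the usage example. `group_entities` is a hypothetical helper, not part of `transformers` or of this model.

```python
from typing import List, Optional, Tuple


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (token, BIO label) pairs into (entity_text, entity_label) spans.

    "##" subword pieces are glued onto the preceding text, and tokens are
    concatenated without spaces because Japanese text has none.
    """
    entities: List[Tuple[str, str]] = []
    current_text = ""
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the open entity, if any, and reset the buffer.
        nonlocal current_text, current_type
        if current_type is not None:
            entities.append((current_text, current_type))
        current_text, current_type = "", None

    for token, label in pairs:
        piece = token[2:] if token.startswith("##") else token
        if label == "O":
            flush()
            continue
        prefix, ent_type = label.split("-", 1)  # "B-m-key_negated" -> "B", "m-key_negated"
        if prefix == "B" or ent_type != current_type:
            # Start a new entity on "B-", or on an "I-" that does not continue the open one.
            flush()
            current_text, current_type = piece, ent_type
        else:
            current_text += piece
    flush()
    return entities
```

Applied to the token/label pairs in the example output, `group_entities(predict_text(sample))` would yield `[("53歳", "timex3_age"), ("発熱と嘔気", "d_positive"), ("プレドニゾロン", "m-key_negated")]`.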

## Evaluation

Without attributes (only the entity type is evaluated):

| Dataset | Micro-F1 | Macro-F1 | Weighted-F1 |
| --- | ---: | ---: | ---: |
| Overall | 0.699 | 0.673 | 0.700 |
| MedTxt-CR (case reports) | 0.608 | 0.575 | 0.612 |
| MedTxt-RR (radiology reports) | 0.903 | 0.930 | 0.903 |
| MedTxt-NR (nursing records) | 0.800 | 0.788 | 0.800 |

With attributes (entity type and attribute must both match):

| Dataset | Micro-F1 | Macro-F1 | Weighted-F1 |
| --- | ---: | ---: | ---: |
| Overall | 0.638 | 0.480 | 0.641 |
| MedTxt-CR (case reports) | 0.551 | 0.396 | 0.559 |
| MedTxt-RR (radiology reports) | 0.887 | 0.708 | 0.888 |
| MedTxt-NR (nursing records) | 0.730 | 0.552 | 0.731 |
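The card does not state how these scores were computed. One common reading, sketched here under that assumption, is exact-match span-level scoring. Micro-F1 pools all spans, macro-F1 averages the per-type F1 scores equally, and weighted-F1 weights each per-type F1 by its number of gold spans. The "without attributes" setting is modeled by dropping the `_attribute` suffix before comparing. `extract_spans` and `f1_scores` are hypothetical helpers, not the authors' evaluation script.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(labels: List[str], drop_attribute: bool = False) -> Set[Span]:
    """Collect exact-match spans from one BIO label sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    # A trailing "O" sentinel closes a span that runs to the end of the sequence.
    for i, label in enumerate(labels + ["O"]):
        prefix, _, rest = label.partition("-")  # "B-m-key_negated" -> "B", "m-key_negated"
        if drop_attribute:
            rest = rest.split("_", 1)[0]  # entity type only: "m-key_negated" -> "m-key"
        # Close the open span on "O", on a new "B-", or on a change of type.
        if ent_type is not None and (label == "O" or prefix == "B" or rest != ent_type):
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        # Open a new span on any non-"O" label when nothing is open.
        if label != "O" and ent_type is None:
            start, ent_type = i, rest
    return spans


def _f1(true_pos: int, n_gold: int, n_pred: int) -> float:
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def f1_scores(gold: List[List[str]], pred: List[List[str]],
              drop_attribute: bool = False) -> Dict[str, float]:
    """Micro-, macro- and support-weighted F1 over exact-match entity spans."""
    tp, gold_count, pred_count = Counter(), Counter(), Counter()
    for gold_labels, pred_labels in zip(gold, pred):
        g = extract_spans(gold_labels, drop_attribute)
        p = extract_spans(pred_labels, drop_attribute)
        tp.update(span[2] for span in g & p)
        gold_count.update(span[2] for span in g)
        pred_count.update(span[2] for span in p)

    types = sorted(set(gold_count) | set(pred_count))
    per_type = {t: _f1(tp[t], gold_count[t], pred_count[t]) for t in types}
    total_gold = sum(gold_count.values())
    return {
        "micro": _f1(sum(tp.values()), total_gold, sum(pred_count.values())),
        "macro": sum(per_type.values()) / len(types) if types else 0.0,
        "weighted": (sum(per_type[t] * gold_count[t] for t in types) / total_gold
                     if total_gold else 0.0),
    }
```

For instance, with gold labels `["B-d_positive", "I-d_positive", "O", "B-m-key_negated"]` and a prediction that differs only by `B-m-key_executed` in the last position, micro-F1 is 0.5 with attributes and 1.0 with `drop_attribute=True`. The medicine span is found, but its attribute is wrong.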


## Publication

This model can be cited as:

```
@misc{social_computing_lab_2025,
  author = { Social Computing Lab },
  title = { MedTXTNER (Revision 6788187) },
  year = 2025,
  url = { https://huggingface.co/sociocom/MedTXTNER },
  doi = { 10.57967/hf/5732 },
  publisher = { Hugging Face }
}
```