---
base_model:
- cointegrated/rubert-tiny2
datasets:
- Mykes/patient_queries_ner_SDDCS
language:
- ru
library_name: transformers
tags:
- biology
- medical
---

# rubert_ner_SDDCS

SDDCS stands for the NER entity types SYMPTOMS, DISEASES, DRUGS, CITIES, and SUBWAY STATIONS. The model additionally predicts GENDER and AGE entities.

This is a fine-tuned Named Entity Recognition (NER) model based on [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2), with only 29.4M parameters, designed to detect Russian medical entities such as diseases, drugs, and symptoms.

The med_ner_SDDCS model extracts named entities from patient queries. The abbreviation SDDCS lists the entities: S for symptoms, D for diseases, D for drugs, C for city, S for subway station. The model also tags GENDER (an indication of sex) and AGE (an indication of age).

The model is built on the compact rubert-tiny2 model with 29.4 million parameters, which makes it well suited to servers with modest hardware.

# Model Details

- Model Name: rubert_ner_SDDCS
- Base Model: cointegrated/rubert-tiny2
- Fine-tuned on: [Mykes/patient_queries_ner_SDDCS](https://huggingface.co/datasets/Mykes/patient_queries_ner_SDDCS)

## Entities Recognized

- GENDER (e.g., женщина, мужчина) 👩👨
- DISEASE (e.g., паническое расстройство, грипп, ...) 🤒
- SYMPTOM (e.g., тревога, одышка, ...) 🩺
- SPECIALITY (e.g., невролог, кардиолог, ...) 👩‍⚕️
- CITY (e.g., Тула, Москва, Иркутск, ...) 🏙️
- SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...) 🚇
- DRUG (e.g., кардиомагнил, ципралекс) 💊
- AGE (e.g., ребенок, пожилой) 🧒🏼👴

## Model Performance

The fine-tuned model has achieved the following performance metrics:

```
              precision    recall  f1-score   support

         AGE       1.00      1.00      1.00       583
        CITY       1.00      1.00      1.00      5244
     DISEASE       0.99      1.00      1.00      6569
        DRUG       1.00      1.00      1.00      8220
      GENDER       1.00      1.00      1.00       664
  SPECIALITY       1.00      0.98      0.99      4207
      SUBWAY       1.00      1.00      1.00      1084
     SYMPTOM       1.00      1.00      1.00      8979

   micro avg       1.00      1.00      1.00     35550
   macro avg       1.00      1.00      1.00     35550
weighted avg       1.00      1.00      1.00     35550
```

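To make the averaged rows of the report easier to read, here is a minimal sketch of how the macro and weighted F1 averages relate to the per-class rows. It assumes the per-class F1 values rounded to two decimals, as printed in the report, so the results are approximate. The variable names (`report`, `macro_f1`, `weighted_f1`) are illustrative and not part of the model's code.

```python
# Per-class (f1-score, support) pairs copied from the report.
# Assumption: the values are rounded to two decimals, so the averages are approximate.
report = {
    "AGE": (1.00, 583),
    "CITY": (1.00, 5244),
    "DISEASE": (1.00, 6569),
    "DRUG": (1.00, 8220),
    "GENDER": (1.00, 664),
    "SPECIALITY": (0.99, 4207),
    "SUBWAY": (1.00, 1084),
    "SYMPTOM": (1.00, 8979),
}

# Total number of gold entities across all classes.
total_support = sum(support for _, support in report.values())

# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(total_support)
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
```

The macro average treats the small AGE and GENDER classes the same as the large SYMPTOM class, while the weighted average is dominated by the classes with the most examples.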
## When to use

Use this model with the Hugging Face 🤗 Transformers library to perform Named Entity Recognition (NER) in the Russian medical domain, mainly on patient queries.

Here's how to load and use the model:

## Load the tokenizer and model with transformers

```
from transformers import pipeline

# aggregation_strategy="max" merges sub-word tokens into whole-word entities
pipe = pipeline(task="ner", model='Mykes/rubert_ner_SDDCS', tokenizer='Mykes/rubert_ner_SDDCS', aggregation_strategy="max")

# The misspellings in the query are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи психиатра в районе метро Октбрьской."

# The query is lowercased before it is passed to the model
pipe(query.lower())
```

Result:

```
[{'entity_group': 'AGE',
  'score': 0.99993,
  'word': 'ребенка',
  'start': 2,
  'end': 9},
 {'entity_group': 'SYMPTOM',
  'score': 0.9885457,
  'word': 'треога',
  'start': 10,
  'end': 16},
 {'entity_group': 'SYMPTOM',
  'score': 0.9934536,
  'word': 'норушения сна',
  'start': 19,
  'end': 32},
 {'entity_group': 'SYMPTOM',
  'score': 0.9999765,
  'word': 'потеря сознания',
  'start': 34,
  'end': 49},
 {'entity_group': 'DISEASE',
  'score': 0.999972,
  'word': 'паническое расстройство',
  'start': 66,
  'end': 89},
 {'entity_group': 'SPECIALITY',
  'score': 0.85958296,
  'word': 'психиатра',
  'start': 100,
  'end': 109},
 {'entity_group': 'SUBWAY',
  'score': 0.9955049,
  'word': 'октбрьской',
  'start': 125,
  'end': 135}]
```

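Because the query is lowercased before inference, the `word` field loses the original casing, but the `start` and `end` offsets can still be used to slice the original string. The sketch below groups entities by label and optionally drops low-confidence ones. The helper name `group_entities` and the `min_score` parameter are hypothetical, and the code assumes that lowercasing does not change the length of the text (which holds for this query). Two entries are copied from the result above so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(text, ner_results, min_score=0.0):
    """Group entity spans by label, slicing the original text to keep its casing."""
    grouped = defaultdict(list)
    for ent in ner_results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи психиатра в районе метро Октбрьской."

# Two entries copied from the pipeline output shown above.
sample_results = [
    {"entity_group": "SPECIALITY", "score": 0.85958296, "word": "психиатра", "start": 100, "end": 109},
    {"entity_group": "SUBWAY", "score": 0.9955049, "word": "октбрьской", "start": 125, "end": 135},
]

print(group_entities(query, sample_results))
# {'SPECIALITY': ['психиатра'], 'SUBWAY': ['Октбрьской']}

# The 0.9 threshold is an arbitrary illustration, not a recommended value.
print(group_entities(query, sample_results, min_score=0.9))
# {'SUBWAY': ['Октбрьской']}
```

With the real pipeline, `sample_results` would simply be replaced by `pipe(query.lower())`.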
## How to render

```
from IPython.display import HTML, display
from spacy import displacy


def convert_to_displacy_format(text, ner_results):
    """Convert Hugging Face NER output into the dict displaCy expects in manual mode."""
    entities = [
        {"start": result["start"], "end": result["end"], "label": result["entity_group"]}
        for result in ner_results
    ]
    return {"text": text, "ents": entities, "title": None}


# `pipe` is the pipeline created in the previous section
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство, принимал атаракс. Подскажи хорошего психиатра в районе метро Октбрьской."
ner_results = pipe(query.lower())

# Pass the original query so the rendering keeps its casing;
# the character offsets still match the lowercased text
displacy_data = convert_to_displacy_format(query, ner_results)

colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)"
}

# Only labels listed in "ents" are highlighted; GENDER is not listed, so it stays unmarked
options = {"ents": ["SPECIALITY", "CITY", "DRUG", "DISEASE", "SYMPTOM", "AGE", "SUBWAY"], "colors": colors}

# jupyter=False makes render() return the HTML markup as a string
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)

with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

# Show the result inline when running in a notebook
display(HTML(html))
```
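
When spaCy or a notebook is not available, the same offsets can be used for a plain-text rendering. The following is a minimal sketch, not part of the original model card: the helper name `annotate` and the `[span](LABEL)` notation are illustrative choices, and the code assumes the entity spans do not overlap. The sample entities are the first two offsets from the pipeline output shown earlier.

```python
def annotate(text, ner_results):
    """Return the text with each entity span wrapped as [span](LABEL)."""
    parts, cursor = [], 0
    # Process spans left to right so the untouched text between them is preserved.
    for ent in sorted(ner_results, key=lambda e: e["start"]):
        parts.append(text[cursor:ent["start"]])
        parts.append(f"[{text[ent['start']:ent['end']]}]({ent['entity_group']})")
        cursor = ent["end"]
    parts.append(text[cursor:])
    return "".join(parts)


text = "У ребенка треога"
sample = [
    {"entity_group": "AGE", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "start": 10, "end": 16},
]

print(annotate(text, sample))
# У [ребенка](AGE) [треога](SYMPTOM)
```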