ICD-10 Chapter Classifier (RU)

Task

Single-label classification of ICD-10 chapters from Russian clinical text.

  • Classes / Groups: 21 (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, Z; without U).
  • Labels used in this checkpoint: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, Z.
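
The card does not say how labels are derived from full codes. A minimal sketch, assuming each class label is simply the first letter of the ICD-10 code (the helper name chapter_label is hypothetical and not part of the released pipeline):

```python
from typing import Optional

# The 21 class letters listed above (A through T, plus Z).
LABELS = list("ABCDEFGHIJKLMNOPQRST") + ["Z"]


def chapter_label(icd10_code: str) -> Optional[str]:
    """Map a full ICD-10 code to its class letter.

    Assumption: the label is the first letter of the code.
    Returns None for codes outside the 21 labels (for example U07.1).
    """
    letter = icd10_code.strip().upper()[:1]
    return letter if letter in LABELS else None
```

Under this assumption, chapter_label("J45.0") gives "J", and a code starting with U gives None, which matches the "without U" note.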

Training data

  • Main dataset: datasets/icd10_final_dataset_all.csv (sha256: f008e02dbe74623565e9eccb512f0cf1d8b5f24cb0882c4b432a4f8d8cf22709)
  • Upstream pipeline: ml/train_rubert_base.ipynb
  • Optional synthetic sources:
      • icd10_final_dataset_synthetic.csv (sha256: ff40b24c4e1a1b79d365a1a2d3ad54afa9e8e295851dff42e45ea22a38d4321a)
      • generated.csv (sha256: a35e6dc73c92ff0fdc02a54c600d4c9dfcdd5d18ce33397f4e6b9788df0e3bc0)
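
The sha256 digests above can be used to confirm that a local copy of a dataset matches the one used for training. A minimal sketch using only the standard library; the function names sha256_of and verify are illustrative, and the path in the comment assumes the file sits at the location given in this card:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (requires the file to be present locally):
# verify("datasets/icd10_final_dataset_all.csv",
#        "f008e02dbe74623565e9eccb512f0cf1d8b5f24cb0882c4b432a4f8d8cf22709")
```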

Splits

  • train: 9608
  • val: 1137
  • test: 1136
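
For reference, these counts add up to 11,881 examples, split roughly 80.9% / 9.6% / 9.6%. The short calculation below reproduces those shares from the numbers above; it says nothing about how the split was drawn (for example, whether it was stratified), which the card does not state.

```python
# Split sizes as reported in this card.
splits = {"train": 9608, "val": 1137, "test": 1136}

total = sum(splits.values())
shares = {name: n / total for name, n in splits.items()}

for name, n in splits.items():
    print(f"{name}: {n} ({shares[name]:.1%})")
print(f"total: {total}")
```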

Metrics (test)

Metric                Value
top-1 accuracy        0.7121
top-3 accuracy        0.8724
f1-macro              0.4029
f1-weighted           0.7043
precision-macro       0.4208
recall-macro          0.4123
precision-weighted    0.7110
recall-weighted       0.7121
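
Two things in these numbers are worth reading carefully. Weighted recall equals top-1 accuracy by definition in single-label classification, which is why both rows show 0.7121. The gap between macro F1 (0.4029) and weighted F1 (0.7043) is consistent with weaker performance on low-support classes, although the card gives no per-class breakdown. The sketch below is a plain-Python illustration on toy data, not the evaluation code from the training notebook:

```python
from collections import Counter


def top_k_accuracy(y_true, ranked_preds, k):
    """Share of examples whose true label is among the k highest-ranked labels."""
    hits = sum(1 for t, ranked in zip(y_true, ranked_preds) if t in ranked[:k])
    return hits / len(y_true)


def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Toy data: a frequent class "I" predicted well, a rare class "P" always missed.
y_true = ["I"] * 8 + ["P"] * 2
y_pred = ["I"] * 10
ranked = [["I", "P", "J"]] * 10  # ranked label lists, best first

macro, weighted = f1_scores(y_true, y_pred)
top1 = top_k_accuracy(y_true, ranked, k=1)
top3 = top_k_accuracy(y_true, ranked, k=3)
```

On this toy data the macro F1 is about 0.44 while the weighted F1 is about 0.71: the missed rare class halves the macro score but barely moves the weighted one.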

Inference snippet

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "Dmitry43243242/icd10-ru-chapter"
tok = AutoTokenizer.from_pretrained(repo)
# eval() disables dropout so predictions are deterministic
mdl = AutoModelForSequenceClassification.from_pretrained(repo).eval()
text = "Жалобы пациента..."
# Inputs longer than 384 tokens are truncated
enc = tok(text, return_tensors="pt", truncation=True, max_length=384)
with torch.no_grad():
    # Softmax over the class dimension; [0] selects the single input
    probs = torch.softmax(mdl(**enc).logits, dim=-1)[0]
# Pair each probability with its label and keep the three most likely
top3 = sorted(
    [(mdl.config.id2label[i], float(p)) for i, p in enumerate(probs)],
    key=lambda x: -x[1],
)[:3]
print(top3)

Intended Use

Decision support for triaging Russian clinical text into ICD-10 chapters. This model does not replace clinical judgment.

Limitations

  • Russian language only.
  • Trained on de-identified/PII-redacted clinical text.
  • Predicts the ICD-10 chapter only, not the three-character diagnosis code.
  • For lower-level coding, use subgroup models like <HF_USER>/icd10-ru-subgroup-<letter>.
  • If the highest class probability (max_prob) is below 0.30, route the case to manual review.
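
The 0.30 rule can be applied directly to the probabilities returned by the inference snippet. A minimal sketch, assuming the probabilities are available as a label-to-probability dict; the function name route and the shape of the returned dict are illustrative choices, not part of the released code:

```python
def route(label_probs, threshold=0.30, k=3):
    """Decide whether a prediction should go to manual review.

    label_probs: dict mapping class letter to probability.
    A case is flagged when the highest probability is strictly below threshold.
    """
    ranked = sorted(label_probs.items(), key=lambda kv: kv[1], reverse=True)
    top_label, max_prob = ranked[0]
    return {
        "needs_manual_review": max_prob < threshold,
        "label": top_label,
        "max_prob": max_prob,
        "top_k": ranked[:k],
    }


confident = route({"I": 0.25, "J": 0.20, "K": 0.55})
uncertain = route({"I": 0.28, "J": 0.27, "K": 0.25, "L": 0.20})
```

Here confident keeps label "K" without review, while uncertain is flagged because its best probability is 0.28. Returning the top-k list alongside the flag gives a reviewer the model's leading candidates as a starting point.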

Citation

  • Base model: ai-forever/ruBert-base
  • Pipeline: ai-app training workflow (ml/train_rubert_base.ipynb)