---
language: ru
license: mit
tags:
  - text-classification
  - intent-classification
  - onnx
  - rubert
  - chatbot
  - rag
datasets:
  - custom
metrics:
  - f1
  - accuracy
pipeline_tag: text-classification
model-index:
  - name: intent-classifier-rubert-tiny2
    results:
      - task:
          type: text-classification
          name: Intent Classification
        metrics:
          - name: F1 (weighted)
            type: f1
            value: 0.9
          - name: Accuracy
            type: accuracy
            value: 0.9
---

# Intent Classifier (ruBERT-tiny2)

A fine-tuned `cointegrated/rubert-tiny2` model that classifies Russian chatbot messages into three intents.

## Use Case

RAG (Retrieval-Augmented Generation) chatbots need to classify user messages before processing:

- `rag` — the user wants to search documents / the knowledge base
- `chat` — greeting, small talk, questions about the bot itself
- `followup` — clarification of the previous answer

This model replaces an LLM API call (300-2000 ms, ~$0.001/request) with local inference (3.7 ms, $0).
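As a sketch of how these intents might drive routing in a RAG chatbot (the `route` helper and handler names here are hypothetical, not part of this model's API; the low-confidence fallback to `rag` is an assumption, not a documented behavior):

```python
def route(message, classify, handlers, threshold=0.5):
    """Dispatch a message to the handler registered for its predicted intent.

    classify: a callable returning (label, confidence), e.g. the classify()
    function from the Quick Start section.
    Falls back to the "rag" handler when confidence is below the threshold,
    on the assumption that a knowledge-base search is the safest default.
    """
    label, confidence = classify(message)
    if confidence < threshold:
        label = "rag"
    return handlers[label](message)

# Hypothetical handlers, for illustration only:
handlers = {
    "rag": lambda m: f"searching documents for: {m}",
    "chat": lambda m: f"small-talk reply to: {m}",
    "followup": lambda m: f"clarifying the previous answer for: {m}",
}
```

A sensible threshold depends on your traffic; measuring it on held-out low-confidence examples is advisable before relying on the fallback.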

## Results

| Class | Precision | Recall | F1 |
|---|---|---|---|
| rag | 0.94 | 0.98 | 0.96 |
| chat | 0.87 | 0.90 | 0.88 |
| followup | 0.86 | 0.73 | 0.79 |
| **Overall (weighted)** | | | **0.90** |

## Quick Start (ONNX)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Gleckus/intent-classifier-rubert-tiny2")
LABELS = ["rag", "chat", "followup"]

def classify(text):
    inputs = tokenizer(text, return_tensors="np", padding="max_length",
                       truncation=True, max_length=128)
    # The ONNX graph takes input_ids and attention_mask and returns logits.
    logits = session.run(None, {"input_ids": inputs["input_ids"],
                                "attention_mask": inputs["attention_mask"]})[0][0]
    # Numerically stable softmax: subtract the max before exponentiating.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return LABELS[int(np.argmax(probs))], float(probs.max())

label, conf = classify("какие условия возврата?")  # "what are the return conditions?"
print(f"{label} ({conf:.1%})")  # rag (95.2%)
```
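
For throughput-sensitive pipelines, the single-text `classify` above can be generalized to score many messages in one ONNX session call. This `classify_batch` helper is a sketch (it is not part of the published card) and assumes the same `session` and `tokenizer` objects as the Quick Start:

```python
import numpy as np

def classify_batch(texts, session, tokenizer, labels=("rag", "chat", "followup")):
    """Classify a batch of messages in a single ONNX session call.

    Batching amortizes per-call overhead compared with calling
    classify() once per message.
    """
    enc = tokenizer(list(texts), return_tensors="np",
                    padding=True, truncation=True, max_length=128)
    # Logits for the whole batch: shape (n_texts, n_labels).
    logits = session.run(None, {"input_ids": enc["input_ids"],
                                "attention_mask": enc["attention_mask"]})[0]
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    best = probs.argmax(axis=-1)
    return [(labels[i], float(probs[n, i])) for n, i in enumerate(best)]
```

With dynamic padding (`padding=True`), short batches avoid the fixed 128-token cost of `padding="max_length"`.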

## Training

- Base model: `cointegrated/rubert-tiny2` (29M parameters)
- Dataset: 2,877 synthetic examples (template-based + augmented)
- Training: 5 epochs, batch size 32, learning rate 2e-5, Google Colab T4 GPU
- Export: ONNX format, ~111 MB

## Links