---
language: ru
license: mit
tags:
  - text-classification
  - intent-classification
  - onnx
  - rubert
  - chatbot
  - rag
datasets:
  - custom
metrics:
  - f1
  - accuracy
pipeline_tag: text-classification
model-index:
  - name: intent-classifier-rubert-tiny2
    results:
      - task:
          type: text-classification
          name: Intent Classification
        metrics:
          - name: F1 (weighted)
            type: f1
            value: 0.90
          - name: Accuracy
            type: accuracy
            value: 0.90
---

# Intent Classifier (ruBERT-tiny2)

Fine-tuned **cointegrated/rubert-tiny2** for classifying Russian chatbot messages into three intents.

## Use Case

RAG (Retrieval-Augmented Generation) chatbots need to classify each user message before processing it:
- **rag** - the user wants to search the documents / knowledge base
- **chat** - greetings, small talk, questions about the bot itself
- **followup** - a clarification of a previous answer

This model replaces LLM API calls (300-2000ms, ~$0.001/req) with local inference (3.7ms, $0).

## Results

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| rag | 0.94 | 0.98 | 0.96 |
| chat | 0.87 | 0.90 | 0.88 |
| followup | 0.86 | 0.73 | 0.79 |
| **Overall (weighted)** | | | **0.90** |

## Quick Start (ONNX)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Gleckus/intent-classifier-rubert-tiny2")
LABELS = ["rag", "chat", "followup"]

def classify(text):
    inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=128)
    outputs = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})
    logits = outputs[0][0]
    # Numerically stable softmax: subtract the max before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], float(probs.max())

label, conf = classify("какие условия возврата?")
print(f"{label} ({conf:.1%})")  # rag (95.2%)
```
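In production you often want a default route when the classifier is unsure (note the lower recall on `followup` above). A minimal sketch of thresholded routing on the `(label, confidence)` pair the helper returns; the threshold value and the choice of fallback branch are illustrative, not part of this model card:

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative; tune on a held-out validation set

def route(label: str, confidence: float) -> str:
    """Map a (label, confidence) prediction to a pipeline branch.

    Below the threshold, fall back to the RAG branch: a retrieved answer
    is usually a safer default than small talk or reusing stale context.
    """
    if confidence < CONFIDENCE_THRESHOLD:
        return "rag"
    return label

print(route("followup", 0.45))  # rag (low confidence, falls back)
print(route("chat", 0.93))      # chat
```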

## Training

- **Base model:** cointegrated/rubert-tiny2 (29M params)
- **Dataset:** 2,877 synthetic examples (template-based + augmented)
- **Training:** 5 epochs, batch size 32, learning rate 2e-5, on a Google Colab T4 GPU
- **Export:** ONNX format, ~111 MB
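The dataset is described as template-based synthetic data. A hedged illustration of what template expansion might look like; the templates and topic slots below are invented for this sketch and are NOT the actual training data (see the GitHub repository for the real dataset):

```python
# Illustrative only: invented templates and topics, not the real training set.
TEMPLATES = {
    "rag": ["какие условия {topic}?", "где найти информацию про {topic}?"],
    "chat": ["привет!", "как тебя зовут?"],
    "followup": ["а можно подробнее?", "а что насчёт {topic}?"],
}
TOPICS = ["возврата", "доставки", "оплаты"]  # "returns", "delivery", "payment"

def generate_examples():
    """Expand each template over every topic slot, yielding (text, label) pairs."""
    rows = []
    for label, templates in TEMPLATES.items():
        for tpl in templates:
            if "{topic}" in tpl:
                rows.extend((tpl.format(topic=t), label) for t in TOPICS)
            else:
                rows.append((tpl, label))
    return rows

examples = generate_examples()
```

Augmentation (paraphrases, typos, word-order shuffles) would then grow such a seed set toward the 2,877 examples used here.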

## Links

- [GitHub Repository](https://github.com/GleckusZeroFive/intent-classifier) - full code, dataset, and documentation