---
license: apache-2.0
datasets:
- knowledgator/gliclass-v2.0
pipeline_tag: text-classification
---
# ⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

This is an efficient zero-shot classifier inspired by the [GLiNER](https://github.com/urchade/GLiNER/tree/main) work. It matches the performance of a cross-encoder while being more compute-efficient, because classification is done in a single forward pass.

It can be used for `topic classification`, `sentiment analysis` and as a reranker in `RAG` pipelines.

The model was trained on synthetic and licensed data that allow commercial use, so it can be used in commercial applications.

The backbone model is [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base). It supports multilingual understanding, making it well-suited for tasks involving texts in different languages.

### How to use:
First, install the GLiClass library:
```bash
pip install gliclass
pip install -U "transformers>=4.48.0"
```

Then initialize the model and the pipeline:

<details>
<summary>English</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]  # [0] because we passed a single text
for result in results:
 print(result["label"], "=>", result["score"])
```
</details>
<details>
<summary>Spanish</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "¡Un día veré el mundo!"
labels = ["viajes", "sueños", "deportes", "ciencia", "política"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```
</details>
<details>
<summary>Italian</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un giorno vedrò il mondo!"
labels = ["viaggi", "sogni", "sport", "scienza", "politica"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
<details>
<summary>French</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un jour, je verrai le monde!"
labels = ["voyage", "rêves", "sport", "science", "politique"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
<details>
<summary>German</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Eines Tages werde ich die Welt sehen!"
labels = ["Reisen", "Träume", "Sport", "Wissenschaft", "Politik"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
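As noted above, the model can also act as a reranker in RAG pipelines by scoring each candidate passage against the query. A minimal sketch of that idea, assuming the GLiClass pipeline from the examples above; the `rerank` helper and the `score_fn` callable are illustrative, not part of the GLiClass API:

```python
def rerank(score_fn, query, passages, top_k=3):
    """Order passages by relevance to the query using a zero-shot scorer.

    score_fn(text, labels) must return a list of {"label", "score"} dicts,
    matching the per-text output of the GLiClass pipeline.
    """
    scored = []
    for passage in passages:
        results = score_fn(passage, [query])  # treat the query as a single label
        score = results[0]["score"] if results else 0.0
        scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]

# With the pipeline initialized as shown above, score_fn could be:
# score_fn = lambda text, labels: pipeline(text, labels, threshold=0.0)[0]
```

Setting `threshold=0.0` here keeps every score so the ranking is not truncated before sorting.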

### Benchmarks:
Below are F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.
#### Multilingual benchmarks
| Dataset                  | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
| ------------------------ | --------------- | ------------------ | ------------------- |
| FredZhang7/toxi-text-3M  | 0.5972          | 0.5072             | 0.6118              |
| SetFit/xglue\_nc         | 0.5014          | 0.5348             | 0.5378              |
| Davlan/sib200\_14classes | 0.4663          | 0.2867             | 0.3173              |
| uhhlt/GermEval2017       | 0.3999          | 0.4010             | 0.4299              |
| dolfsai/toxic\_es        | 0.1250          | 0.1399             | 0.1412              |
| **Average**              | **0.41796**     | **0.37392**        | **0.4076**          |
#### General benchmarks
| Dataset                      | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
| ---------------------------- | --------------- | ------------------ | ------------------- |
| SetFit/CR                    | 0.8630          | 0.9127             | 0.9398              |
| SetFit/sst2                  | 0.8554          | 0.8959             | 0.9192              |
| SetFit/sst5                  | 0.3287          | 0.3376             | 0.4606              |
| AmazonScience/massive        | 0.2611          | 0.5040             | 0.5649              |
| stanfordnlp/imdb             | 0.8840          | 0.9251             | 0.9366              |
| SetFit/20\_newsgroups        | 0.4116          | 0.4759             | 0.5958              |
| SetFit/enron\_spam           | 0.5929          | 0.6760             | 0.7584              |
| PolyAI/banking77             | 0.3098          | 0.4698             | 0.5574              |
| takala/financial\_phrasebank | 0.7851          | 0.8971             | 0.9000              |
| ag\_news                     | 0.6815          | 0.7279             | 0.7181              |
| dair-ai/emotion              | 0.3667          | 0.4447             | 0.4506              |
| MoritzLaurer/cap\_sotu       | 0.3935          | 0.4614             | 0.4589              |
| cornell/rotten\_tomatoes     | 0.7252          | 0.7943             | 0.8411              |
| snips                        | 0.6307          | 0.9474             | 0.9692              |
| **Average**                  | **0.5778**      | **0.6764**         | **0.7193**          |

## Citation
```bibtex
@misc{stepanov2025gliclassgeneralistlightweightmodel,
      title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
      year={2025},
      eprint={2508.07662},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.07662}, 
}
```