---
license: apache-2.0
datasets:
- knowledgator/gliclass-v2.0
pipeline_tag: text-classification
---
# ⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

This is an efficient zero-shot classifier inspired by the [GLiNER](https://github.com/urchade/GLiNER/tree/main) work. It demonstrates the same performance as a cross-encoder while being more compute-efficient: a cross-encoder needs a separate forward pass for every text–label pair, whereas GLiClass scores all candidate labels in a single forward pass.
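To make the efficiency argument concrete, here is a back-of-the-envelope sketch that counts encoder forward passes needed to score every text against every candidate label. It is an illustrative model, not the library's code: `forward_passes` is a hypothetical helper, it assumes all labels fit into one GLiClass input (very long label lists may need splitting), and it counts passes rather than measuring latency, since one GLiClass pass over a text plus all labels is longer than one cross-encoder pass over a single pair.

```python
def forward_passes(num_texts: int, num_labels: int, architecture: str) -> int:
    """Rough count of encoder forward passes to score every (text, label) pair.

    Hypothetical helper for illustration only; assumes all labels fit in one
    GLiClass input sequence.
    """
    if architecture == "cross-encoder":
        # A cross-encoder scores exactly one (text, label) pair per pass.
        return num_texts * num_labels
    if architecture == "gliclass":
        # GLiClass encodes the text together with all candidate labels at once.
        return num_texts
    raise ValueError(f"unknown architecture: {architecture!r}")


if __name__ == "__main__":
    for arch in ("cross-encoder", "gliclass"):
        print(arch, forward_passes(1000, 5, arch))
```

For 1,000 texts and 5 candidate labels this gives 5,000 cross-encoder passes versus 1,000 GLiClass passes; the gap grows linearly with the number of labels.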

It can be used for `topic classification`, `sentiment analysis` and as a reranker in `RAG` pipelines.
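One common way to use a classifier as a reranker is to score each retrieved passage against the query and sort by that score. The sketch below keeps the scorer pluggable: `rerank` and the toy `word_overlap` scorer are hypothetical helpers, not part of GLiClass. Plugging GLiClass in, for example by passing the passages as texts and the query as the only label and reading each returned score, is an untested assumption about how the pipeline behaves for this use.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every passage against the query and return (passage, score) best-first."""
    scored = [(passage, float(score_fn(query, passage))) for passage in passages]
    # sorted() is stable, so passages with equal scores keep their retrieval order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def word_overlap(query: str, passage: str) -> float:
    """Toy stand-in scorer (hypothetical): number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))
```

With `word_overlap`, `rerank("see the world", ["I love sport", "travel the world"], word_overlap)` puts `"travel the world"` first; swapping in a model-based `score_fn` leaves the ranking logic unchanged.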

The model was trained on synthetic data and on licensed data that permits commercial use, so it can be used in commercial applications.

The backbone model is [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base). It supports multilingual understanding, making it well suited for tasks involving texts in different languages.

### How to use:
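First, install the GLiClass library and a recent version of `transformers`:
```bash
pip install gliclass
# Quote the requirement so the shell does not treat ">" as an output redirect.
pip install -U "transformers>=4.48.0"
```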

Then initialize the model and the pipeline:

<details>
<summary>English</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]  # [0]: results for the first (and only) input text
for result in results:
    print(result["label"], "=>", result["score"])
```
</details>
<details>
<summary>Spanish</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "¡Un día veré el mundo!"
labels = ["viajes", "sueños", "deportes", "ciencia", "política"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```
</details>
<details>
<summary>Italian</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un giorno vedrò il mondo!"
labels = ["viaggi", "sogni", "sport", "scienza", "politica"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
<details>
<summary>French</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un jour, je verrai le monde!"
labels = ["voyage", "rêves", "sport", "science", "politique"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
<details>
<summary>German</summary>

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Eines Tages werde ich die Welt sehen!"
labels = ["Reisen", "Träume", "Sport", "Wissenschaft", "Politik"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
```

</details>
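Each pipeline call returns one list per input text (hence the `[0]` in the examples), and each entry is a dict with `"label"` and `"score"` keys. The helpers below are a hypothetical post-processing sketch built only on that shape: `labels_above` keeps every label at or above a threshold (a multi-label reading), and `best_label` picks the single top-scoring label. Whether the pipeline's own `threshold` uses `>` or `>=` is not documented here, so the cutoff rule in `labels_above` is an assumption.

```python
from typing import Dict, List, Sequence, Union

# Assumed shape of one entry in a per-text result list, as in the examples above.
Result = Dict[str, Union[str, float]]


def labels_above(results_for_text: Sequence[Result], threshold: float = 0.5) -> List[str]:
    """Multi-label view: every label whose score reaches the threshold, best first."""
    kept = [r for r in results_for_text if float(r["score"]) >= threshold]
    kept.sort(key=lambda r: float(r["score"]), reverse=True)
    return [str(r["label"]) for r in kept]


def best_label(results_for_text: Sequence[Result]) -> str:
    """Single-label view: the highest-scoring label, ignoring any threshold."""
    if not results_for_text:
        raise ValueError("no labels were returned for this text")
    return str(max(results_for_text, key=lambda r: float(r["score"]))["label"])


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are NOT real model outputs.
    batch = [[
        {"label": "travel", "score": 0.91},
        {"label": "dreams", "score": 0.62},
        {"label": "sport", "score": 0.08},
    ]]
    for per_text in batch:
        print(labels_above(per_text), best_label(per_text))
```

If the pipeline has already filtered results with `threshold=0.5`, a per-text list may hold only the labels above that cutoff, or be empty, in which case `best_label` raises rather than guessing.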

### Benchmarks:
Below are F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.

#### Multilingual benchmarks
| Dataset                  | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
| ------------------------ | --------------- | ------------------ | ------------------- |
| FredZhang7/toxi-text-3M  | 0.5972          | 0.5072             | 0.6118              |
| SetFit/xglue\_nc         | 0.5014          | 0.5348             | 0.5378              |
| Davlan/sib200\_14classes | 0.4663          | 0.2867             | 0.3173              |
| uhhlt/GermEval2017       | 0.3999          | 0.4010             | 0.4299              |
| dolfsai/toxic\_es        | 0.1250          | 0.1399             | 0.1412              |
| **Average**              | **0.4180**      | **0.3739**         | **0.4076**          |

#### General benchmarks
| Dataset                      | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
| ---------------------------- | --------------- | ------------------ | ------------------- |
| SetFit/CR                    | 0.8630          | 0.9127             | 0.9398              |
| SetFit/sst2                  | 0.8554          | 0.8959             | 0.9192              |
| SetFit/sst5                  | 0.3287          | 0.3376             | 0.4606              |
| AmazonScience/massive        | 0.2611          | 0.5040             | 0.5649              |
| stanfordnlp/imdb             | 0.8840          | 0.9251             | 0.9366              |
| SetFit/20\_newsgroups        | 0.4116          | 0.4759             | 0.5958              |
| SetFit/enron\_spam           | 0.5929          | 0.6760             | 0.7584              |
| PolyAI/banking77             | 0.3098          | 0.4698             | 0.5574              |
| takala/financial\_phrasebank | 0.7851          | 0.8971             | 0.9000              |
| ag\_news                     | 0.6815          | 0.7279             | 0.7181              |
| dair-ai/emotion              | 0.3667          | 0.4447             | 0.4506              |
| MoritzLaurer/cap\_sotu       | 0.3935          | 0.4614             | 0.4589              |
| cornell/rotten\_tomatoes     | 0.7252          | 0.7943             | 0.8411              |
| snips                        | 0.6307          | 0.9474             | 0.9692              |
| **Average**                  | **0.5778**      | **0.6764**         | **0.7193**          |
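The **Average** rows appear to be unweighted arithmetic means over the datasets in each table, so every dataset counts equally regardless of its size. The sketch below recomputes them from the per-dataset F1 scores copied from the tables; rounded to four decimal places, the results agree with every reported average. The helper name `table_averages` is illustrative.

```python
from typing import Dict, List

# Per-dataset F1 scores copied from the benchmark tables, in row order.
MULTILINGUAL: Dict[str, List[float]] = {
    "gliclass-x-base": [0.5972, 0.5014, 0.4663, 0.3999, 0.1250],
    "gliclass-base-v3.0": [0.5072, 0.5348, 0.2867, 0.4010, 0.1399],
    "gliclass-large-v3.0": [0.6118, 0.5378, 0.3173, 0.4299, 0.1412],
}
GENERAL: Dict[str, List[float]] = {
    "gliclass-x-base": [0.8630, 0.8554, 0.3287, 0.2611, 0.8840, 0.4116, 0.5929,
                        0.3098, 0.7851, 0.6815, 0.3667, 0.3935, 0.7252, 0.6307],
    "gliclass-base-v3.0": [0.9127, 0.8959, 0.3376, 0.5040, 0.9251, 0.4759, 0.6760,
                           0.4698, 0.8971, 0.7279, 0.4447, 0.4614, 0.7943, 0.9474],
    "gliclass-large-v3.0": [0.9398, 0.9192, 0.4606, 0.5649, 0.9366, 0.5958, 0.7584,
                            0.5574, 0.9000, 0.7181, 0.4506, 0.4589, 0.8411, 0.9692],
}


def table_averages(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Unweighted mean F1 per model, rounded to four decimal places."""
    return {model: round(sum(scores) / len(scores), 4) for model, scores in table.items()}


if __name__ == "__main__":
    print(table_averages(MULTILINGUAL))
    print(table_averages(GENERAL))
```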

## Citation
```bibtex
@misc{stepanov2025gliclassgeneralistlightweightmodel,
      title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
      year={2025},
      eprint={2508.07662},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.07662}, 
}
```