---
license: apache-2.0
datasets:
- BioMike/formal-logic-reasoning-gliclass-2k
- knowledgator/gliclass-v3-logic-dataset
- tau/commonsense_qa
metrics:
- f1
tags:
- text classification
- nli
- sentiment analysis
pipeline_tag: text-classification
---

![image/png](instruct.png)

# GLiClass-multitask: Efficient zero-shot and few-shot multi-task model via sequence classification

GLiClass is an efficient zero-shot sequence classification model designed to achieve state-of-the-art (SoTA) performance and strong generalization while running much faster than cross-encoders and LLMs.

The model supports text classification with any labels and can be used for the following tasks:
* Topic Classification
* Sentiment Analysis
* Intent Classification
* Reranking
* Hallucination Detection
* Rule-following Verification
* LLM-safety Classification
* Natural Language Inference

## ✨ What's New in V3

- **Hierarchical Labels** — Organize labels into groups using dot notation or dictionaries (e.g., `sentiment.positive`, `topic.product`).
- **Few-Shot Examples** — Provide in-context examples to boost accuracy on your specific task.
- **Label Descriptions** — Add natural-language descriptions to labels for more precise classification.
- **Task Prompts** — Prepend a custom prompt to guide the model's classification behavior.

See the [GLiClass library README](https://github.com/Knowledgator/GLiClass) for full details on these features.
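
To make the relationship between the dictionary form and the dot-notation form concrete, here is a minimal sketch that flattens a nested label dictionary into dot-notation strings. The helper `flatten_labels` is hypothetical and not part of the GLiClass library; it only illustrates the naming scheme, and the library's own handling may differ.

```python
from typing import Dict, List


def flatten_labels(groups: Dict[str, List[str]]) -> List[str]:
    """Turn {"group": ["a", "b"]} into ["group.a", "group.b"] (hypothetical helper)."""
    return [f"{group}.{label}" for group, labels in groups.items() for label in labels]


hierarchical_labels = {
    "science": ["space", "biology", "physics"],
    "society": ["politics", "economics", "culture"],
}

flat = flatten_labels(hierarchical_labels)
print(flat)
```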

## Installation

```bash
pip install gliclass
```

## Quick Start

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-instruct-base-v1.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-instruct-base-v1.0")
# device='cuda:0' requires a GPU; use device='cpu' on machines without one
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')
```

---

## Task Examples

### 1. Topic Classification

```python
text = "NASA launched a new Mars rover to search for signs of ancient life."
labels = ["space", "politics", "sports", "technology", "health"]

results = pipeline(text, labels, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```

#### With hierarchical labels

```python
hierarchical_labels = {
    "science": ["space", "biology", "physics"],
    "society": ["politics", "economics", "culture"]
}

results = pipeline(text, hierarchical_labels, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
# e.g. science.space => 0.95
```

### 2. Sentiment Analysis

```python
text = "The food was excellent but the service was painfully slow."
labels = ["positive", "negative", "neutral"]

results = pipeline(text, labels, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```

#### With a task prompt

```python
results = pipeline(
    text, labels,
    prompt="Classify the sentiment of this restaurant review:",
    threshold=0.5
)[0]
```

### 3. Intent Classification

```python
text = "Can you set an alarm for 7am tomorrow?"
labels = ["set_alarm", "play_music", "get_weather", "send_message", "set_reminder"]

results = pipeline(text, labels, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```

#### With few-shot examples

```python
examples = [
    {"text": "Wake me up at 6:30.", "labels": ["set_alarm"]},
    {"text": "Play some jazz.", "labels": ["play_music"]},
]

results = pipeline(text, labels, examples=examples, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```

### 4. Natural Language Inference

Pass the premise as the text and the hypothesis as the label. The model works best with a single hypothesis at a time.

```python
text = "The cat slept on the windowsill all afternoon."
labels = ["The cat was awake and playing outside."]

results = pipeline(text, labels, threshold=0.0)[0]
print(results)
# Low score → contradiction
```

### 5. Reranking

Score query–passage relevance by treating passages as texts and the query as the label:

```python
query = "How to train a neural network?"
passages = [
    "Backpropagation is the key algorithm for training deep neural networks.",
    "The stock market rallied on strong earnings reports.",
    "Gradient descent optimizes model weights during training.",
]

for passage in passages:
    score = pipeline(passage, [query], threshold=0.0)[0][0]["score"]
    print(f"{score:.3f}  {passage[:60]}")
```
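
The loop above prints scores in input order. To get an actual ranking, the scored passages still need to be sorted. The sketch below shows one way to do that; `rerank` and `word_overlap` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without a model.

```python
from typing import Callable, List, Tuple


def rerank(query: str, passages: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs sorted from most to least relevant."""
    scored = [(score_fn(passage, query), passage) for passage in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def word_overlap(passage: str, query: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


ranked = rerank(
    "training a neural network",
    [
        "The stock market rallied today.",
        "Gradient descent is used for training a network.",
    ],
    word_overlap,
)
for score, passage in ranked:
    print(f"{score:.3f}  {passage}")
```

With GLiClass, the stand-in scorer would be replaced by a callable that wraps the pipeline call from the loop above, for example `lambda passage, query: pipeline(passage, [query], threshold=0.0)[0][0]["score"]`.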

### 6. Hallucination Detection

Concatenate context, question, and answer into the text field:

```python
text = (
    "Context: The Eiffel Tower was built from 1887 to 1889 and is 330 m tall. "
    "It was the tallest structure until the Chrysler Building in 1930.\n"
    "Question: When was the Eiffel Tower built and how tall is it?\n"
    "Answer: It was built 1887–1889, stands 330 m tall, and was the tallest "
    "structure until the Empire State Building in 1931."
)
labels = ["hallucinated", "correct"]

results = pipeline(text, labels, threshold=0.0)[0]
for r in results:
    print(r["label"], "=>", r["score"])
# "hallucinated" should score higher (Empire State Building & 1931 are wrong)
```
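
When checking many answers, it can help to build the input text with a small function instead of concatenating strings by hand. The sketch below assumes the same `Context:` / `Question:` / `Answer:` layout as the example above; `build_hallucination_input` is a hypothetical helper, not a library function.

```python
def build_hallucination_input(context: str, question: str, answer: str) -> str:
    """Join the three fields using the Context/Question/Answer layout (hypothetical helper)."""
    return f"Context: {context}\nQuestion: {question}\nAnswer: {answer}"


text = build_hallucination_input(
    context="The Eiffel Tower was built from 1887 to 1889 and is 330 m tall.",
    question="How tall is the Eiffel Tower?",
    answer="It is 330 m tall.",
)
print(text)
```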

### 7. Rule-following Verification

Include the domain and rules as part of the text:

```python
text = (
    "Domain: e-commerce product reviews\n"
    "Rule: No promotion of illegal activity.\n"
    "Text: The software is okay, but search for 'productname_patch_v2.zip' "
    "to unlock all features for free."
)
labels = ["follows_guidelines", "violates_guidelines"]

results = pipeline(text, labels, threshold=0.0)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```

### 8. LLM-safety Classification

```python
text = "I'm looking for a good Italian restaurant near downtown Chicago, budget ~$50/person."
labels = [
    "benign request",
    "prompt injection",
    "system prompt extraction",
    "jailbreak attempt",
    "harmful content request",
    "social engineering",
    "data exfiltration",
]

results = pipeline(text, labels, threshold=0.5)[0]
for r in results:
    print(r["label"], "=>", r["score"])
```
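
In practice the scores usually feed an allow/block decision. Below is a minimal sketch of one possible policy, assuming results are lists of `{"label", "score"}` dictionaries as in the examples above. The function names, the 0.5 cutoff, and the "allow only if the top label is benign" rule are illustrative assumptions, not a recommendation from the model authors.

```python
from typing import Dict, List, Optional


def top_label(results: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label at or above min_score, else None."""
    kept = [r for r in results if r["score"] >= min_score]
    return max(kept, key=lambda r: r["score"])["label"] if kept else None


def is_allowed(results: List[Dict], benign_label: str = "benign request") -> bool:
    """Illustrative policy: allow only when the top label is the benign one."""
    return top_label(results) == benign_label


# Made-up scores for illustration; these are not real model outputs.
benign = [
    {"label": "benign request", "score": 0.97},
    {"label": "prompt injection", "score": 0.02},
]
risky = [
    {"label": "jailbreak attempt", "score": 0.81},
    {"label": "benign request", "score": 0.55},
]

print(is_allowed(benign), is_allowed(risky), is_allowed([]))
```

Note that an empty result list (no label above the threshold) is treated as not allowed under this policy, which errs on the side of caution.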

---

## Benchmarks

F1 scores on zero-shot text classification (no fine-tuning on these datasets):

GLiClass-V1 Multitask:

| Dataset | [large‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-large-v1.0) | [base‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-base-v1.0) | [edge‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-edge-v1.0) |
|---|---|---|---|
| CR | 0.9066 | 0.8922 | 0.7933 |
| sst2 | 0.9154 | 0.9198 | 0.7577 |
| sst5 | 0.3387 | 0.2266 | 0.2163 |
| 20_newsgroups | 0.5577 | 0.5189 | 0.2555 |
| spam | 0.9790 | 0.9380 | 0.7609 |
| financial_phrasebank | 0.8289 | 0.5217 | 0.3905 |
| imdb | 0.9397 | 0.9364 | 0.8159 |
| ag_news | 0.7521 | 0.6978 | 0.6043 |
| emotion | 0.4473 | 0.4454 | 0.2941 |
| cap_sotu | 0.4327 | 0.4579 | 0.2380 |
| rotten_tomatoes | 0.8491 | 0.8458 | 0.5455 |
| massive | 0.5824 | 0.4757 | 0.2090 |
| banking | 0.6987 | 0.6072 | 0.4635 |
| snips | 0.8509 | 0.6515 | 0.5461 |
| **AVERAGE** | **0.7199** | **0.6525** | **0.4922** |

GLiClass-V3:

| Dataset | [large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) | [base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | [modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | [modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | [edge‑v3.0](https://huggingface.co/knowledgator/gliclass-edge-v3.0) |
|---|---|---|---|---|---|
| CR | 0.9398 | 0.9127 | 0.8952 | 0.8902 | 0.8215 |
| sst2 | 0.9192 | 0.8959 | 0.9330 | 0.8959 | 0.8199 |
| sst5 | 0.4606 | 0.3376 | 0.4619 | 0.2756 | 0.2823 |
| 20_newsgroups | 0.5958 | 0.4759 | 0.3905 | 0.3433 | 0.2217 |
| spam | 0.7584 | 0.6760 | 0.5813 | 0.6398 | 0.5623 |
| financial_phrasebank | 0.9000 | 0.8971 | 0.5929 | 0.4200 | 0.5004 |
| imdb | 0.9366 | 0.9251 | 0.9402 | 0.9158 | 0.8485 |
| ag_news | 0.7181 | 0.7279 | 0.7269 | 0.6663 | 0.6645 |
| emotion | 0.4506 | 0.4447 | 0.4517 | 0.4254 | 0.3851 |
| cap_sotu | 0.4589 | 0.4614 | 0.4072 | 0.3625 | 0.2583 |
| rotten_tomatoes | 0.8411 | 0.7943 | 0.7664 | 0.7070 | 0.7024 |
| massive | 0.5649 | 0.5040 | 0.3905 | 0.3442 | 0.2414 |
| banking | 0.5574 | 0.4698 | 0.3683 | 0.3561 | 0.0272 |
| snips | 0.9692 | 0.9474 | 0.7707 | 0.5663 | 0.5257 |
| **AVERAGE** | **0.7193** | **0.6764** | **0.6197** | **0.5577** | **0.4900** |
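
The AVERAGE rows appear to be unweighted means over the 14 datasets. As a quick check under that assumption, the sketch below recomputes the average for the large‑v1.0 column of the GLiClass-V1 table, with the values copied from that table.

```python
# F1 scores copied from the large-v1.0 column of the GLiClass-V1 table.
large_v1_f1 = {
    "CR": 0.9066,
    "sst2": 0.9154,
    "sst5": 0.3387,
    "20_newsgroups": 0.5577,
    "spam": 0.9790,
    "financial_phrasebank": 0.8289,
    "imdb": 0.9397,
    "ag_news": 0.7521,
    "emotion": 0.4473,
    "cap_sotu": 0.4327,
    "rotten_tomatoes": 0.8491,
    "massive": 0.5824,
    "banking": 0.6987,
    "snips": 0.8509,
}

# Unweighted mean: every dataset counts equally, regardless of its size.
macro_avg = sum(large_v1_f1.values()) / len(large_v1_f1)
print(f"{macro_avg:.4f}")  # prints 0.7199, matching the table
```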

## Citation

```bibtex
@misc{stepanov2025gliclassgeneralistlightweightmodel,
      title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
      year={2025},
      eprint={2508.07662},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.07662}, 
}
```