	---
	license: apache-2.0
	datasets:
	- BioMike/formal-logic-reasoning-gliclass-2k
	- knowledgator/gliclass-v3-logic-dataset
	- tau/commonsense_qa
	metrics:
	- f1
	tags:
	- text classification
	- nli
	- sentiment analysis
	pipeline_tag: text-classification
	---

	![image/png](instruct.png)

	# GLiClass-multitask: Efficient zero-shot and few-shot multi-task model via sequence classification

	GLiClass is an efficient zero-shot sequence classification model designed to achieve state-of-the-art (SoTA) performance while running much faster than cross-encoders and LLMs and preserving strong generalization.

	The model supports text classification with any labels and can be used for the following tasks:
	* Topic Classification
	* Sentiment Analysis
	* Intent Classification
	* Reranking
	* Hallucination Detection
	* Rule-following Verification
	* LLM-safety Classification
	* Natural Language Inference

	## ✨ What's New in V3

	- Hierarchical Labels — Organize labels into groups using dot notation or dictionaries (e.g., `sentiment.positive`, `topic.product`).
	- Few-Shot Examples — Provide in-context examples to boost accuracy on your specific task.
	- Label Descriptions — Add natural-language descriptions to labels for more precise classification.
	- Task Prompts — Prepend a custom prompt to guide the model's classification behavior.

	See the [GLiClass library README](https://github.com/Knowledgator/GLiClass) for full details on these features.
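
The two notations for hierarchical labels describe the same structure. As a minimal sketch of that correspondence (the helper `flatten_labels` is hypothetical and not part of the GLiClass library), a grouped dictionary can be flattened to dot notation like this:

```python
def flatten_labels(grouped):
    """Turn {"group": ["a", "b"]} into ["group.a", "group.b"].

    Hypothetical helper for illustration; not a GLiClass API.
    """
    return [f"{group}.{name}" for group, names in grouped.items() for name in names]


grouped = {
    "sentiment": ["positive", "negative"],
    "topic": ["product", "service"],
}
flat = flatten_labels(grouped)
print(flat)
```

Either form can then be used as the label set, depending on which is easier to maintain in your code.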

	## Installation

	```bash
	pip install gliclass
	```

	## Quick Start

	```python
	from gliclass import GLiClassModel, ZeroShotClassificationPipeline
	from transformers import AutoTokenizer

	model = GLiClassModel.from_pretrained("knowledgator/gliclass-instruct-base-v1.0")
	tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-instruct-base-v1.0")
	pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')
	```

	---

	## Task Examples

	### 1. Topic Classification

	```python
	text = "NASA launched a new Mars rover to search for signs of ancient life."
	labels = ["space", "politics", "sports", "technology", "health"]

	results = pipeline(text, labels, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```

	#### With hierarchical labels

	```python
	hierarchical_labels = {
	    "science": ["space", "biology", "physics"],
	    "society": ["politics", "economics", "culture"],
	}

	results = pipeline(text, hierarchical_labels, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	# e.g. science.space => 0.95
	```
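
Since hierarchical results come back with dotted labels such as `science.space`, it can be convenient to regroup them by parent. The sketch below assumes each result is a dict with `label` and `score` keys, as in the examples above. The helper `group_by_parent` and the sample scores are made up for illustration.

```python
from collections import defaultdict


def group_by_parent(results):
    """Group dotted labels by the part before the first dot.

    Hypothetical helper; labels without a dot are filed under themselves.
    """
    grouped = defaultdict(list)
    for r in results:
        parent, _, child = r["label"].partition(".")
        grouped[parent].append((child or parent, r["score"]))
    return dict(grouped)


# Made-up scores in the same shape as the pipeline output.
sample = [
    {"label": "science.space", "score": 0.95},
    {"label": "science.physics", "score": 0.61},
    {"label": "society.politics", "score": 0.52},
]
by_parent = group_by_parent(sample)
print(by_parent)
```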

	### 2. Sentiment Analysis

	```python
	text = "The food was excellent but the service was painfully slow."
	labels = ["positive", "negative", "neutral"]

	results = pipeline(text, labels, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```

	#### With a task prompt

	```python
	results = pipeline(
	    text, labels,
	    prompt="Classify the sentiment of this restaurant review:",
	    threshold=0.5
	)[0]
	```

	### 3. Intent Classification

	```python
	text = "Can you set an alarm for 7am tomorrow?"
	labels = ["set_alarm", "play_music", "get_weather", "send_message", "set_reminder"]

	results = pipeline(text, labels, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```

	#### With few-shot examples

	```python
	examples = [
	    {"text": "Wake me up at 6:30.", "labels": ["set_alarm"]},
	    {"text": "Play some jazz.", "labels": ["play_music"]},
	]

	results = pipeline(text, labels, examples=examples, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```

	### 4. Natural Language Inference

	Represent your premise as the text and the hypothesis as a label. The model works best with a single hypothesis at a time.

	```python
	text = "The cat slept on the windowsill all afternoon."
	labels = ["The cat was awake and playing outside."]

	results = pipeline(text, labels, threshold=0.0)[0]
	print(results)
	# Low score → contradiction
	```
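
To turn the single hypothesis score into a decision, one option is a pair of cut-offs. This is only a sketch. It assumes a high score means the premise supports the hypothesis, and the cut-off values and the "neutral" middle band are illustrative choices that should be tuned on your own data.

```python
def nli_verdict(score, entail_at=0.7, contradict_at=0.3):
    """Map a hypothesis score to a coarse verdict.

    Hypothetical helper; both cut-offs are illustrative, not calibrated values.
    """
    if score >= entail_at:
        return "entailment"
    if score <= contradict_at:
        return "contradiction"
    return "neutral"


print(nli_verdict(0.05))
```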

	### 5. Reranking

	Score query–passage relevance by treating passages as texts and the query as the label:

	```python
	query = "How to train a neural network?"
	passages = [
	    "Backpropagation is the key algorithm for training deep neural networks.",
	    "The stock market rallied on strong earnings reports.",
	    "Gradient descent optimizes model weights during training.",
	]

	for passage in passages:
	    score = pipeline(passage, [query], threshold=0.0)[0][0]["score"]
	    print(f"{score:.3f} {passage[:60]}")
	```
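
The loop above prints scores in the original passage order. To get an actual ranking, the scores still need to be sorted. A minimal sketch follows, with the scoring step passed in as a function so it runs without a model. The helper `rerank` and the stand-in scores are hypothetical.

```python
def rerank(query, passages, score_fn, top_k=None):
    """Return (score, passage) pairs, highest score first.

    score_fn(passage, query) is any callable that returns a relevance score.
    """
    scored = [(score_fn(passage, query), passage) for passage in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]  # top_k=None keeps every passage


# Stand-in scores so the sketch runs without a model (values are made up).
fake_scores = {"passage A": 0.12, "passage B": 0.87, "passage C": 0.45}
ranked = rerank("some query", list(fake_scores), lambda p, q: fake_scores[p], top_k=2)
print(ranked)
```

With the pipeline from the Quick Start, `score_fn` could be `lambda passage, query: pipeline(passage, [query], threshold=0.0)[0][0]["score"]`, mirroring the call used in the loop.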

	### 6. Hallucination Detection

	Concatenate context, question, and answer into the text field:

	```python
	text = (
	    "Context: The Eiffel Tower was built from 1887 to 1889 and is 330 m tall. "
	    "It was the tallest structure until the Chrysler Building in 1930.\n"
	    "Question: When was the Eiffel Tower built and how tall is it?\n"
	    "Answer: It was built 1887–1889, stands 330 m tall, and was the tallest "
	    "structure until the Empire State Building in 1931."
	)
	labels = ["hallucinated", "correct"]

	results = pipeline(text, labels, threshold=0.0)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	# "hallucinated" should score higher (Empire State Building & 1931 are wrong)
	```
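
When the two labels score close to each other, it may be safer to abstain than to pick a winner. The sketch below assumes the result format shown above. The helper `pick_label`, the margin value, and the sample scores are illustrative only.

```python
def pick_label(results, margin=0.0):
    """Return the top-scoring label, or None when the result is empty or the
    lead over the runner-up is smaller than `margin`.

    Hypothetical helper for illustration.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0]["score"] - ranked[1]["score"] < margin:
        return None
    return ranked[0]["label"]


# Made-up scores in the same shape as the pipeline output.
sample = [
    {"label": "hallucinated", "score": 0.81},
    {"label": "correct", "score": 0.24},
]
print(pick_label(sample))
```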

	### 7. Rule-following Verification

	Include the domain and rules as part of the text:

	```python
	text = (
	    "Domain: e-commerce product reviews\n"
	    "Rule: No promotion of illegal activity.\n"
	    "Text: The software is okay, but search for 'productname_patch_v2.zip' "
	    "to unlock all features for free."
	)
	labels = ["follows_guidelines", "violates_guidelines"]

	results = pipeline(text, labels, threshold=0.0)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```

	### 8. LLM-safety Classification

	```python
	text = "I'm looking for a good Italian restaurant near downtown Chicago, budget ~$50/person."
	labels = [
	    "benign request",
	    "prompt injection",
	    "system prompt extraction",
	    "jailbreak attempt",
	    "harmful content request",
	    "social engineering",
	    "data exfiltration",
	]

	results = pipeline(text, labels, threshold=0.5)[0]
	for r in results:
	    print(r["label"], "=>", r["score"])
	```
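
In a guardrail setting, the scores usually feed a yes/no decision. One possible gate is sketched below, assuming the result format shown above. The helper `risky_labels`, the cut-off, and the sample scores are hypothetical, and a real deployment would need the threshold tuned against labeled traffic.

```python
def risky_labels(results, benign_label="benign request", min_score=0.5):
    """List every non-benign label whose score reaches `min_score`.

    Hypothetical helper; an empty list means nothing was flagged.
    """
    return [
        r["label"]
        for r in results
        if r["label"] != benign_label and r["score"] >= min_score
    ]


# Made-up scores in the same shape as the pipeline output.
sample = [
    {"label": "benign request", "score": 0.08},
    {"label": "prompt injection", "score": 0.91},
    {"label": "jailbreak attempt", "score": 0.47},
]
flagged = risky_labels(sample)
print("block" if flagged else "allow", flagged)
```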

	---

	## Benchmarks

	F1 scores on zero-shot text classification (no fine-tuning on these datasets):

	GLiClass-V1 Multitask:

	| Dataset | [large‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-large-v1.0) | [base‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-base-v1.0) | [edge‑v1.0](https://huggingface.co/knowledgator/gliclass-instruct-edge-v1.0) |
	|---|---|---|---|
	| CR | 0.9066 | 0.8922 | 0.7933 |
	| sst2 | 0.9154 | 0.9198 | 0.7577 |
	| sst5 | 0.3387 | 0.2266 | 0.2163 |
	| 20_newsgroups | 0.5577 | 0.5189 | 0.2555 |
	| spam | 0.9790 | 0.9380 | 0.7609 |
	| financial_phrasebank | 0.8289 | 0.5217 | 0.3905 |
	| imdb | 0.9397 | 0.9364 | 0.8159 |
	| ag_news | 0.7521 | 0.6978 | 0.6043 |
	| emotion | 0.4473 | 0.4454 | 0.2941 |
	| cap_sotu | 0.4327 | 0.4579 | 0.2380 |
	| rotten_tomatoes | 0.8491 | 0.8458 | 0.5455 |
	| massive | 0.5824 | 0.4757 | 0.2090 |
	| banking | 0.6987 | 0.6072 | 0.4635 |
	| snips | 0.8509 | 0.6515 | 0.5461 |
	| AVERAGE | 0.7199 | 0.6525 | 0.4922 |

	GLiClass-V3:

	| Dataset | [large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) | [base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | [modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | [modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | [edge‑v3.0](https://huggingface.co/knowledgator/gliclass-edge-v3.0) |
	|---|---|---|---|---|---|
	| CR | 0.9398 | 0.9127 | 0.8952 | 0.8902 | 0.8215 |
	| sst2 | 0.9192 | 0.8959 | 0.9330 | 0.8959 | 0.8199 |
	| sst5 | 0.4606 | 0.3376 | 0.4619 | 0.2756 | 0.2823 |
	| 20_newsgroups | 0.5958 | 0.4759 | 0.3905 | 0.3433 | 0.2217 |
	| spam | 0.7584 | 0.6760 | 0.5813 | 0.6398 | 0.5623 |
	| financial_phrasebank | 0.9000 | 0.8971 | 0.5929 | 0.4200 | 0.5004 |
	| imdb | 0.9366 | 0.9251 | 0.9402 | 0.9158 | 0.8485 |
	| ag_news | 0.7181 | 0.7279 | 0.7269 | 0.6663 | 0.6645 |
	| emotion | 0.4506 | 0.4447 | 0.4517 | 0.4254 | 0.3851 |
	| cap_sotu | 0.4589 | 0.4614 | 0.4072 | 0.3625 | 0.2583 |
	| rotten_tomatoes | 0.8411 | 0.7943 | 0.7664 | 0.7070 | 0.7024 |
	| massive | 0.5649 | 0.5040 | 0.3905 | 0.3442 | 0.2414 |
	| banking | 0.5574 | 0.4698 | 0.3683 | 0.3561 | 0.0272 |
	| snips | 0.9692 | 0.9474 | 0.7707 | 0.5663 | 0.5257 |
	| AVERAGE | 0.7193 | 0.6764 | 0.6197 | 0.5577 | 0.4900 |

	## Citation

	```bibtex
	@misc{stepanov2025gliclassgeneralistlightweightmodel,
	    title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks},
	    author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
	    year={2025},
	    eprint={2508.07662},
	    archivePrefix={arXiv},
	    primaryClass={cs.LG},
	    url={https://arxiv.org/abs/2508.07662},
	}
	```