Update README.md
README.md
CHANGED
|
@@ -18,11 +18,17 @@ tags:
|
|
| 18 |
- token-classification
|
| 19 |
- guardrails
|
| 20 |
---
|
| 21 |
|
| 22 |
## Installation
|
| 23 |
-
Install dependencies
|
| 24 |
```bash
|
| 25 |
-
pip install "gliner2 @ git+https://github.com/bogdanminko/GLiNER2.git@feature/bi-encoder"
|
| 26 |
```
|
| 27 |
## Usage
|
| 28 |
Classify harmful messages and detect PII in a single forward pass
|
|
@@ -52,4 +58,163 @@ output:
|
|
| 52 |
'email': ['john.smith@gmail.com'],
|
| 53 |
'phone': []},
|
| 54 |
'safety': 'unsafe'}
|
| 55 |
-
```
|
| 18 |
- token-classification
|
| 19 |
- guardrails
|
| 20 |
---
|
| 21 |
+
# GLiNER Guard — Unified Multitask Guardrail
|
| 22 |
+
One encoder model that replaces your entire guardrail stack: safety classification, PII detection, adversarial attack detection, intent and tone analysis — all in a single forward pass.
|
| 23 |
+

|
| 24 |
+
|
| 25 |
+
**147M params · GLiNER2 · bi-encoder · multilingual ModernBERT · zero-shot classification, NER, and more · no LLM required**
|
| 26 |
|
| 27 |
## Installation
|
| 28 |
+
Install dependencies\
|
| 29 |
+
(for now via our fork; we'll update this section after the PR to the GLiNER2 repo)
|
| 30 |
```bash
|
| 31 |
+
pip install "gliner2 @ git+https://github.com/bogdanminko/GLiNER2.git@feature/bi-encoder"
|
| 32 |
```
|
| 33 |
## Usage
|
| 34 |
Classify harmful messages and detect PII in a single forward pass
|
|
|
|
| 58 |
'email': ['john.smith@gmail.com'],
|
| 59 |
'phone': []},
|
| 60 |
'safety': 'unsafe'}
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
## Supported Tasks
|
| 64 |
+
|
| 65 |
+
GLiNER Guard is purpose-built for 6 guardrail tasks via a shared encoder — no LLM required.\
|
| 66 |
+
Thanks to zero-shot generalization, it can also handle custom labels outside the training taxonomy.
|
| 67 |
+
|
| 68 |
+
| Task | Type | Labels | Key Labels |
|
| 69 |
+
|------|------|--------|------------|
|
| 70 |
+
| **Safety** | single-label | 2 | `safe` `unsafe` |
|
| 71 |
+
| **PII / NER** | span extraction | 32 | `person` `email` `phone` `card_number` `address` |
|
| 72 |
+
| **Adversarial Detection** | multi-label | 15 | `jailbreak_persona` `prompt_injection` `instruction_override` `data_exfiltration` |
|
| 73 |
+
| **Harmful Content** | multi-label | 30 | `hate_speech` `violence` `child_exploitation` `fraud` `pii_exposure` |
|
| 74 |
+
| **Intent** | single-label | 13 | `informational` `adversarial` `threatening` `solicitation` |
|
| 75 |
+
| **Tone of Voice** | single-label | 10 | `neutral` `aggressive` `manipulative` `deceptive` |
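
Because out-of-taxonomy labels rely on zero-shot generalization, it can be useful to flag them before inference. The sketch below is illustrative only: `split_labels` and `TAXONOMY` are hypothetical names, not part of GLiNER2, and the taxonomy is limited to the Safety and Tone of Voice labels listed in this README.

```python
from __future__ import annotations

# Known taxonomy for two of the tasks, copied from the label lists in this README.
TAXONOMY = {
    "safety": ["safe", "unsafe"],
    "tone_of_voice": [
        "neutral", "formal", "humorous", "sarcastic",
        "distressed", "confused", "pleading",
        "aggressive", "manipulative", "deceptive",
    ],
}


def split_labels(task: str, requested: list[str]) -> tuple[list[str], list[str]]:
    """Split requested labels into (in-taxonomy, custom) for the given task."""
    known = set(TAXONOMY.get(task, []))
    in_taxonomy = [label for label in requested if label in known]
    custom = [label for label in requested if label not in known]
    return in_taxonomy, custom


trained, custom = split_labels(
    "tone_of_voice", ["aggressive", "sarcastic", "passive_aggressive"]
)
print(trained)  # ['aggressive', 'sarcastic']
print(custom)   # ['passive_aggressive']
```

Labels that end up in the second list can still be passed to the model, but their results deserve closer manual review.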
|
| 76 |
+
|
| 77 |
+
<details>
|
| 78 |
+
<summary><b>Safety</b> — all 2 labels</summary>
|
| 79 |
+
|
| 80 |
+
Classifies whether a message is safe or unsafe. Single-label.
|
| 81 |
+
```python
|
| 82 |
+
SAFETY_LABELS = ["safe", "unsafe"]
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
| Label | Description |
|
| 86 |
+
|-------|-------------|
|
| 87 |
+
| `safe` | Message does not contain harmful or policy-violating content |
|
| 88 |
+
| `unsafe` | Message contains harmful, dangerous, or policy-violating content |
|
| 89 |
+
|
| 90 |
+
</details>
|
| 91 |
+
|
| 92 |
+
<details>
|
| 93 |
+
<summary><b>PII / NER</b> — all 32 entity types</summary>
|
| 94 |
+
|
| 95 |
+
Span extraction across 7 groups. Use labels from this list for best results — out-of-taxonomy labels may work via zero-shot generalization but are not benchmarked.
|
| 96 |
+
|
| 97 |
+
| Group | Labels |
|
| 98 |
+
|-------|--------|
|
| 99 |
+
| **Person** | `person` `first_name` `last_name` `alias` `title` |
|
| 100 |
+
| **Location** | `country` `region` `city` `district` `street` `building` `unit` `postal_code` `landmark` `address` |
|
| 101 |
+
| **Organization** | `company` `government` `education` `media` `product` |
|
| 102 |
+
| **Contact** | `email` `phone` `social_account` `messenger` |
|
| 103 |
+
| **Identity** | `passport` `national_id` `document_id` |
|
| 104 |
+
| **Temporal** | `date_of_birth` `event_date` |
|
| 105 |
+
| **Financial** | `card_number` `bank_account` `crypto_wallet` |
|
| 106 |
+
```python
|
| 107 |
+
PII_LABELS = [
|
| 108 |
+
"person", "first_name", "last_name", "alias", "title",
|
| 109 |
+
"country", "region", "city", "district", "street",
|
| 110 |
+
"building", "unit", "postal_code", "landmark", "address",
|
| 111 |
+
"company", "government", "education", "media", "product",
|
| 112 |
+
"email", "phone", "social_account", "messenger",
|
| 113 |
+
"passport", "national_id", "document_id",
|
| 114 |
+
"date_of_birth", "event_date",
|
| 115 |
+
"card_number", "bank_account", "crypto_wallet",
|
| 116 |
+
]
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
</details>
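
As one possible downstream use of the extracted spans, here is a minimal redaction sketch. It assumes the model output has already been reduced to a dict that maps each entity label to a list of matched strings, as in the `'email': [...]`, `'phone': []` fragment of the usage output. The `redact_pii` helper and the `[LABEL]` placeholder format are hypothetical and not part of GLiNER2.

```python
from __future__ import annotations


def redact_pii(text: str, entities: dict[str, list[str]]) -> str:
    """Replace every extracted span with an upper-case [LABEL] placeholder."""
    # Collect (span, label) pairs, skipping empty strings.
    pairs = [
        (span, label)
        for label, spans in entities.items()
        for span in spans
        if span
    ]
    # Replace longer spans first so a short span nested inside a longer one
    # does not break the longer match.
    for span, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(span, f"[{label.upper()}]")
    return text


message = "Contact John Smith at john.smith@gmail.com"
entities = {
    "person": ["John Smith"],
    "email": ["john.smith@gmail.com"],
    "phone": [],
}
print(redact_pii(message, entities))  # Contact [PERSON] at [EMAIL]
```

Plain string replacement is the simplest option but redacts every occurrence of a matched string; offset-based replacement would be more precise if character positions are available.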
|
| 120 |
+
|
| 121 |
+
<details>
|
| 122 |
+
<summary><b>Adversarial Detection</b> — all 15 labels</summary>
|
| 123 |
+
|
| 124 |
+
Detects attacks against LLM-based systems. Multi-label: a single message can combine multiple attack vectors.
|
| 125 |
+
|
| 126 |
+
| Subgroup | Labels |
|
| 127 |
+
|----------|--------|
|
| 128 |
+
| **Jailbreak** | `jailbreak_persona` `jailbreak_hypothetical` `jailbreak_roleplay` |
|
| 129 |
+
| **Injection** | `prompt_injection` `indirect_prompt_injection` `instruction_override` |
|
| 130 |
+
| **Extraction** | `data_exfiltration` `system_prompt_extraction` `context_manipulation` `token_manipulation` |
|
| 131 |
+
| **Advanced** | `tool_abuse` `social_engineering` `multi_turn_escalation` `schema_poisoning` |
|
| 132 |
+
| **Clean** | `none` |
|
| 133 |
+
```python
|
| 134 |
+
ADVERSARIAL_LABELS = [
|
| 135 |
+
"jailbreak_persona", "jailbreak_hypothetical", "jailbreak_roleplay",
|
| 136 |
+
"prompt_injection", "indirect_prompt_injection", "instruction_override",
|
| 137 |
+
"data_exfiltration", "system_prompt_extraction", "context_manipulation", "token_manipulation",
|
| 138 |
+
"tool_abuse", "social_engineering", "multi_turn_escalation", "schema_poisoning",
|
| 139 |
+
"none",
|
| 140 |
+
]
|
| 141 |
+
```
|
| 142 |
+
|
| 143 |
+
</details>
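
Since this task is multi-label and includes an explicit `none` label, downstream code usually needs to reduce a prediction to a simple decision. The sketch below assumes the prediction is available as a plain list of label strings (the exact output format is not shown in this README) and groups labels by the subgroups from the table above. `summarize_attacks` and the `"Custom"` fallback group are hypothetical.

```python
from __future__ import annotations

# Subgroups copied from the Adversarial Detection table; `none` is handled separately.
ADVERSARIAL_SUBGROUPS = {
    "Jailbreak": ["jailbreak_persona", "jailbreak_hypothetical", "jailbreak_roleplay"],
    "Injection": ["prompt_injection", "indirect_prompt_injection", "instruction_override"],
    "Extraction": ["data_exfiltration", "system_prompt_extraction",
                   "context_manipulation", "token_manipulation"],
    "Advanced": ["tool_abuse", "social_engineering",
                 "multi_turn_escalation", "schema_poisoning"],
}
LABEL_TO_SUBGROUP = {
    label: group
    for group, labels in ADVERSARIAL_SUBGROUPS.items()
    for label in labels
}


def summarize_attacks(predicted: list[str]) -> dict:
    """Drop `none` and report which attack subgroups were predicted."""
    attacks = [label for label in predicted if label != "none"]
    # Labels outside the taxonomy (zero-shot custom labels) fall into "Custom".
    subgroups = sorted({LABEL_TO_SUBGROUP.get(label, "Custom") for label in attacks})
    return {"is_attack": bool(attacks), "labels": attacks, "subgroups": subgroups}


print(summarize_attacks(["prompt_injection", "jailbreak_persona", "none"]))
print(summarize_attacks(["none"]))
```

Treating `none` as "no attack" only when it is the sole label is a design choice of this sketch; a stricter policy could flag any prediction where `none` co-occurs with an attack label.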
|
| 144 |
+
|
| 145 |
+
<details>
|
| 146 |
+
<summary><b>Harmful Content</b> — all 30 labels</summary>
|
| 147 |
+
|
| 148 |
+
Detects harmful content categories. Multi-label: a message can belong to multiple categories simultaneously.
|
| 149 |
+
|
| 150 |
+
| Subgroup | Labels |
|
| 151 |
+
|----------|--------|
|
| 152 |
+
| **Interpersonal** | `harassment` `hate_speech` `discrimination` `doxxing` `bullying` |
|
| 153 |
+
| **Violence & Danger** | `violence` `dangerous_instructions` `weapons` `drugs` `self_harm` |
|
| 154 |
+
| **Sexual & Exploitation** | `sexual_content` `child_exploitation` `grooming` `sextortion` |
|
| 155 |
+
| **Deception** | `fraud` `scam` `social_engineering` `impersonation` |
|
| 156 |
+
| **Sensitive Topics** | `profanity` `extremism` `political` `war` `espionage` `cybersecurity` `religious` `lgbt` |
|
| 157 |
+
| **Information** | `misinformation` `copyright_violation` `pii_exposure` |
|
| 158 |
+
| **Clean** | `none` |
|
| 159 |
+
```python
|
| 160 |
+
HARMFUL_LABELS = [
|
| 161 |
+
"harassment", "hate_speech", "discrimination", "doxxing", "bullying",
|
| 162 |
+
"violence", "dangerous_instructions", "weapons", "drugs", "self_harm",
|
| 163 |
+
"sexual_content", "child_exploitation", "grooming", "sextortion",
|
| 164 |
+
"fraud", "scam", "social_engineering", "impersonation",
|
| 165 |
+
"profanity", "extremism", "political", "war", "espionage", "cybersecurity", "religious", "lgbt",
|
| 166 |
+
"misinformation", "copyright_violation", "pii_exposure",
|
| 167 |
+
"none",
|
| 168 |
+
]
|
| 169 |
+
```
|
| 170 |
+
|
| 171 |
+
</details>
|
| 172 |
+
|
| 173 |
+
<details>
|
| 174 |
+
<summary><b>Intent</b> — all 13 labels</summary>
|
| 175 |
+
|
| 176 |
+
Classifies the intent behind a message. Single-label.
|
| 177 |
+
|
| 178 |
+
| Subgroup | Labels |
|
| 179 |
+
|----------|--------|
|
| 180 |
+
| **Benign** | `informational` `instructional` `conversational` `persuasive` `creative` `transactional` `emotional_support` `testing` |
|
| 181 |
+
| **Ambiguous** | `ambiguous` `extractive` |
|
| 182 |
+
| **Malicious** | `adversarial` `threatening` `solicitation` |
|
| 183 |
+
```python
|
| 184 |
+
INTENT_LABELS = [
|
| 185 |
+
"informational", "instructional", "conversational", "persuasive",
|
| 186 |
+
"creative", "transactional", "emotional_support", "testing",
|
| 187 |
+
"ambiguous", "extractive",
|
| 188 |
+
"adversarial", "threatening", "solicitation",
|
| 189 |
+
]
|
| 190 |
+
```
|
| 191 |
+
|
| 192 |
+
</details>
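
The benign / ambiguous / malicious grouping above lends itself to a simple routing rule. The following is a sketch under stated assumptions: the `route` function and the allow / review / block policy are hypothetical examples, not something the model or GLiNER2 provides, and the inputs are assumed to be the single predicted intent label and the predicted safety label.

```python
from __future__ import annotations

# Intent groups copied from the table above.
INTENT_GROUPS = {
    "benign": ["informational", "instructional", "conversational", "persuasive",
               "creative", "transactional", "emotional_support", "testing"],
    "ambiguous": ["ambiguous", "extractive"],
    "malicious": ["adversarial", "threatening", "solicitation"],
}
INTENT_TO_GROUP = {
    label: group for group, labels in INTENT_GROUPS.items() for label in labels
}

# Hypothetical policy: what to do with a message for each intent group.
ACTIONS = {"benign": "allow", "ambiguous": "review", "malicious": "block"}


def route(intent: str, safety: str) -> str:
    """Combine the safety and intent predictions into one action."""
    if safety == "unsafe":
        return "block"
    group = INTENT_TO_GROUP.get(intent)
    # Unknown (out-of-taxonomy) intents are sent to manual review.
    return ACTIONS.get(group, "review")


print(route("informational", "safe"))    # allow
print(route("extractive", "safe"))       # review
print(route("informational", "unsafe"))  # block
```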
|
| 193 |
+
|
| 194 |
+
<details>
|
| 195 |
+
<summary><b>Tone of Voice</b> — all 10 labels</summary>
|
| 196 |
+
|
| 197 |
+
Classifies the tone of a message. Single-label.
|
| 198 |
+
|
| 199 |
+
| Label | Description |
|
| 200 |
+
|-------|-------------|
|
| 201 |
+
| `neutral` | Matter-of-fact, no strong emotional coloring |
|
| 202 |
+
| `formal` | Professional or official register |
|
| 203 |
+
| `humorous` | Playful, joking, or light-hearted |
|
| 204 |
+
| `sarcastic` | Ironic or mocking tone |
|
| 205 |
+
| `distressed` | Anxious, upset, or overwhelmed |
|
| 206 |
+
| `confused` | Unclear intent, disoriented phrasing |
|
| 207 |
+
| `pleading` | Urgent requests, begging for help or compliance |
|
| 208 |
+
| `aggressive` | Hostile, confrontational, or threatening |
|
| 209 |
+
| `manipulative` | Attempts to exploit, deceive, or coerce |
|
| 210 |
+
| `deceptive` | Deliberately misleading or false framing |
|
| 211 |
+
```python
|
| 212 |
+
TOV_LABELS = [
|
| 213 |
+
"neutral", "formal", "humorous", "sarcastic",
|
| 214 |
+
"distressed", "confused", "pleading",
|
| 215 |
+
"aggressive", "manipulative", "deceptive",
|
| 216 |
+
]
|
| 217 |
+
```
|
| 218 |
+
|
| 219 |
+
</details>
|
| 220 |
+
</details>
|