bogdanminko committed on
Commit
899f1c8
·
verified ·
1 Parent(s): 08d5ddd

Update README.md

  1. README.md +168 -3
README.md CHANGED
@@ -18,11 +18,17 @@ tags:
18
  - token-classification
19
  - guardrails
20
  ---
 
 
 
 
 
21
 
22
  ## Installation
23
- Install dependencies (now via our fork; we'll update the installation section after the PR to the GLiNER2 repo)
 
24
  ```bash
25
- pip install "gliner2 @ git+https://github.com/bogdanminko/GLiNER2.git@feature/bi-encoder" torch transformers
26
  ```
27
  ## Usage
28
  Classify harmful messages and detect PII in a single forward pass
@@ -52,4 +58,163 @@ output:
52
  'email': ['john.smith@gmail.com'],
53
  'phone': []},
54
  'safety': 'unsafe'}
55
- ```
18
  - token-classification
19
  - guardrails
20
  ---
21
+ # GLiNER Guard — Unified Multitask Guardrail
22
+ One encoder model that replaces your entire guardrail stack: safety classification, PII detection, adversarial attack detection, intent and tone analysis — all in a single forward pass.
23
+ ![GLiNER Guard architecture](biencoder.png)
24
+
25
+ **147M params · GLiNER2 · biencoder · modernbert multilingual · zero-shot classification, NER and more · no LLM required**
26
 
27
  ## Installation
28
+ Install dependencies\
29
+ (now via our fork; we'll update the installation section after the PR to the GLiNER2 repo)
30
  ```bash
31
+ pip install "gliner2 @ git+https://github.com/bogdanminko/GLiNER2.git@feature/bi-encoder"
32
  ```
33
  ## Usage
34
  Classify harmful messages and detect PII in a single forward pass
 
58
  'email': ['john.smith@gmail.com'],
59
  'phone': []},
60
  'safety': 'unsafe'}
61
+ ```
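
The output above can feed a simple redaction step. The following is a minimal sketch, not part of GLiNER2: it assumes the result is a dict with an `entities` mapping from label to a list of matched strings plus a `safety` string, which is what the truncated output suggests. The `redact` helper, the `person` entry, and the `[LABEL]` placeholder format are hypothetical and only for illustration.

```python
def redact(text: str, result: dict) -> str:
    """Replace every extracted span with a [LABEL] placeholder."""
    # Assumption: entities live under an "entities" key as {label: [spans]}.
    entities = result.get("entities", {})
    # Process longest spans first so a short span never clobbers part of a longer one.
    pairs = sorted(
        ((span, label) for label, spans in entities.items() for span in spans),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    for span, label in pairs:
        text = text.replace(span, f"[{label.upper()}]")
    return text


# Hypothetical result shaped like the output above.
result = {
    "entities": {
        "person": ["John Smith"],
        "email": ["john.smith@gmail.com"],
        "phone": [],
    },
    "safety": "unsafe",
}
message = "Contact John Smith at john.smith@gmail.com"
redacted = redact(message, result)
print(redacted)
```

Plain string replacement is case-sensitive and ignores character offsets, so if the model returns span positions it would be more robust to use those instead.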
62
+
63
+ ## Supported Tasks
64
+
65
+ GLiNER Guard is purpose-built for 6 guardrail tasks via a shared encoder — no LLM required.\
66
+ Thanks to zero-shot generalization, it can also handle custom labels outside the training taxonomy.
67
+
68
+ | Task | Type | # Labels | Example Labels |
69
+ |------|------|--------|------------|
70
+ | **Safety** | single-label | 2 | `safe` `unsafe` |
71
+ | **PII / NER** | span extraction | 32 | `person` `email` `phone` `card_number` `address` |
72
+ | **Adversarial Detection** | multi-label | 15 | `jailbreak_persona` `prompt_injection` `instruction_override` `data_exfiltration` |
73
+ | **Harmful Content** | multi-label | 30 | `hate_speech` `violence` `child_exploitation` `fraud` `pii_exposure` |
74
+ | **Intent** | single-label | 13 | `informational` `adversarial` `threatening` `solicitation` |
75
+ | **Tone of Voice** | single-label | 10 | `neutral` `aggressive` `manipulative` `deceptive` |
76
+
77
+ <details>
78
+ <summary><b>Safety</b> — all 2 labels</summary>
79
+
80
+ Classifies whether a message is safe or unsafe. Single-label.
81
+ ```python
82
+ SAFETY_LABELS = ["safe", "unsafe"]
83
+ ```
84
+
85
+ | Label | Description |
86
+ |-------|-------------|
87
+ | `safe` | Message does not contain harmful or policy-violating content |
88
+ | `unsafe` | Message contains harmful, dangerous, or policy-violating content |
89
+
90
+ </details>
91
+
92
+ <details>
93
+ <summary><b>NER / PII</b> — all 32 entity types</summary>
94
+
95
+ Span extraction across 7 groups. Use labels from this list for best results — out-of-taxonomy labels may work via zero-shot generalization but are not benchmarked.
96
+
97
+ | Group | Labels |
98
+ |-------|--------|
99
+ | **Person** | `person` `first_name` `last_name` `alias` `title` |
100
+ | **Location** | `country` `region` `city` `district` `street` `building` `unit` `postal_code` `landmark` `address` |
101
+ | **Organization** | `company` `government` `education` `media` `product` |
102
+ | **Contact** | `email` `phone` `social_account` `messenger` |
103
+ | **Identity** | `passport` `national_id` `document_id` |
104
+ | **Temporal** | `date_of_birth` `event_date` |
105
+ | **Financial** | `card_number` `bank_account` `crypto_wallet` |
106
+ ```python
107
+ PII_LABELS = [
108
+ "person", "first_name", "last_name", "alias", "title",
109
+ "country", "region", "city", "district", "street",
110
+ "building", "unit", "postal_code", "landmark", "address",
111
+ "company", "government", "education", "media", "product",
112
+ "email", "phone", "social_account", "messenger",
113
+ "passport", "national_id", "document_id",
114
+ "date_of_birth", "event_date",
115
+ "card_number", "bank_account", "crypto_wallet",
116
+ ]
117
+ ```
118
+
119
+ </details>
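
Because out-of-taxonomy labels are not benchmarked, it can help to flag them before calling the model. The sketch below is one way to do that under stated assumptions: `split_labels` is a hypothetical helper, and `iban` and `license_plate` are made-up custom labels used only to show the split. It does not call GLiNER2.

```python
PII_LABELS = [
    "person", "first_name", "last_name", "alias", "title",
    "country", "region", "city", "district", "street",
    "building", "unit", "postal_code", "landmark", "address",
    "company", "government", "education", "media", "product",
    "email", "phone", "social_account", "messenger",
    "passport", "national_id", "document_id",
    "date_of_birth", "event_date",
    "card_number", "bank_account", "crypto_wallet",
]


def split_labels(requested):
    """Split requested labels into in-taxonomy and zero-shot (unbenchmarked) ones."""
    known = set(PII_LABELS)
    in_taxonomy = [label for label in requested if label in known]
    zero_shot = [label for label in requested if label not in known]
    return in_taxonomy, zero_shot


# "iban" and "license_plate" are hypothetical custom labels.
in_taxonomy, zero_shot = split_labels(["email", "iban", "phone", "license_plate"])
print("benchmarked:", in_taxonomy)
print("zero-shot only:", zero_shot)
```

Labels in the second list may still work, but their quality should be checked on your own data.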
120
+
121
+ <details>
122
+ <summary><b>Adversarial Detection</b> — all 15 labels</summary>
123
+
124
+ Detects attacks against LLM-based systems. Multi-label: a single message can combine multiple attack vectors.
125
+
126
+ | Subgroup | Labels |
127
+ |----------|--------|
128
+ | **Jailbreak** | `jailbreak_persona` `jailbreak_hypothetical` `jailbreak_roleplay` |
129
+ | **Injection** | `prompt_injection` `indirect_prompt_injection` `instruction_override` |
130
+ | **Extraction** | `data_exfiltration` `system_prompt_extraction` `context_manipulation` `token_manipulation` |
131
+ | **Advanced** | `tool_abuse` `social_engineering` `multi_turn_escalation` `schema_poisoning` |
132
+ | **Clean** | `none` |
133
+ ```python
134
+ ADVERSARIAL_LABELS = [
135
+ "jailbreak_persona", "jailbreak_hypothetical", "jailbreak_roleplay",
136
+ "prompt_injection", "indirect_prompt_injection", "instruction_override",
137
+ "data_exfiltration", "system_prompt_extraction", "context_manipulation", "token_manipulation",
138
+ "tool_abuse", "social_engineering", "multi_turn_escalation", "schema_poisoning",
139
+ "none",
140
+ ]
141
+ ```
142
+
143
+ </details>
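
For multi-label tasks, the `none` label needs some care: it should be reported only when no attack label fires. The sketch below shows one possible post-processing step. It assumes you already have a per-label confidence score for each label (how to obtain scores from GLiNER2 is not shown here), and both `decode_multilabel` and the 0.5 threshold are illustrative choices, not values from the model card.

```python
def decode_multilabel(scores, threshold=0.5):
    """Return all labels at or above the threshold, or ["none"] if nothing fires."""
    picked = [
        label
        for label, score in scores.items()
        if label != "none" and score >= threshold
    ]
    # Fall back to the clean label when no attack label passes the threshold.
    return picked or ["none"]


# Hypothetical scores for a message that mixes two attack vectors.
scores = {
    "jailbreak_persona": 0.91,
    "prompt_injection": 0.62,
    "data_exfiltration": 0.08,
    "none": 0.03,
}
attacks = decode_multilabel(scores)
print(attacks)
```

The threshold trades recall for precision and should be tuned on your own traffic.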
144
+
145
+ <details>
146
+ <summary><b>Harmful Content</b> — all 30 labels</summary>
147
+
148
+ Detects harmful content categories. Multi-label: a message can belong to multiple categories simultaneously.
149
+
150
+ | Subgroup | Labels |
151
+ |----------|--------|
152
+ | **Interpersonal** | `harassment` `hate_speech` `discrimination` `doxxing` `bullying` |
153
+ | **Violence & Danger** | `violence` `dangerous_instructions` `weapons` `drugs` `self_harm` |
154
+ | **Sexual & Exploitation** | `sexual_content` `child_exploitation` `grooming` `sextortion` |
155
+ | **Deception** | `fraud` `scam` `social_engineering` `impersonation` |
156
+ | **Sensitive Topics** | `profanity` `extremism` `political` `war` `espionage` `cybersecurity` `religious` `lgbt` |
157
+ | **Information** | `misinformation` `copyright_violation` `pii_exposure` |
158
+ | **Clean** | `none` |
159
+ ```python
160
+ HARMFUL_LABELS = [
161
+ "harassment", "hate_speech", "discrimination", "doxxing", "bullying",
162
+ "violence", "dangerous_instructions", "weapons", "drugs", "self_harm",
163
+ "sexual_content", "child_exploitation", "grooming", "sextortion",
164
+ "fraud", "scam", "social_engineering", "impersonation",
165
+ "profanity", "extremism", "political", "war", "espionage", "cybersecurity", "religious", "lgbt",
166
+ "misinformation", "copyright_violation", "pii_exposure",
167
+ "none",
168
+ ]
169
+ ```
170
+
171
+ </details>
172
+
173
+ <details>
174
+ <summary><b>Intent</b> — all 13 labels</summary>
175
+
176
+ Classifies the intent behind a message. Single-label.
177
+
178
+ | Group | Labels |
179
+ |--------|--|
180
+ | Benign | `informational` `instructional` `conversational` `persuasive` `creative` `transactional` `emotional_support` `testing` |
181
+ | Ambiguous | `ambiguous` `extractive` |
182
+ | Malicious | `adversarial` `threatening` `solicitation` |
183
+ ```python
184
+ INTENT_LABELS = [
185
+ "informational", "instructional", "conversational", "persuasive",
186
+ "creative", "transactional", "emotional_support", "testing",
187
+ "ambiguous", "extractive",
188
+ "adversarial", "threatening", "solicitation",
189
+ ]
190
+ ```
191
+
192
+ </details>
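
The Benign / Ambiguous / Malicious grouping in the table above can be turned into a lookup so that a predicted intent maps directly to a coarse risk bucket. This is a minimal sketch: the `intent_group` helper and the `"unknown"` fallback are hypothetical, and only the label-to-group assignment comes from the table.

```python
INTENT_GROUPS = {
    "benign": [
        "informational", "instructional", "conversational", "persuasive",
        "creative", "transactional", "emotional_support", "testing",
    ],
    "ambiguous": ["ambiguous", "extractive"],
    "malicious": ["adversarial", "threatening", "solicitation"],
}

# Invert the grouping into a flat label -> group lookup.
INTENT_TO_GROUP = {
    label: group
    for group, labels in INTENT_GROUPS.items()
    for label in labels
}


def intent_group(label):
    """Map a predicted intent label to its group; custom labels fall back to "unknown"."""
    return INTENT_TO_GROUP.get(label, "unknown")


print(intent_group("extractive"))
print(intent_group("threatening"))
```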
193
+
194
+ <details>
195
+ <summary><b>Tone of Voice</b> — all 10 labels</summary>
196
+
197
+ Classifies the tone of a message. Single-label.
198
+
199
+ | Label | Description |
200
+ |-------|-------------|
201
+ | `neutral` | Matter-of-fact, no strong emotional coloring |
202
+ | `formal` | Professional or official register |
203
+ | `humorous` | Playful, joking, or light-hearted |
204
+ | `sarcastic` | Ironic or mocking tone |
205
+ | `distressed` | Anxious, upset, or overwhelmed |
206
+ | `confused` | Unclear intent, disoriented phrasing |
207
+ | `pleading` | Urgent requests, begging for help or compliance |
208
+ | `aggressive` | Hostile, confrontational, or threatening |
209
+ | `manipulative` | Attempts to exploit, deceive, or coerce |
210
+ | `deceptive` | Deliberately misleading or false framing |
211
+ ```python
212
+ TOV_LABELS = [
213
+ "neutral", "formal", "humorous", "sarcastic",
214
+ "distressed", "confused", "pleading",
215
+ "aggressive", "manipulative", "deceptive",
216
+ ]
217
+ ```
218
+
219
+ </details>