arxiv:2605.05277

GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy

Published on May 6

AI-generated summary

GLiNER Guard (GLiGuard) presents a unified encoder approach for safety moderation and PII detection that achieves high throughput and competitive accuracy under latency constraints.

Abstract

Production LLM systems require both safety moderation and PII detection under strict latency and cost constraints. This creates a trade-off: autoregressive moderators are accurate but expensive, while lightweight encoders are faster but less capable. We present GLiNER Guard (GLiGuard), a unified encoder that performs safety classification and PII detection in a single forward pass, simplifying safety pipelines. We introduce three variants: compact uni- and bi-encoders (145-147M) for high-throughput serving, and GLiGuard Omni (209M) for stronger moderation quality. Under dynamic batching on a single A100, the compact model reaches 193 requests/sec with P99 latency below 1s, achieving 1.6x higher throughput than GLiNER2. Omni remains competitive with much larger moderators on public safety benchmarks. We also release PII-Bench, a span-level benchmark for evaluating PII detection in end-to-end pipelines. Overall, encoder-based guardrails offer a practical low-cost alternative for always-on moderation. Models and benchmarks are released on HuggingFace.
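
To make "span-level" evaluation concrete, here is a minimal sketch of one common way to score PII spans: exact-match precision, recall, and F1 over (start, end, label) triples. The abstract does not describe PII-Bench's actual scoring protocol, so the exact-match criterion, the `Span` type, the `span_prf` helper, and the example offsets and labels are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (start offset, end offset, label),
# with the end offset exclusive. Not taken from the paper.
Span = Tuple[int, int, str]


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match span scoring (an assumed criterion, not PII-Bench's documented one).

    A predicted span counts as correct only if its start, end, and label
    all match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative annotations: one exact hit, one boundary error, one spurious span.
gold = [(11, 16, "NAME"), (30, 47, "EMAIL")]
pred = [(11, 16, "NAME"), (30, 46, "EMAIL"), (50, 60, "PHONE")]

precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this strict criterion, the email span that is off by one character counts as both a miss and a false alarm. A benchmark could instead use partial-overlap or token-level matching, which would score the same predictions more leniently.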
