ScopeGuard — A Governance SLM for Multilingual Scope Classification
ScopeGuard is a small language model (SLM) designed for a specific aspect of AI governance: multilingual scope classification.
Instead of optimizing for open-ended generation, ScopeGuard is trained to make reliable, consistent, low-latency, policy-driven decisions. Being small compared to large language models (LLMs) is not a limitation, but an intentional design choice: ScopeGuard is built to be cheaper, faster, and easier to deploy as an inline decision layer in production systems.
Resources
- Quickstart with Colab: Link to a quickstart Colab notebook
- Use ScopeGuard with Orbitals: Link to our GitHub repo
- Learn more about ScopeGuard: Link to our blog article
What ScopeGuard does
Given a user request, ScopeGuard decides whether the request is:
- within scope of an AI service (supported use cases, allowed domains, interaction boundaries), or
- out of scope / restricted, so it should be blocked, redirected, or routed elsewhere.
ScopeGuard can be used as an early gate in enterprise AI deployments to ensure requests are handled only when they match the intended purpose of the system.
Show me the code
Install
pip install orbitals[scope-guard-vllm]
# Or, if you'd like to use hf as a backend
# pip install orbitals[scope-guard-hf]
Use
from orbitals.scope_guard import ScopeGuard
sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")
ai_service_description = """
You are a virtual assistant for a parcel delivery service.
You can only answer questions about package tracking.
Never respond to requests for refunds.
"""
user_query = "If the package hasn't arrived by tomorrow, can I get my money back?"
result = sg.validate(user_query, ai_service_description)
print(f"Scope: {result.scope_class.value}")
if result.evidences:
    print("Evidences:")
    for evidence in result.evidences:
        print(f" - {evidence}")
# Scope: Restricted
# Evidences:
# - Never respond to requests for refunds.
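In a production pipeline, the returned scope class is typically mapped to an action. The sketch below is a minimal, hypothetical dispatcher: only the "Restricted" label appears in the example output above, so the other label strings and action names are assumptions for illustration.

```python
# Hypothetical mapping from a ScopeGuard scope label to a pipeline action.
# Only "Restricted" is shown in the example output above; the other label
# strings are assumptions for illustration.
def route_request(scope_label: str) -> str:
    """Map a scope label to an action for the surrounding service."""
    actions = {
        "In scope": "forward",       # hand the query to the assistant
        "Out of scope": "redirect",  # point the user to another channel
        "Restricted": "block",       # refuse, optionally citing evidences
    }
    return actions.get(scope_label, "block")  # fail closed on unknown labels

print(route_request("Restricted"))
# block
```

Failing closed on unknown labels keeps the gate conservative if the label set ever changes.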
Structured AI Service Description (Suggested)
from orbitals.types import AIServiceDescription
from orbitals.scope_guard import ScopeGuard
sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")
ai_service_description_complete = AIServiceDescription(
    identity_role=(
        "You are PackAssist, a virtual assistant designed to help users understand and "
        "track their parcel shipments. Your objective is to interpret tracking data and "
        "guide users through delivery-related questions."
    ),
    context=(
        "The service operates within a parcel-delivery environment where users interact "
        "to check the status of shipments sent domestically or internationally. Typical "
        "users are customers awaiting deliveries or sending parcels."
    ),
    functionalities=(
        "Retrieve tracking updates; explain the meaning of tracking events; provide "
        "estimated delivery windows; assist users in understanding delays or routing steps."
    ),
    knowledge_scope=(
        "Public tracking information, standard logistics workflows, typical transit times, "
        "and general procedures for parcel movement. No access to payment, refund, or claim "
        "processing systems."
    ),
    principles=(
        "Cannot modify shipments, initiate refunds, open claims, contact drivers, or view "
        "internal logistics notes. Limited strictly to interpreting publicly available "
        "tracking data."
    ),
    website_url="https://www.trackmate-delivery.com",
)
user_query = "If the package hasn't arrived by tomorrow, can I get my money back?"
result = sg.validate(user_query, ai_service_description_complete)
print(f"Scope: {result.scope_class.value}")
if result.evidences:
    print("Evidences:")
    for evidence in result.evidences:
        print(f" - {evidence}")
# Scope: Restricted
# Evidences:
# - No access to payment, refund, or claim processing systems.
# - Cannot modify shipments, initiate refunds, open claims, contact drivers, or view internal logistics notes.
ScopeGuard model family
Our initial family of ScopeGuard models includes:
- scope-guard-4B-q-2601: open ScopeGuard model based on Qwen3-4B-Instruct-2507, distilled from scope-guard-pro-2601
- scope-guard-4B-g-2601: open ScopeGuard model based on gemma-3-4b-it, distilled from scope-guard-pro-2601
- scope-guard-pro-2601 (coming soon!): closed-source model representing the most performant configuration, used as a reference point in our evaluation
The open-weight models are distilled from the closed model.
How we evaluated ScopeGuard
We evaluated ScopeGuard in realistic guardrailing scenarios, comparing it against:
- Commercial frontier LLMs: GPT-5.2 (medium thinking), Claude Sonnet 4.5, Gemini 3 Pro
- Commercial “fast” variants: GPT-5 Mini, Claude Haiku 4.5, Gemini 3 Flash
- Open-weight safety baseline: NVIDIA Nemotron-Content-Safety-Reasoning-4B
The goal is not to outperform LLMs at general-purpose capabilities, but to show that small specialized models can outperform general systems on governance tasks, while offering strong deployment advantages.
Tasks
Our evaluation covers three governance-critical tasks:
- Multilingual Scope Classification: decide whether a user request falls inside or outside the intended scope of an AI system.
- Vanilla Safety Classification: detect whether a user request violates generic safety policies (toxicity, discrimination, abusive language).
- Custom Safety Classification: identify whether a user request violates explicit custom-defined policies, which vary across products, services, and organizational constraints.
Experimental Results
1) Multilingual scope classification (primary task)
We created an internal multilingual scope classification benchmark reflecting real user traffic and decision boundaries, covering five languages: English, Spanish, Italian, French, and German.
Key result: both open-weight ScopeGuard models outperform frontier commercial LLMs on multilingual scope classification.
| Provider | Model | Type | Macro F1 | Avg. latency |
|---|---|---|---|---|
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 87.4 | 51.1s |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 85.4 | 27.8s |
| Google | Gemini 3 Pro | Closed LLM | 88.4 | 31.8s |
| OpenAI | GPT-5 Mini | Closed LLM | 86.6 | 23.8s |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 75.3 | 19.8s |
| Google | Gemini 3 Flash | Closed LLM | 87.1 | 12.5s |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 89.1 | 0.47s |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 90.1 | 0.68s |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 91.9 | 0.23s |
ScopeGuard models surpass all frontier LLMs on this multilingual scope classification task, while being orders of magnitude faster on single-request latency.
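For reference, Macro F1 is the unweighted mean of per-class F1 scores, so every scope class counts equally regardless of how frequent it is in the benchmark. A self-contained sketch of the metric (this is the standard definition, not code from our evaluation harness):

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed classes."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was t
            fn[t] += 1  # missed an instance of t
    f1s = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["in", "out", "in", "out"]
y_pred = ["in", "out", "out", "out"]
print(round(macro_f1(y_true, y_pred), 3))
# 0.733
```

Macro averaging is the natural choice here because out-of-scope and restricted requests are often much rarer than in-scope ones.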
2) Vanilla safety classification (Toxic Chat)
We evaluate vanilla safety using the Toxic Chat benchmark. To keep the comparison realistic, we use simple prompts and rely on each model’s default safety behavior (no extra custom structure or policies injected).
| Provider | Model | Type | Harmful F1 |
|---|---|---|---|
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 80.8 |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 80.7 |
| Google | Gemini 3 Pro | Closed LLM | 81.2 |
| OpenAI | GPT-5 Mini | Closed LLM | 78.8 |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 77.8 |
| Google | Gemini 3 Flash | Closed LLM | 80.2 |
| NVIDIA | Nemotron-Content-Safety-Reasoning-4B | Open SLM | 75.9 |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 79.1 |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 78.0 |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 81.8 |
Key result: open ScopeGuard models remain competitive with frontier LLMs even in generic moderation settings, and scope-guard-pro-2601 slightly surpasses all commercial models on Toxic Chat.
3) Custom safety classification (DynaGuardrail + CoSApien)
Custom safety evaluates whether models can enforce explicit user-defined policies (common in enterprise deployments), rather than generic safety heuristics.
We evaluate custom safety using:
- DynaGuardrail (dynamic policy constraints across diverse scenarios)
- CoSApien (structured safety policies embedded in the dataset)
In both benchmarks, policies are provided as part of the setup, and models are framed as general-purpose assistants, so the evaluation measures policy application and constraint following.
| Provider | Model | Type | DynaGuardrail | CoSApien |
|---|---|---|---|---|
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 89.5 | 91.3 |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 88.3 | 90.9 |
| Google | Gemini 3 Pro | Closed LLM | 87.8 | 90.0 |
| OpenAI | GPT-5 Mini | Closed LLM | 88.4 | 87.0 |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 84.4 | 89.5 |
| Google | Gemini 3 Flash | Closed LLM | 88.2 | 88.8 |
| NVIDIA | Nemotron-Content-Safety-Reasoning-4B | Open SLM | 87.6 | 86.2 |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 88.7 | 91.9 |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 87.8 | 88.2 |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 91.6 | 92.4 |
Key result: ScopeGuard shows a clear advantage when policies are complex, dynamic, or tightly specified, confirming that custom safety is not just an extension of vanilla moderation.
Inference speed and deployment
To enforce AI governance in production, guardrails are executed on every user interaction, making latency and throughput critical.
To keep results realistic and actionable, we measured ScopeGuard inference on a single consumer-grade GPU (RTX 4090), which is widely available on cloud marketplaces (e.g., Vast.ai) for ~€0.20/hour. You can obtain good results even with L4-class GPUs, available on GCP and other marketplaces for ~$0.70/hour, often with significant discounts.
In this setting, ScopeGuard consistently achieves sub-second latency, making it suitable for inline deployment as a real-time decision layer, without requiring expensive proprietary infrastructure or introducing noticeable delays.
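Latency figures like these can be reproduced with a simple wall-clock harness. The sketch below is generic (in a real measurement, `fn` would be a closure over `sg.validate`); it runs a few warm-up calls first so model loading and cache effects are not counted:

```python
import statistics
import time

def measure_latency(fn, inputs, warmup=3):
    """Return (median, worst-case) wall-clock seconds per call of fn."""
    for x in inputs[:warmup]:  # warm-up: weight loading, kernels, caches
        fn(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples)
```

Reporting the median alongside the worst case matters for inline guardrails, since tail latency is what users actually feel.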
Intended use
ScopeGuard is intended for:
- Scope enforcement for customer-facing assistants (route/deny out-of-scope queries)
- Inline guardrailing for agentic systems (pre-checks before tool execution)
- Enterprise governance layers where behavior must follow explicit boundaries and policies
- Analytics & routing pipelines where explainable classification supports monitoring and reporting
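For the agentic pre-check case, a guard can sit directly in front of tool execution. This is a hypothetical wrapper: it assumes the `validate()` call and the `scope_class` / `evidences` fields shown in the usage examples above, and it assumes "In scope" as the in-scope label string.

```python
# Hypothetical pre-execution guard for an agentic system. It relies on the
# validate() API shown earlier; the "In scope" label string is an assumption.
def guarded_tool_call(sg, tool, query, service_description):
    """Run `tool` only if the query is in scope; otherwise report why not."""
    result = sg.validate(query, service_description)
    if result.scope_class.value != "In scope":
        return {"blocked": True, "evidences": list(result.evidences or [])}
    return {"blocked": False, "output": tool(query)}
```

Returning the evidences on a block makes the decision auditable, which is useful for the monitoring and reporting use case above.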
Limitations
- ScopeGuard is designed for governance decisions, not for open-ended generation.
- Performance on safety benchmarks can be sensitive to prompting and evaluation conditions; strong results on one benchmark may not transfer automatically to all setups.
Cite ScopeGuard
If you use this model in academic work or evaluations, we would love it if you cited ScopeGuard (our tech report is coming soon!).
Want to integrate ScopeGuard to safeguard your AI? Get in touch!
If you are considering integrating ScopeGuard into your pipeline to safeguard your AI and your systems, we can help. Contact us directly or write to orbitals@principled-intelligence.com to learn more about ScopeGuard Pro or how we can support you.
Built with ❤️ by Principled Intelligence
Model tree for principled-intelligence/scope-guard-4B-g-2601
Base model: google/gemma-3-4b-pt