ScopeGuard — A Governance SLM for Multilingual Scope Classification

ScopeGuard is a small language model (SLM) designed for a specific aspect of AI governance: multilingual scope classification.

Instead of optimizing for open-ended generation, ScopeGuard is trained to make reliable, consistent, low-latency, policy-driven decisions. Being small compared to large language models (LLMs) is not a limitation, but an intentional design choice: ScopeGuard is built to be cheaper, faster, and easier to deploy as an inline decision layer in production systems.



What ScopeGuard does

Given a user request, ScopeGuard decides whether the request is:

  • within scope of an AI service (supported use cases, allowed domains, interaction boundaries), or
  • out of scope / restricted, so it should be blocked, redirected, or routed elsewhere.

ScopeGuard can be used as an early gate in enterprise AI deployments to ensure requests are handled only when they match the intended purpose of the system.
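For example, a deployment might run ScopeGuard before the main model and only forward requests that pass the check. A minimal sketch, assuming the validate API shown below; the answer_with_llm helper is a hypothetical placeholder, and treating every non-"Restricted" label as in scope is an assumption of this sketch:

from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

SERVICE_DESCRIPTION = """
You are a virtual assistant for a parcel delivery service.
You can only answer questions about package tracking.
"""

def answer_with_llm(user_query: str) -> str:
    # Hypothetical placeholder for the production model behind the gate.
    return f"(answer to: {user_query})"

def handle_request(user_query: str) -> str:
    """Gate a request through ScopeGuard before it reaches the main model."""
    result = sg.validate(user_query, SERVICE_DESCRIPTION)
    # "Restricted" matches the label shown in the usage examples below;
    # other label names in the ScopeClass enum are not documented here.
    if result.scope_class.value == "Restricted":
        return "Sorry, that request is outside the scope of this service."
    return answer_with_llm(user_query)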



Show me the code

Install

pip install orbitals[scope-guard-vllm]

# Or, if you'd like to use hf as a backend
# pip install orbitals[scope-guard-hf]
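If you installed the hf extra instead, the backend can presumably be swapped at construction time; a minimal sketch, assuming backend="hf" mirrors the vLLM constructor shown below:

from orbitals.scope_guard import ScopeGuard

# Same model, served through the Hugging Face backend instead of vLLM.
sg = ScopeGuard(backend="hf", model="principled-intelligence/scope-guard-4B-g-2601")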

Use

from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

ai_service_description = """
You are a virtual assistant for a parcel delivery service.
You can only answer questions about package tracking.
Never respond to requests for refunds.
"""

user_query = "If the package hasn't arrived by tomorrow, can I get my money back?"
result = sg.validate(user_query, ai_service_description)

print(f"Scope: {result.scope_class.value}")
if result.evidences:
    print("Evidences:")
    for evidence in result.evidences:
        print(f"  - {evidence}")

# Scope: Restricted
# Evidences:
#   - Never respond to requests for refunds.

Structured AI Service Description (Suggested)

from orbitals.types import AIServiceDescription
from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

ai_service_description_complete = AIServiceDescription(
    identity_role=(
        "You are PackAssist, a virtual assistant designed to help users understand and "
        "track their parcel shipments. Your objective is to interpret tracking data and "
        "guide users through delivery-related questions."
    ),
    context=(
        "The service operates within a parcel-delivery environment where users interact "
        "to check the status of shipments sent domestically or internationally. Typical "
        "users are customers awaiting deliveries or sending parcels."
    ),
    functionalities=(
        "Retrieve tracking updates; explain the meaning of tracking events; provide "
        "estimated delivery windows; assist users in understanding delays or routing steps."
    ),
    knowledge_scope=(
        "Public tracking information, standard logistics workflows, typical transit times, "
        "and general procedures for parcel movement. No access to payment, refund, or claim "
        "processing systems."
    ),
    principles=(
        "Cannot modify shipments, initiate refunds, open claims, contact drivers, or view "
        "internal logistics notes. Limited strictly to interpreting publicly available "
        "tracking data."
    ),
    website_url="https://www.trackmate-delivery.com",
)

user_query = "If the package hasn't arrived by tomorrow, can I get my money back?"
result = sg.validate(user_query, ai_service_description_complete)

print(f"Scope: {result.scope_class.value}")
if result.evidences:
    print("Evidences:")
    for evidence in result.evidences:
        print(f"  - {evidence}")

# Scope: Restricted
# Evidences:
#   - No access to payment, refund, or claim processing systems.
#   - Cannot modify shipments, initiate refunds, open claims, contact drivers, or view internal logistics notes.

ScopeGuard model family

Our initial family of ScopeGuard models includes:

  • scope-guard-4B-q-2601
    Open ScopeGuard model based on Qwen3-4B-Instruct-2507 (distilled version of scope-guard-pro-2601)

  • scope-guard-4B-g-2601
    Open ScopeGuard model based on gemma-3-4b-it (distilled version of scope-guard-pro-2601)

  • scope-guard-pro-2601 (Coming soon!)
    Closed-source model representing the most performant configuration, used as a reference point in our evaluation

The open-weight models are distilled from the closed model.


How we evaluated ScopeGuard

We evaluated ScopeGuard in realistic guardrailing scenarios, comparing it against:

  • Commercial frontier LLMs: GPT-5.2 (medium thinking), Claude Sonnet 4.5, Gemini 3 Pro
  • Commercial “fast” variants: GPT-5 Mini, Claude Haiku 4.5, Gemini 3 Flash
  • Open-weight safety baseline: NVIDIA Nemotron-Content-Safety-Reasoning-4B

The goal is not to match LLMs on general-purpose capabilities, but to show that small, specialized models can outperform general-purpose systems on governance tasks while offering strong deployment advantages.

Tasks

Our evaluation covers three governance-critical tasks:

  1. Multilingual Scope Classification
    Decide whether a user request falls inside or outside the intended scope of an AI system.

  2. Vanilla Safety Classification
    Detect whether a user request violates generic safety policies (toxicity, discrimination, abusive language).

  3. Custom Safety Classification
    Identify whether a user request violates explicit custom-defined policies, varying across products, services, and organizational constraints.


Experimental Results

1) Multilingual scope classification (primary task)

We created an internal multilingual scope classification benchmark reflecting real user traffic and decision boundaries, covering five languages: English, Spanish, Italian, French, and German.
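In practice this means a single (English) service description can gate queries in any of these languages. A hedged sketch reusing the API from above; the Spanish query and the expected label are illustrative, not benchmark samples:

from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

ai_service_description = """
You are a virtual assistant for a parcel delivery service.
You can only answer questions about package tracking.
Never respond to requests for refunds.
"""

# Spanish refund request; the policy itself stays in English.
user_query = "Si el paquete no llega mañana, ¿me devuelven el dinero?"
result = sg.validate(user_query, ai_service_description)
print(result.scope_class.value)  # expected: Restricted (illustrative)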

Key result: both open-weight ScopeGuard models outperform frontier commercial LLMs on multilingual scope classification.

| Provider | Model | Type | Macro F1 | Avg. latency |
| --- | --- | --- | --- | --- |
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 87.4 | 51.1s |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 85.4 | 27.8s |
| Google | Gemini 3 Pro | Closed LLM | 88.4 | 31.8s |
| OpenAI | GPT-5 Mini | Closed LLM | 86.6 | 23.8s |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 75.3 | 19.8s |
| Google | Gemini 3 Flash | Closed LLM | 87.1 | 12.5s |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 89.1 | 0.47s |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 90.1 | 0.68s |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 91.9 | 0.23s |

ScopeGuard models surpass all frontier LLMs on this multilingual scope classification task, while being orders of magnitude faster on single-request latency.
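For reference, macro F1 averages per-class F1 scores with equal weight, so rare classes count as much as frequent ones. A minimal sketch with scikit-learn (the labels are illustrative, not the benchmark's actual label set):

from sklearn.metrics import f1_score

y_true = ["in_scope", "restricted", "in_scope", "restricted", "in_scope"]
y_pred = ["in_scope", "restricted", "restricted", "restricted", "in_scope"]

# average="macro" computes F1 per class, then takes the unweighted mean.
print(f1_score(y_true, y_pred, average="macro"))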


2) Vanilla safety classification (Toxic Chat)

We evaluate vanilla safety using the Toxic Chat benchmark. To keep the comparison realistic, we use simple prompts and rely on each model’s default safety behavior (no extra custom structure or policies injected).

| Provider | Model | Type | Harmful F1 |
| --- | --- | --- | --- |
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 80.8 |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 80.7 |
| Google | Gemini 3 Pro | Closed LLM | 81.2 |
| OpenAI | GPT-5 Mini | Closed LLM | 78.8 |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 77.8 |
| Google | Gemini 3 Flash | Closed LLM | 80.2 |
| NVIDIA | Nemotron-Content-Safety-Reasoning-4B | Open SLM | 75.9 |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 79.1 |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 78.0 |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 81.8 |

Key result: the open ScopeGuard models remain competitive with frontier LLMs even in generic moderation settings, and scope-guard-pro-2601 slightly surpasses all commercial models on Toxic Chat.


3) Custom safety classification (DynaGuardrail + CoSApien)

Custom safety evaluates whether models can enforce explicit user-defined policies (common in enterprise deployments), rather than generic safety heuristics.

We evaluate custom safety using:

  • DynaGuardrail (dynamic policy constraints across diverse scenarios)
  • CoSApien (structured safety policies embedded in the dataset)

In both benchmarks, policies are provided as part of the setup, and models are framed as general-purpose assistants, so the evaluation measures policy application and constraint following.

| Provider | Model | Type | DynaGuardrail | CoSApien |
| --- | --- | --- | --- | --- |
| OpenAI | GPT-5.2 (medium thinking) | Closed LLM | 89.5 | 91.3 |
| Anthropic | Claude Sonnet 4.5 | Closed LLM | 88.3 | 90.9 |
| Google | Gemini 3 Pro | Closed LLM | 87.8 | 90.0 |
| OpenAI | GPT-5 Mini | Closed LLM | 88.4 | 87.0 |
| Anthropic | Claude Haiku 4.5 | Closed LLM | 84.4 | 89.5 |
| Google | Gemini 3 Flash | Closed LLM | 88.2 | 88.8 |
| NVIDIA | Nemotron-Content-Safety-Reasoning-4B | Open SLM | 87.6 | 86.2 |
| Principled Intelligence | scope-guard-4B-q-2601 | Open SLM | 88.7 | 91.9 |
| Principled Intelligence | scope-guard-4B-g-2601 | Open SLM | 87.8 | 88.2 |
| Principled Intelligence | scope-guard-pro-2601 | Proprietary SLM | 91.6 | 92.4 |

Key result: scope-guard-pro-2601 leads on both benchmarks, and the open ScopeGuard models remain competitive with frontier LLMs when policies are complex, dynamic, or tightly specified, confirming that custom safety is not just an extension of vanilla moderation.


Inference speed and deployment

To enforce AI governance in production, guardrails are executed on every user interaction, making latency and throughput critical.

To keep results realistic and actionable, we measured ScopeGuard inference on a single consumer-grade GPU (RTX 4090), widely available on cloud marketplaces (e.g., Vast.ai) for ~€0.20/hour. You can obtain good results even with L4-class GPUs, available on GCP and other marketplaces for ~$0.70/hour, often with significant discounts.

In this setting, ScopeGuard consistently achieves sub-second latency, making it suitable for inline deployment as a real-time decision layer, without requiring expensive proprietary infrastructure or introducing noticeable delays.
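To reproduce latency numbers on your own hardware, a simple wall-clock loop over single requests is enough. A minimal sketch, with an illustrative query set and a warm-up call to exclude model loading:

import time

from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

ai_service_description = """
You are a virtual assistant for a parcel delivery service.
You can only answer questions about package tracking.
"""

queries = ["Where is my package?", "Can I get a refund?"] * 10

sg.validate(queries[0], ai_service_description)  # warm-up request

start = time.perf_counter()
for q in queries:
    sg.validate(q, ai_service_description)
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / len(queries):.3f}s per request")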


Intended use

ScopeGuard is intended for:

  • Scope enforcement for customer-facing assistants (route/deny out-of-scope queries)
  • Inline guardrailing for agentic systems (pre-checks before tool execution; see the sketch after this list)
  • Enterprise governance layers where behavior must follow explicit boundaries and policies
  • Analytics & routing pipelines where explainable classification supports monitoring and reporting
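For the agentic pre-check case, one option is to describe the intended tool call as text and validate it like any other request. A hedged sketch; whether validate accepts an action description rather than a raw user message, and the tool names used, are assumptions of this example:

from orbitals.scope_guard import ScopeGuard

sg = ScopeGuard(backend="vllm", model="principled-intelligence/scope-guard-4B-g-2601")

agent_policy = """
You are a booking agent for a hotel chain.
You may search availability and create reservations.
Never issue refunds or modify payment details.
"""

def guarded_tool_call(tool_name: str, tool_args: dict) -> bool:
    """Return True only if the intended tool call is in scope for the agent."""
    # Describe the intended action as text so ScopeGuard can classify it.
    intent = f"The agent wants to call `{tool_name}` with arguments {tool_args}."
    result = sg.validate(intent, agent_policy)
    return result.scope_class.value != "Restricted"

if guarded_tool_call("issue_refund", {"booking_id": "B-1042"}):
    print("tool call allowed")
else:
    print("tool call blocked")  # expected here (illustrative)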

Limitations

  • ScopeGuard is designed for governance decisions, not for open-ended generation.
  • Performance on safety benchmarks can be sensitive to prompting and evaluation conditions; strong results on one benchmark may not transfer automatically to all setups.

Cite ScopeGuard

If you use this model in academic work or evaluations, we would love it if you cited ScopeGuard (our tech report is coming soon!).


Want to integrate ScopeGuard to safeguard your AI? Get in touch!

If you are thinking about integrating ScopeGuard into your pipeline to safeguard your AI and your systems, we can help you! Contact us directly or write to orbitals@principled-intelligence.com to learn more about ScopeGuard Pro or how we can support you.


Built with ❤️ by Principled Intelligence
