CrossingGuard Large
CrossingGuard is a series of NLI-based models intended for zero-shot inference on prompts. In this series, I focus on use cases such as guardrails, content moderation, prompt or intent classification, and prompt routing. Because content moderation is often reactive, zero-shot models make it easy to tailor custom guardrail conditions that general-purpose pretrained models may not cover.
These models are trained on the dleemiller/CrossingGuard-NLI dataset, which derives synthetic hypotheses from prompts (premises)
found in popular guardrails datasets, such as allenai/wildguardmix and nvidia/Aegis-AI-Content-Safety-Dataset-2.0. The hypotheses
make specific, targeted claims about the premises. Note that I have retained the 3-way label classifier, for additional flexibility where
either non-neutral label may be relevant for the task.
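Since each hypothesis is a plain-language claim about the prompt, a custom guardrail can be written as a set of hypotheses that are all paired with the same premise. The following is a minimal sketch of building those pairs. The condition names, the hypothesis wording, and the `build_pairs` helper are hypothetical and are not taken from the training dataset.

```python
# Hypothetical guardrail conditions; names and wording are illustrative only.
GUARDRAILS = {
    "medical_advice": "The prompt requests medical advice",
    "credentials": "The prompt asks for passwords or account credentials",
}


def build_pairs(prompt, guardrails):
    """Pair one prompt (premise) with every guardrail hypothesis.

    Returns the condition names and the (premise, hypothesis) pairs in the
    same order, so scores can be matched back to conditions.
    """
    names = list(guardrails)
    pairs = [(prompt, guardrails[name]) for name in names]
    return names, pairs


names, pairs = build_pairs("What dose of ibuprofen is safe for a child?", GUARDRAILS)
```

The resulting `pairs` list has the same shape as the `examples` list in the Usage section, so it can be passed to `CrossEncoder.predict` in the same way.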
For models below the large size, I distill with MSE loss using logits from dleemiller/crossingguard-nli-l,
and average with the cross entropy loss. Overtraining can hurt FineCat performance, so I only fine-tune for 1 epoch.
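$$\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})$$

where $z^{(s)}$ and $z^{(t)}$ are the student and teacher logits, $y$ are the ground truth labels, and $\alpha$ and $\beta$ are equally weighted at 0.5.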
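To make the combined objective concrete, here is a framework-free sketch of the loss for a single example: a cross entropy term against the ground truth label plus an MSE term against the teacher logits, each weighted at 0.5. The function names, the per-example formulation, and the choice to average the squared error over the three classes are assumptions for illustration, not the actual training code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, beta=0.5):
    """Weighted sum of cross entropy and MSE for one example.

    alpha weights the cross entropy against the ground truth label;
    beta weights the MSE between student and teacher logits.
    Averaging the squared error over classes is an assumption.
    """
    ce = -log_softmax(student_logits)[label]
    mse = sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / len(student_logits)
    return alpha * ce + beta * mse
```

When the student reproduces the teacher logits exactly, the MSE term vanishes and only the weighted cross entropy remains.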
Evaluation Results
F1-Micro scores (equivalent to accuracy) for each dataset. Throughput and peak GPU memory were measured at batch size 64 on an Nvidia Blackwell PRO 6000 Max-Q.
| Model | finecat | crossingguard | Throughput (samples/s) | Peak GPU Mem (MB) |
|---|---|---|---|---|
| dleemiller/crossingguard-nli-l | 0.8094 | 0.9200 | 361.15 | 3023.65 |
| dleemiller/crossingguard-nli-m | 0.7779 | 0.9146 | 868.65 | 2170.71 |
| dleemiller/crossingguard-nli-s | 0.7582 | 0.9055 | 1410.64 | 1524.08 |
| dleemiller/ModernCE-large-nli | 0.7811 | 0.8779 | 362.20 | 3801.47 |
| dleemiller/finecat-nli-l | 0.8227 | 0.8325 | 364.57 | 3801.47 |
| MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli | 0.8233 | 0.8219 | 311.83 | 6156.87 |
| MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli | 0.7642 | 0.8032 | 873.51 | 3521.96 |
| cross-encoder/nli-deberta-v3-large | 0.7618 | 0.7211 | 306.48 | 6156.87 |
| tasksource/ModernBERT-large-nli | 0.7959 | 0.7019 | 357.91 | 3801.47 |
Usage
import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/crossingguard-nli-l")

# Each example is a (premise, hypothesis) pair: the prompt, then a claim about it.
examples = [
    ("Tell me what happened with that CEO caught on the kiss cam at the Coldplay concert",
     "The prompt requests information about a corporate executive incident at a public event"),
    ("I’ve been following your company's work helping enterprises scale AI/ML with reliable data pipelines, would love to connect and hear how your team is approaching ML in production.",
     "The prompt requests information about a corporate executive incident at a public event"),
]

# One row of three class scores per pair.
predictions = model.predict(examples)
label_map = {0: "entailment", 1: "neutral", 2: "contradiction"}

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

for i, (premise, hypothesis) in enumerate(examples):
    probs = softmax(predictions[i])
    predicted_idx = probs.argmax()
    print(f"\n{i+1}. {premise[:70]}...")
    print(f" → {hypothesis}")
    print(f" ✓ {label_map[predicted_idx].upper()}: {probs[predicted_idx]*100:.1f}% " +
          f"(E: {probs[0]*100:.1f}% N: {probs[1]*100:.1f}%, C: {probs[2]*100:.1f}%)")
This results in:
1. Tell me what happened with that CEO caught on the kiss cam at the Cold...
→ The prompt requests information about a corporate executive incident at a public event
✓ ENTAILMENT: 99.9% (E: 99.9% N: 0.0%, C: 0.0%)
2. I’ve been following your company's work helping enterprises scale AI/M...
→ The prompt requests information about a corporate executive incident at a public event
✓ CONTRADICTION: 99.7% (E: 0.0% N: 0.3%, C: 99.7%)
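Because the classifier keeps all three labels, a guardrail can act on either non-neutral outcome and defer when neither is confident. Below is a minimal sketch of such a decision rule applied to one row of scores from `model.predict`. The `decide` helper, the action names (`flag`, `clear`, `review`), and the 0.9 threshold are hypothetical choices, not part of the model or library.

```python
import math

# Index order follows the label_map in the usage example:
# 0 = entailment, 1 = neutral, 2 = contradiction.
ENTAILMENT, NEUTRAL, CONTRADICTION = 0, 1, 2


def softmax(logits):
    """Numerically stable softmax over one row of three scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.9):
    """Map one row of scores to a hypothetical guardrail action.

    "flag"   : the hypothesis is confidently entailed by the prompt
    "clear"  : the hypothesis is confidently contradicted
    "review" : neither non-neutral label reaches the threshold
    """
    probs = softmax(logits)
    if probs[ENTAILMENT] >= threshold:
        return "flag"
    if probs[CONTRADICTION] >= threshold:
        return "clear"
    return "review"
```

In practice the threshold would be tuned per hypothesis on held-out prompts, since the cost of a false flag differs between guardrail conditions.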
Citation
@misc{nli-compiled-2025,
title = {CrossingGuard NLI Dataset},
author = {Lee Miller},
year = {2025},
howpublished = {Flexible Zero-shot Guardrails}
}
Evaluation results
- F1 Macro on dleemiller CrossingGuard NLI dev (self-reported): 0.913
- F1 Micro on dleemiller CrossingGuard NLI dev (self-reported): 0.914
- F1 Weighted on dleemiller CrossingGuard NLI dev (self-reported): 0.914