How to use from the
Use from the
Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="TentaFlow/TentaGuard-NVFP4")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TentaFlow/TentaGuard-NVFP4")
model = AutoModelForCausalLM.from_pretrained("TentaFlow/TentaGuard-NVFP4")
Quick Links

TentaGuard — NVFP4 (W4A4, vLLM)

TentaGuard is a lightweight security classifier (guard) — a fine-tune of Qwen/Qwen3.5-0.8B. It is used mainly inside the TentaFlow application to scan external content — messages, documents, web-search results, etc. — for hidden attacks (prompt injection / jailbreak) before it reaches the main LLM.

The model does NOT generate user-facing replies — it returns a single digit:

Label Meaning
0 benign (safe content)
1 prompt injection / tool abuse (technical attack)
2 jailbreak (behavioural manipulation)

If the text contains BOTH injection and jailbreak → 1.

Input format

A classifier system prompt + a user message <|guard|>\n{text}. Build the prompt with the model tokenizer (apply_chat_template) — do not rely on a generic chat template.

Accuracy (guard test set)

  • Exact (0/1/2): ~96.6% (full precision) / ~94.8% (Q5_K_M)
  • Safe / Unsafe: ~98.3%

Authors

Trained by: Katarzyna Nowak, Piotr Jarocki, Damian Pala, Jakub Rurański.

License & attribution

Apache-2.0, inherited from the base model Qwen/Qwen3.5-0.8B. This checkpoint is a fine-tune for attack detection, built for the TentaFlow application.

Usage (vLLM)

compressed-tensors format (nvfp4-pack-quantized): 4-bit weights (FP4 E2M1, groups of 16, FP8 E4M3 block scales + a global FP32 scale), 4-bit activations (W4A4), lm_head kept in full precision. PTQ calibration via llm-compressor on real guard prompts.

NVFP4 is hardware-accelerated on Blackwell (sm_100+); on older GPUs vLLM loads it as weight-only (smaller VRAM, no FP4 acceleration).

vllm serve TentaFlow/TentaGuard-NVFP4
Downloads last month
-
Safetensors
Model size
0.5B params
Tensor type
F32
·
BF16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TentaFlow/TentaGuard-NVFP4

Quantized
(138)
this model