Instructions to use TentaFlow/TentaGuard-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TentaFlow/TentaGuard-NVFP4 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-classification", model="TentaFlow/TentaGuard-NVFP4")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("TentaFlow/TentaGuard-NVFP4") model = AutoModelForCausalLM.from_pretrained("TentaFlow/TentaGuard-NVFP4") - Notebooks
- Google Colab
- Kaggle
TentaGuard — NVFP4 (W4A4, vLLM)
TentaGuard is a lightweight security classifier (guard) — a fine-tune of
Qwen/Qwen3.5-0.8B. It is used mainly inside the
TentaFlow application to scan external content — messages, documents,
web-search results, etc. — for hidden attacks (prompt injection / jailbreak) before it
reaches the main LLM.
The model does NOT generate user-facing replies — it returns a single digit:
| Label | Meaning |
|---|---|
0 |
benign (safe content) |
1 |
prompt injection / tool abuse (technical attack) |
2 |
jailbreak (behavioural manipulation) |
If the text contains BOTH injection and jailbreak → 1.
Input format
A classifier system prompt + a user message <|guard|>\n{text}. Build the prompt with the
model tokenizer (apply_chat_template) — do not rely on a generic chat template.
Accuracy (guard test set)
- Exact (0/1/2): ~96.6% (full precision) / ~94.8% (Q5_K_M)
- Safe / Unsafe: ~98.3%
Authors
Trained by: Katarzyna Nowak, Piotr Jarocki, Damian Pala, Jakub Rurański.
License & attribution
Apache-2.0, inherited from the base model Qwen/Qwen3.5-0.8B.
This checkpoint is a fine-tune for attack detection, built for the TentaFlow application.
Usage (vLLM)
compressed-tensors format (nvfp4-pack-quantized): 4-bit weights (FP4 E2M1, groups of 16,
FP8 E4M3 block scales + a global FP32 scale), 4-bit activations (W4A4), lm_head kept in full
precision. PTQ calibration via llm-compressor
on real guard prompts.
NVFP4 is hardware-accelerated on Blackwell (sm_100+); on older GPUs vLLM loads it as weight-only (smaller VRAM, no FP4 acceleration).
vllm serve TentaFlow/TentaGuard-NVFP4
- Downloads last month
- -