--- license: apache-2.0 base_model: - Qwen/Qwen3.5-0.8B pipeline_tag: text-classification library_name: transformers language: - en - pl tags: - nvfp4 - fp4 - compressed-tensors - vllm - quantized - tentaguard - guard - security - prompt-injection - tentaflow --- # TentaGuard — NVFP4 (W4A4, vLLM) **TentaGuard** is a lightweight security classifier (guard) — a fine-tune of [`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B). It is used **mainly inside the [TentaFlow](https://github.com/Slyb00ts/TentaFlow) application** to scan external content — messages, documents, web-search results, etc. — for **hidden attacks** (prompt injection / jailbreak) before it reaches the main LLM. The model does NOT generate user-facing replies — it returns a single digit: | Label | Meaning | |-------|---------| | `0` | benign (safe content) | | `1` | prompt injection / tool abuse (technical attack) | | `2` | jailbreak (behavioural manipulation) | If the text contains BOTH injection and jailbreak → `1`. ## Input format A classifier system prompt + a user message `<|guard|>\n{text}`. **Build the prompt with the model tokenizer (`apply_chat_template`)** — do not rely on a generic chat template. ## Accuracy (guard test set) - Exact (0/1/2): **~96.6%** (full precision) / **~94.8%** (Q5_K_M) - Safe / Unsafe: **~98.3%** ## Authors Trained by: **Katarzyna Nowak**, **Piotr Jarocki**, **Damian Pala**, **Jakub Rurański**. ## License & attribution Apache-2.0, inherited from the base model [`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B). This checkpoint is a fine-tune for attack detection, built for the [TentaFlow](https://github.com/Slyb00ts/TentaFlow) application. ## Usage (vLLM) `compressed-tensors` format (`nvfp4-pack-quantized`): 4-bit weights (FP4 E2M1, groups of 16, FP8 E4M3 block scales + a global FP32 scale), 4-bit activations (W4A4), `lm_head` kept in full precision. PTQ calibration via [`llm-compressor`](https://github.com/vllm-project/llm-compressor) on real guard prompts. NVFP4 is hardware-accelerated on **Blackwell (sm_100+)**; on older GPUs vLLM loads it as **weight-only** (smaller VRAM, no FP4 acceleration). ```bash vllm serve TentaFlow/TentaGuard-NVFP4 ```