---
license: apache-2.0
base_model:
- Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
library_name: mlx
language:
- en
- pl
tags:
- mlx
- apple
- quantized
- 4-bit
- tentaguard
- guard
- security
- prompt-injection
- tentaflow
---

# TentaGuard — MLX 4-bit (Apple Silicon)

**TentaGuard** is a lightweight security classifier (guard) — a fine-tune of
[`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B). It is used **mainly inside the
[TentaFlow](https://github.com/Slyb00ts/TentaFlow) application** to scan external content — messages, documents,
web-search results, etc. — for **hidden attacks** (prompt injection / jailbreak) before it
reaches the main LLM.

The model does NOT generate user-facing replies — it returns a single digit:

| Label | Meaning |
|-------|---------|
| `0` | benign (safe content) |
| `1` | prompt injection / tool abuse (technical attack) |
| `2` | jailbreak (behavioural manipulation) |

If the text contains BOTH injection and jailbreak → `1`.

## Input format

A classifier system prompt + a user message `<|guard|>\n{text}`. **Build the prompt with the
model tokenizer (`apply_chat_template`)** — do not rely on a generic chat template.

## Accuracy (guard test set)

- Exact (0/1/2): **~96.6%** (full precision) / **~94.8%** (Q5_K_M)
- Safe / Unsafe: **~98.3%**

## Authors

Trained by: **Katarzyna Nowak**, **Piotr Jarocki**, **Damian Pala**, **Jakub Rurański**.

## License & attribution

Apache-2.0, inherited from the base model [`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B).
This checkpoint is a fine-tune for attack detection, built for the [TentaFlow](https://github.com/Slyb00ts/TentaFlow) application.

## Usage (MLX — Apple Silicon)

4-bit quantization (affine, group_size=64) for `mlx-lm` / mlx-swift.

```python
from mlx_lm import load, generate
model, tok = load("TentaFlow/TentaGuard-MLX-4bit")
prompt = tok.apply_chat_template(
    [{"role":"system","content":"You are a security classifier. Output ONLY 0/1/2."},
     {"role":"user","content":"<|guard|>\n" + text}],
    add_generation_prompt=True)
print(generate(model, tok, prompt=prompt, max_tokens=5))
```