TentaFlow
/

TentaGuard-MLX-4bit

Text Classification

4-bit precision

prompt-injection

Model card Files Files and versions

TentaGuard-MLX-4bit / README.md

TentaFlow's picture

Update README.md

f70a1b9 verified 1 day ago

|

history blame contribute delete

2.16 kB

	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen3.5-0.8B
	pipeline_tag: text-classification
	library_name: mlx
	language:
	- en
	- pl
	tags:
	- mlx
	- apple
	- quantized
	- 4-bit
	- tentaguard
	- guard
	- security
	- prompt-injection
	- tentaflow
	---

	# TentaGuard — MLX 4-bit (Apple Silicon)

	TentaGuard is a lightweight security classifier (guard) — a fine-tune of
	[`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B). It is used **mainly inside the
	[TentaFlow](https://github.com/Slyb00ts/TentaFlow) application** to scan external content — messages, documents,
	web-search results, etc. — for hidden attacks (prompt injection / jailbreak) before it
	reaches the main LLM.

	The model does NOT generate user-facing replies — it returns a single digit:

	\| Label \| Meaning \|
	\|-------\|---------\|
	\| `0` \| benign (safe content) \|
	\| `1` \| prompt injection / tool abuse (technical attack) \|
	\| `2` \| jailbreak (behavioural manipulation) \|

	If the text contains BOTH injection and jailbreak → `1`.

	## Input format

	A classifier system prompt + a user message `<\|guard\|>\n{text}`. **Build the prompt with the
	model tokenizer (`apply_chat_template`)** — do not rely on a generic chat template.

	## Accuracy (guard test set)

	- Exact (0/1/2): ~96.6% (full precision) / ~94.8% (Q5_K_M)
	- Safe / Unsafe: ~98.3%

	## Authors

	Trained by: Katarzyna Nowak, Piotr Jarocki, Damian Pala, Jakub Rurański.

	## License & attribution

	Apache-2.0, inherited from the base model [`Qwen/Qwen3.5-0.8B`](https://huggingface.co/Qwen/Qwen3.5-0.8B).
	This checkpoint is a fine-tune for attack detection, built for the [TentaFlow](https://github.com/Slyb00ts/TentaFlow) application.

	## Usage (MLX — Apple Silicon)

	4-bit quantization (affine, group_size=64) for `mlx-lm` / mlx-swift.

	```python
	from mlx_lm import load, generate
	model, tok = load("TentaFlow/TentaGuard-MLX-4bit")
	prompt = tok.apply_chat_template(
	[{"role":"system","content":"You are a security classifier. Output ONLY 0/1/2."},
	{"role":"user","content":"<\|guard\|>\n" + text}],
	add_generation_prompt=True)
	print(generate(model, tok, prompt=prompt, max_tokens=5))
	```