TentaGuard-MLX-4bit / README.md
TentaFlow's picture
Update README.md
f70a1b9 verified
metadata
license: apache-2.0
base_model:
  - Qwen/Qwen3.5-0.8B
pipeline_tag: text-classification
library_name: mlx
language:
  - en
  - pl
tags:
  - mlx
  - apple
  - quantized
  - 4-bit
  - tentaguard
  - guard
  - security
  - prompt-injection
  - tentaflow

TentaGuard — MLX 4-bit (Apple Silicon)

TentaGuard is a lightweight security classifier (guard) — a fine-tune of Qwen/Qwen3.5-0.8B. It is used mainly inside the TentaFlow application to scan external content — messages, documents, web-search results, etc. — for hidden attacks (prompt injection / jailbreak) before it reaches the main LLM.

The model does NOT generate user-facing replies — it returns a single digit:

Label Meaning
0 benign (safe content)
1 prompt injection / tool abuse (technical attack)
2 jailbreak (behavioural manipulation)

If the text contains BOTH injection and jailbreak → 1.

Input format

A classifier system prompt + a user message <|guard|>\n{text}. Build the prompt with the model tokenizer (apply_chat_template) — do not rely on a generic chat template.

Accuracy (guard test set)

  • Exact (0/1/2): ~96.6% (full precision) / ~94.8% (Q5_K_M)
  • Safe / Unsafe: ~98.3%

Authors

Trained by: Katarzyna Nowak, Piotr Jarocki, Damian Pala, Jakub Rurański.

License & attribution

Apache-2.0, inherited from the base model Qwen/Qwen3.5-0.8B. This checkpoint is a fine-tune for attack detection, built for the TentaFlow application.

Usage (MLX — Apple Silicon)

4-bit quantization (affine, group_size=64) for mlx-lm / mlx-swift.

from mlx_lm import load, generate
model, tok = load("TentaFlow/TentaGuard-MLX-4bit")
prompt = tok.apply_chat_template(
    [{"role":"system","content":"You are a security classifier. Output ONLY 0/1/2."},
     {"role":"user","content":"<|guard|>\n" + text}],
    add_generation_prompt=True)
print(generate(model, tok, prompt=prompt, max_tokens=5))