Qwen3-CSAM-Guard-0.6b-v1

A multilingual binary text classifier that flags prompts requesting the generation of child sexual abuse material (CSAM), intended as a pre-call guardrail for image-generation services (e.g. behind LiteLLM).

This is not a generic NSFW filter. Legal adult-sexual content and safe content involving children are explicit negative classes in training, so the model is tuned to fire only on CSAM intent.

Model details


Base model	`Qwen/Qwen3-Embedding-0.6B` (Apache-2.0)
Architecture	Qwen3-Embedding backbone + 2-layer MLP head (1024 → 256 → 2)
Pooling	Last non-pad token (Qwen3 convention)
Max sequence length	384 tokens
Parameters	~600 M
Languages	en, es, pt, zh, ja, de, fr, ru, ko, ar, hi, pl, id, tr, vi, th
License	Apache-2.0 (matches the base weights)
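
To make the "last non-pad token" pooling row concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the project's code: the function names are hypothetical, and the toy hidden size is 2 rather than the model's 1024.

```python
from typing import List, Sequence


def last_non_pad_index(attention_mask: Sequence[int]) -> int:
    """Index of the last real (mask == 1) token; works for left or right padding."""
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i] == 1:
            return i
    raise ValueError("sequence contains only padding")


def last_token_pool(
    hidden: List[List[List[float]]], masks: List[List[int]]
) -> List[List[float]]:
    """hidden is [batch][seq][dim]; returns one [dim] vector per sequence."""
    return [seq[last_non_pad_index(mask)] for seq, mask in zip(hidden, masks)]


# Toy batch: 2 sequences, 3 positions, hidden size 2 (hypothetical values).
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [9.0, 9.0]],  # right-padded: last position is padding
    [[9.0, 9.0], [0.5, 0.6], [0.7, 0.8]],  # left-padded: first position is padding
]
masks = [[1, 1, 0], [0, 1, 1]]

pooled = last_token_pool(hidden, masks)
print(pooled)  # [[0.3, 0.4], [0.7, 0.8]]
```

The pooled vector is what the 1024 → 256 → 2 MLP head would consume; padding positions never contribute to it.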

Files

Path	Format	Use case
`encoder/`, `head.pt`, `model_config.json`	PyTorch (BF16 encoder + FP32 head)	Training / fine-tuning
`onnx/model.onnx`	ONNX FP32 (~2.4 GB)	Suggested deployable — full precision
`onnx/model_fp16.onnx`	ONNX FP16 (~1.2 GB)	Faster alternative if host CPU has native FP16 (see below)
`onnx/model_quantized.onnx`	ONNX dynamic INT8 (~600 MB)	Smallest footprint; noisier scores, needs a tuned threshold
`tokenizer.json` (+ siblings) at repo root	HF tokenizer	All ONNX variants
`test_report.json`	JSON	Full eval breakdown

Picking a variant

FP32 (`onnx/model.onnx`) is the suggested deployable. It reproduces the PyTorch reference's decisions exactly on the test split and runs on every CPU.
FP16 (`onnx/model_fp16.onnx`) is a near-lossless dtype cast (same threshold, same accuracy to 3+ decimals), but it is only faster than FP32 when the host CPU has hardware FP16 multiply-accumulate. On capable hardware expect roughly 1.5–2× FP32 throughput at half the memory footprint. On older silicon without native FP16, ORT falls back to cast-up, compute, cast-down, and FP16 ends up slower than FP32, so benchmark before deploying.
INT8 (`onnx/model_quantized.onnx`) is the smallest variant, but its score distribution drifts; it needs the high-recall threshold listed in the Operating points table below (0.2346) to meet the 0.99 recall SLO. Prefer FP32 (or FP16 where it's faster) unless memory pressure forces INT8.

Does my CPU have native FP16?

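x86_64: need avx512_fp16 (Intel Xeon Sapphire Rapids and later). AMD Zen 4 and Zen 5 (including EPYC Turin) implement AVX-512 but not AVX512-FP16, so run the probe rather than inferring support from the CPU generation.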

grep -o 'avx512_fp16' /proc/cpuinfo | head -1
# prints "avx512_fp16" if present, nothing otherwise

aarch64 — need asimdhp (ARMv8.2-A FP16 / FEAT_FP16). Present on NVIDIA Grace, AWS Graviton 3+, Ampere Altra, Apple Silicon, and every recent Cortex-A / Neoverse core.

grep -o 'asimdhp' /proc/cpuinfo | head -1
# prints "asimdhp" if present, nothing otherwise

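If the matching probe prints its flag, FP16 should be faster than FP32 at the same accuracy. If it prints nothing, stick with FP32. Both probes read /proc/cpuinfo and therefore only work on Linux.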
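
The same check can be scripted at service start-up. The sketch below is a hedged illustration, not part of the repository: `cpu_flags`, `pick_onnx_file`, and `pick_for_this_host` are hypothetical helper names, and the logic only mirrors the two grep probes above (Linux only).

```python
import platform
from pathlib import Path

# Flag to look for per architecture, as named in the probes above.
FP16_FLAGS = {"x86_64": "avx512_fp16", "aarch64": "asimdhp"}


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect tokens from 'flags' (x86) or 'Features' (arm64) lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def pick_onnx_file(cpuinfo_text: str, machine: str) -> str:
    """Return the FP16 file only when the native-FP16 flag is present."""
    needed = FP16_FLAGS.get(machine)
    if needed and needed in cpu_flags(cpuinfo_text):
        return "model_fp16.onnx"
    return "model.onnx"


def pick_for_this_host() -> str:
    """Linux-only convenience wrapper around /proc/cpuinfo."""
    return pick_onnx_file(Path("/proc/cpuinfo").read_text(), platform.machine())


# Hand-written sample lines standing in for real /proc/cpuinfo content.
print(pick_onnx_file("flags\t\t: fpu sse2 avx512f avx512_fp16\n", "x86_64"))
print(pick_onnx_file("flags\t\t: fpu sse2 avx512f\n", "x86_64"))
print(pick_onnx_file("Features\t: fp asimd fphp asimdhp\n", "aarch64"))
```

A flag match only says the hardware can run FP16 natively; a quick latency comparison of both files on the target host is still the safer basis for the final choice.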

How it was fine-tuned

Data: ~80 000 synthetic image-gen prompts (50 % English / 50 % across 14 other languages) with a fixed class mix: 30 % CSAM-positive, 25 % safe-children, 25 % legal-adult, 20 % generic-safe. Generated via a multi-teacher pipeline, then deduplicated and stratified-split into train / val / test / calibration.
Recipe: 4 epochs, BF16, AdamW (lr 2e-5, weight decay 0.01), cosine schedule with 10 % warmup, per-device batch 64, max seq 384.
Loss: class-weighted cross-entropy with weights [1.0, 2.5] (negative, positive) to bias toward recall on the positive class.
Early stopping: positive-class recall on the val split, patience 2.
Hardware: single DGX Spark (Blackwell) node.
Export: PyTorch → ONNX FP32 (opset 17). The repo also ships an FP16 dtype cast (LayerNorm kept at FP32) and a full-graph dynamic INT8 quantization (per-channel weights). A `make quantize-static` target exists for calibrated static quantization, but it isn't the production path: on this model with ORT 1.20 it collapses the probability distribution.
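
The loss and schedule from the recipe can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the training code: the weight order (negative first, positive second), the linear shape of the warmup, and the decay-to-zero floor are all assumed, and the function names are hypothetical.

```python
import math

CLASS_WEIGHTS = [1.0, 2.5]  # assumed order: [negative, positive]


def weighted_cross_entropy(logits, label, weights=CLASS_WEIGHTS):
    """Per-example weighted CE: -w[label] * log_softmax(logits)[label]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[label] * (logits[label] - log_sum_exp)


def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.10):
    """Linear warmup to peak_lr (assumed), then cosine decay to zero (assumed)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# An equally confident mistake costs 2.5x more when the true class is positive.
missed_positive = weighted_cross_entropy([2.0, -2.0], label=1)
false_alarm = weighted_cross_entropy([-2.0, 2.0], label=0)
print(missed_positive / false_alarm)

# Learning rate at the end of warmup and at the end of training.
print(lr_at(99, 1000), lr_at(1000, 1000))
```

Because a missed positive is penalized more heavily than a false alarm, the optimizer is pushed toward higher recall on the positive class, which is the stated intent of the weighting.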

Data quality controls

The corpus was deduplicated and refusal-filtered in three independent layers; the final splits ship with zero detected refusals at QC time despite teachers refusing 10–25 % of CSAM-positive requests at generation time.

Deduplication: an exact-hash first pass, then MinHash near-duplicate removal at Jaccard ≥ 0.85. Both passes run per class, so a benign prompt can never cause a similar positive prompt to be dropped, or vice versa. ~4–5 % of generated rows drop here.
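
A rough sketch of the two-pass, per-class deduplication follows. Note the substitution: it computes exact Jaccard similarity over word 3-shingles instead of MinHash, which approximates the same quantity at scale. The shingle size, the text normalization, and all function names are assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def dedup_per_class(rows, threshold=0.85):
    """rows: iterable of (label, text). Keeps the first row of each duplicate group."""
    seen_hashes = defaultdict(set)     # label -> exact-match digests
    kept_shingles = defaultdict(list)  # label -> shingle sets of kept rows
    kept = []
    for label, text in rows:
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes[label]:
            continue  # pass 1: exact duplicate within the same class
        seen_hashes[label].add(digest)
        sh = shingles(text)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles[label]):
            continue  # pass 2: near duplicate within the same class
        kept_shingles[label].append(sh)
        kept.append((label, text))
    return kept


rows = [
    ("generic_safe", "a red bicycle leaning against a brick wall at sunset"),
    ("generic_safe", "A red bicycle leaning against a brick wall at sunset"),
    ("generic_safe", "a red bicycle leaning against a brick wall at sunset, watercolor"),
    ("safe_children", "a red bicycle leaning against a brick wall at sunset"),
]
kept = dedup_per_class(rows)
print(len(kept))  # 2
```

The last row survives even though its text matches the first, because comparisons never cross class boundaries. Word shingles are a poor fit for languages written without spaces (zh, ja, th); character shingles would be the usual alternative there.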

In-flight refusal handling — every teacher response is regex-scanned against multilingual refusal patterns across 14 languages. After 5 consecutive refusal / zero-progress responses on a bucket the generator rotates to the next teacher in the per-class chain. Per-teacher concurrency caps prevent a slow refuser from gating the run; per-bucket exit reasons (done / dedup_stall / refusal_streak / 429_streak / failed) are tracked so abandoned buckets are reported rather than silently truncated.
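
The rotation rule described above might look roughly like this. Everything here is a simplified, hypothetical sketch: the single English regex stands in for the multilingual pattern set, teachers are plain callables, and only the `done` and `refusal_streak` exit reasons are modeled.

```python
import re

# Hypothetical stand-in for the multilingual refusal patterns.
REFUSAL_RE = re.compile(
    r"\b(i can(?:'|no)t (?:help|assist)|i'm sorry, but)\b", re.IGNORECASE
)


def generate_bucket(teachers, target, max_streak=5):
    """teachers: list of (name, callable) tried in order; returns (rows, exit_reason)."""
    rows = []
    for _name, ask in teachers:
        streak = 0
        while len(rows) < target and streak < max_streak:
            reply = ask()
            if not reply or REFUSAL_RE.search(reply):
                streak += 1  # refusal or zero-progress response
            else:
                streak = 0   # any usable row resets the streak
                rows.append(reply)
        if len(rows) >= target:
            return rows, "done"
        # streak limit reached: fall through to the next teacher in the chain
    return rows, "refusal_streak"


calls = {"refuser": 0, "helper": 0}


def refuser():
    calls["refuser"] += 1
    return "I cannot help with that."


def helper():
    calls["helper"] += 1
    return f"sample prompt {calls['helper']}"


rows, reason = generate_bucket([("refuser", refuser), ("helper", helper)], target=3)
print(reason, calls)
```

Returning an explicit exit reason per bucket is what lets abandoned buckets show up in a report instead of silently shrinking the corpus.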

Post-process corpus QC runs six independent methods over the final splits:

Method	What it catches
(a) Multilingual refusal regex	Refusal phrases that slipped the in-flight scan (stricter pattern set, applied to the dedup'd corpus).
(b) HDBSCAN cluster flagging	Embeds every row with Qwen3-Embedding-0.6B → PCA-128 → HDBSCAN; clusters with regex-positive refusals or seed-distance hits are flagged. Catches refusal styles the regex doesn't enumerate.
(c) Class-keyword leakage	`csam_positive` missing minor-age signal → review; `adult_sexual` w/ minor signal → drop; `safe_children` w/ sexual vocab → drop; `generic_safe` drifting to both → drop.
(d) Claude judge sampling	Stratified sample (per class × language) scored by `claude-sonnet-4-6` for in-class fidelity.
(e) Seed-distance	Cosine distance to 25 hand-curated multilingual refusal seeds — flags near-refusals.
(f) Statistical outliers	Length percentile cutoffs + meta-word density (`"Note:"`, `"Disclaimer:"`, etc.).
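
As an illustration of method (f), the sketch below flags rows by length percentile and by meta-word density. The 5th/95th percentile cutoffs, the zero-tolerance density limit, the nearest-rank percentile definition, and the function names are all assumptions; the source only names the two signals.

```python
import math

META_MARKERS = ("note:", "disclaimer:")  # examples from the table; a real list would be longer


def nearest_rank(sorted_vals, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def meta_density(text):
    """Share of whitespace-separated tokens that are meta markers."""
    words = text.lower().split()
    hits = sum(1 for w in words if w in META_MARKERS)
    return hits / len(words) if words else 0.0


def flag_outliers(texts, low_q=5, high_q=95, max_density=0.0):
    """Return (index, reasons) for rows that look like statistical outliers."""
    lengths = sorted(len(t) for t in texts)
    lo, hi = nearest_rank(lengths, low_q), nearest_rank(lengths, high_q)
    flagged = []
    for i, t in enumerate(texts):
        reasons = []
        if len(t) < lo or len(t) > hi:
            reasons.append("length")
        if meta_density(t) > max_density:
            reasons.append("meta_words")
        if reasons:
            flagged.append((i, reasons))
    return flagged


texts = [f"a watercolor landscape number {i:02d}" for i in range(18)]
texts.append("Note: this is a generated sample")  # teacher commentary leaked into a row
texts.append("x" * 500)                           # implausibly long row
print(flag_outliers(texts))
```

Flags from a check like this are best treated as review candidates rather than automatic drops, since unusual length alone does not make a row wrong.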

On the live corpus, methods (a), (b), and (e) detected zero refusals. Class-keyword leakage (c) dropped 60 + 12 rows (0.09 % of corpus). The remaining 16 % of flags are review-only and dominated by a known false-positive in c_csam_no_age (hyphenated N-year-old and Chinese N岁 gaps in the minor-word regex).

Language fidelity — langdetect on every row; mismatched-language rows are dropped.

Teacher calibration: 10 candidate teachers generated the same 144 diagnostic prompts (36 per class, spread across 6 languages) and were scored by claude-sonnet-4-6 along five axes (in-class fidelity, realism, language fidelity, subcategory match, diversity). Only the top 4 by composite score (DeepSeek-V4-Pro, DeepSeek-V4-Flash, Qwen3-235B-Instruct, GLM-5.1) entered the production routing chain.

Evaluation

Test split: 3886 prompts held out from training, across 16 languages and 28 sub-categories.

Operating points

The table lists each shippable variant at its suggested threshold (the operating point at which we recommend you ship it) with the resulting recall, precision, and confusion-matrix counts.

Model variant	Threshold	Recall	Precision	FN	FP
PyTorch BF16 (training-native)	0.5000	0.9964	0.9964	4	4
ONNX FP32 (suggested, ~2.4 GB)	0.5000	0.9964	0.9964	4	4
ONNX FP16 (native-FP16 CPUs, ~1.2 GB)	0.5000	0.9964	0.9964	4	4
ONNX dynamic INT8 (smallest, ~600 MB)	0.2346	0.9901	0.9049	11	116

Threshold rationale

The BF16 PyTorch and FP32 ONNX paths are numerically identical on this test split — same FN/FP rows, same threshold sweep — because ONNX export is lossless when both run in float32. The 0.50 cutoff is well-calibrated; the threshold sweep shows 0.9995 would still deliver 0.9901 recall at precision = 1.0000 if you wanted to trade recall for zero false positives.
The FP16 ONNX deployable is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow), so its score distribution is numerically near-identical to FP32 — the 0.50 cutoff carries over unchanged with 0.9964 recall / 0.9964 precision.
The INT8 ONNX deployable sees its probability distribution compress toward the middle (a normal consequence of INT8 weight quantization); the 0.2346 threshold is the point that recovers the 0.99 recall SLO. At the default 0.50 it sits at 0.9848 recall / 0.9425 precision, which is usable for a fail-closed guardrail but does not meet the SLO.
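
If you re-quantize or fine-tune further, a threshold that meets a recall target can be recovered with a simple sweep. The sketch below shows one way to do that on toy data; the function names and the scores are hypothetical, and the sweep should run on held-out data such as a calibration split rather than on the test split.

```python
def recall_precision(scores, labels, threshold):
    """Recall and precision when score >= threshold counts as a positive prediction."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return recall, precision


def highest_threshold_for_recall(scores, labels, target_recall):
    """Largest threshold (taken from positive scores) whose recall meets the target."""
    candidates = sorted({s for s, y in zip(scores, labels) if y == 1}, reverse=True)
    for t in candidates:  # recall only grows as the threshold drops
        if recall_precision(scores, labels, t)[0] >= target_recall:
            return t
    return None


# Toy scores, not model output.
scores = [0.95, 0.90, 0.60, 0.30, 0.40, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0]

t = highest_threshold_for_recall(scores, labels, target_recall=1.0)
print(t, recall_precision(scores, labels, t))  # 0.3 (1.0, 0.8)
```

Picking the highest threshold that still meets the target keeps precision as high as the recall constraint allows, which is the same trade-off the INT8 row makes.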

Overall metrics (threshold 0.50, FP32 baseline)

Metric	Value
Accuracy	0.9979
Precision (positive)	0.9964
Recall (positive)	0.9964
F1 (positive)	0.9964
ROC-AUC	0.99992
PR-AUC	0.99983

Per-language recall@0.5 is ≥ 0.96 across all 16 covered languages — see test_report.json for the full per-language and per-subcategory breakdown.
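
As a quick consistency check, the headline accuracy and F1 can be re-derived from figures already reported above: the test-split size and the FP32 row's FN/FP counts. This only restates published numbers; it does not recompute anything from raw predictions.

```python
N_TEST = 3886   # test split size
FN, FP = 4, 4   # FP32 row at threshold 0.50

# Accuracy is one minus the error rate.
accuracy = 1 - (FN + FP) / N_TEST

# With precision equal to recall, their harmonic mean (F1) is the same value.
precision = recall = 0.9964
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(f1, 4))  # 0.9979 0.9964
```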

Intended use

In scope: pre-call guardrail for text-to-image services to block CSAM prompts before they reach a generation model.
Out of scope: long-form documents, image/audio classification, and languages outside the 16 listed above. Do not rely on this as the sole CSAM defense — pair with output-side image hashing/scanning (PhotoDNA-class systems) and human review.

Limitations

The classifier scores prompt intent, not generated imagery.

Loading

ONNX FP32 (suggested):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "urtho/Qwen3-CSAM-Guard-0.6b-v1"
tok  = AutoTokenizer.from_pretrained(repo)
mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model.onnx",
)

ONNX FP16 (use only on CPUs with native FP16 — see capability probes above):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_fp16.onnx",
)

ONNX INT8 (smallest footprint, drifted scores — apply the INT8-row threshold from the Operating Points table, not the default 0.50):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_quantized.onnx",
)
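
Once a variant is loaded, turning its logits into a block/allow decision could look like the sketch below. It is a hedged example rather than the project's inference code: it assumes the model returns two logits with index 1 as the CSAM-positive class, `should_block` and `positive_probability` are hypothetical names, and `tok` / `mdl` are the objects created in the snippets above.

```python
import math

# Thresholds copied from the Operating points table.
THRESHOLDS = {
    "model.onnx": 0.5000,
    "model_fp16.onnx": 0.5000,
    "model_quantized.onnx": 0.2346,
}


def positive_probability(logits):
    """Softmax over the two logits; index 1 is assumed to be the positive class."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)


def should_block(tok, mdl, prompt, file_name="model.onnx"):
    """True when the positive-class probability reaches the variant's threshold."""
    # Truncate to the 384-token limit from the model details table.
    enc = tok(prompt, return_tensors="pt", truncation=True, max_length=384)
    logits = [float(z) for z in mdl(**enc).logits[0]]
    return positive_probability(logits) >= THRESHOLDS[file_name]


# The same borderline logits pass at 0.50 but are blocked at the INT8 threshold.
p = positive_probability([1.0, 0.0])
print(p >= THRESHOLDS["model.onnx"], p >= THRESHOLDS["model_quantized.onnx"])
```

For a fail-closed guardrail, the caller would typically also block when tokenization or inference raises, rather than letting the prompt through.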

The PyTorch checkpoint uses a custom classifier head, so it can't be loaded with AutoModelForSequenceClassification directly — use src.model.classifier.CSAMClassifier.from_pretrained from the project source.

Attribution

Fine-tuned from Qwen3-Embedding-0.6B, which was built by the Qwen team. The entire backbone is theirs; only the MLP head was trained here. Please cite the base model when using this artifact:

@misc{qwen3embedding2025,
  title  = {Qwen3-Embedding},
  author = {Qwen Team},
  year   = {2025},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-Embedding-0.6B}}
}
