Qwen3-CSAM-Guard-0.6b-v1
A multilingual binary text classifier that flags prompts requesting the generation of child sexual abuse material (CSAM), intended as a pre-call guardrail for image-generation services (e.g. behind LiteLLM).
This is not a generic NSFW filter. Legal adult-sexual content and safe content involving children are explicit negative classes in training, so the model is tuned to fire only on CSAM intent.
Model details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-0.6B (Apache-2.0) |
| Architecture | Qwen3-Embedding backbone + 2-layer MLP head (1024 → 256 → 2) |
| Pooling | Last non-pad token (Qwen3 convention) |
| Max sequence length | 384 tokens |
| Parameters | ~600 M |
| Languages | en, es, pt, zh, ja, de, fr, ru, ko, ar, hi, pl, id, tr, vi, th |
| License | Apache-2.0 (matches the base weights) |
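The "last non-pad token" pooling named in the table can be sketched without any framework. This is an illustrative sketch only, not the project's code: the function name `last_token_pool` and the list-based inputs are assumptions, and the toy vectors are 2-dimensional rather than the 1024-dimensional states the head consumes.

```python
from typing import List, Sequence


def last_token_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Return the hidden state of the last non-pad token of one sequence.

    hidden: one vector per token position.
    mask:   1 for real tokens, 0 for padding (left or right padding both work).
    """
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    # Scan from the end until the first real token is found.
    for pos in range(len(mask) - 1, -1, -1):
        if mask[pos] == 1:
            return list(hidden[pos])
    raise ValueError("sequence contains no real tokens")


# Toy states for a 4-position sequence whose last position is padding.
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
print(last_token_pool(states, [1, 1, 1, 0]))  # picks position 2
```

The pooled vector is what a classification head such as the 1024 → 256 → 2 MLP would take as input.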
Files
| Path | Format | Use case |
|---|---|---|
| `encoder/`, `head.pt`, `model_config.json` | PyTorch (BF16 encoder + FP32 head) | Training / fine-tuning |
| `onnx/model.onnx` | ONNX FP32 (~2.4 GB) | Suggested deployable, full precision |
| `onnx/model_fp16.onnx` | ONNX FP16 (~1.2 GB) | Faster alternative if host CPU has native FP16 (see below) |
| `onnx/model_quantized.onnx` | ONNX dynamic INT8 (~600 MB) | Smallest footprint; noisier scores, needs a tuned threshold |
| `tokenizer.json` (+ siblings) at repo root | HF tokenizer | All ONNX variants |
| `test_report.json` | JSON | Full eval breakdown |
Picking a variant
- FP32 (`onnx/model.onnx`) is the suggested deployable. It matches the PyTorch reference decision for decision on the test split and runs on every CPU.
- FP16 (`onnx/model_fp16.onnx`) is a near-lossless dtype cast (same threshold, same accuracy to 3+ decimals), but it is only faster than FP32 when the host CPU has hardware FP16 multiply-accumulate. On capable hardware expect roughly 1.5–2× FP32 throughput at half the memory footprint. On older silicon without native FP16, ORT falls back to cast-up-compute-cast-down and FP16 ends up slower than FP32, so verify before deploying.
- INT8 (`onnx/model_quantized.onnx`) is the smallest variant, but its score distribution drifts; it needs the high-recall threshold listed in the Operating Points table below to meet the 0.99 recall SLO. Prefer FP32 (or FP16 where it's faster) unless memory pressure forces INT8.
Does my CPU have native FP16?
x86_64: need `avx512_fp16` (Intel Sapphire Rapids and later; AMD Zen 5 / EPYC Turin). Zen 4 has AVX-512 but not FP16.

```shell
grep -o 'avx512_fp16' /proc/cpuinfo | head -1
# prints "avx512_fp16" if present, nothing otherwise
```

aarch64: need `asimdhp` (ARMv8.2-A FP16 / FEAT_FP16). Present on NVIDIA Grace, AWS Graviton 3+, Ampere Altra, Apple Silicon, and every recent Cortex-A / Neoverse core.

```shell
grep -o 'asimdhp' /proc/cpuinfo | head -1
# prints "asimdhp" if present, nothing otherwise
```
If your CPU is on the FP16-capable list, FP16 is a free win. If not, stick with FP32.
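The same probe can run at service start-up to choose which ONNX file to load. The sketch below is a minimal illustration under stated assumptions: the helper names `has_native_fp16` and `pick_onnx_file` are hypothetical, it only knows the two flags listed above, and it falls back to FP32 wherever `/proc/cpuinfo` is missing (for example on macOS, where `platform.machine()` also reports `arm64` rather than `aarch64`).

```python
import platform
from pathlib import Path

# /proc/cpuinfo flag that signals native FP16 arithmetic, per architecture.
FP16_FLAGS = {"x86_64": "avx512_fp16", "aarch64": "asimdhp"}


def has_native_fp16(cpuinfo_text: str, machine: str) -> bool:
    """True if the cpuinfo text lists the FP16 flag for this architecture."""
    flag = FP16_FLAGS.get(machine)
    if flag is None:
        return False
    # Flags appear as whitespace-separated tokens on the flags/Features lines.
    return flag in cpuinfo_text.split()


def pick_onnx_file(cpuinfo_text: str, machine: str) -> str:
    """Choose the FP16 graph only when the CPU advertises native FP16."""
    return "model_fp16.onnx" if has_native_fp16(cpuinfo_text, machine) else "model.onnx"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(pick_onnx_file(text, platform.machine()))
```

A flag check only says the hardware can run FP16 natively; a quick throughput comparison on the target host is still the safer way to confirm the speed-up.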
How it was fine-tuned
- Data: ~80 000 synthetic image-gen prompts (50 % English / 50 % across 14 other languages), with a class mix of 30 % CSAM-positive, 25 % safe-children, 25 % legal-adult, 20 % generic-safe. Generated via a multi-teacher pipeline, deduped and stratified-split into train / val / test / calibration.
- Recipe: 4 epochs, BF16, AdamW (lr 2e-5, weight decay 0.01), cosine schedule with 10 % warmup, per-device batch 64, max seq 384.
- Loss: class-weighted cross-entropy with weights `[1.0, 2.5]` to bias toward recall on the positive class.
- Early stopping: positive-class recall on the val split, patience 2.
- Hardware: single DGX Spark (Blackwell) node.
- Export: PyTorch → ONNX FP32 (opset 17). The repo also ships an FP16 dtype cast (LayerNorm kept at FP32) and a full-graph dynamic INT8 quantization (per-channel weights). A `make quantize-static` target exists for calibrated static quant but isn't the production path: on this model + ORT 1.20 it collapses the probability distribution.
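To make the effect of the `[1.0, 2.5]` class weights concrete, here is a framework-free sketch of class-weighted cross-entropy. It rests on two assumptions that the recipe does not state: index 1 is the positive class, and the batch loss is the weighted mean (the default behaviour of `torch.nn.CrossEntropyLoss` when a `weight` tensor is given). The function name is illustrative.

```python
import math
from typing import Sequence

CLASS_WEIGHTS = (1.0, 2.5)  # assumed order: (negative, positive)


def weighted_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float] = CLASS_WEIGHTS,
) -> float:
    """Weighted-mean cross-entropy over a batch of 2-class logits."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]  # negative log-likelihood of the true class
        total += weights[y] * nll
        norm += weights[y]
    return total / norm


# One missed positive and one correctly scored negative, identical logits.
batch_logits = [[2.0, 0.0], [2.0, 0.0]]
batch_labels = [1, 0]
weighted = weighted_cross_entropy(batch_logits, batch_labels)
unweighted = weighted_cross_entropy(batch_logits, batch_labels, weights=(1.0, 1.0))
print(weighted > unweighted)  # the missed positive dominates the weighted loss
```

With the heavier positive weight, a missed positive contributes 2.5 times as much to the numerator as an equally wrong negative, which is what pushes training toward recall.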
Data quality controls
The corpus was deduplicated and refusal-filtered in three independent layers; the final splits ship with zero detected refusals at QC time despite teachers refusing 10–25 % of CSAM-positive requests at generation time.
Deduplication — exact-hash first-pass then MinHash near-dup at Jaccard ≥ 0.85, applied per-class so benign and positive prompts can't collide each other out. ~4–5 % of generated rows drop here.
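The two-pass, per-class structure can be sketched as follows. For brevity this sketch computes exact Jaccard similarity over character 5-gram shingles instead of a MinHash estimate, so it is quadratic and only suitable for small samples; the shingle size, the function names, and the `class_a` / `class_b` labels are all assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams of the lower-cased, whitespace-normalised text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedup_per_class(rows: List[Tuple[str, str]], threshold: float = 0.85) -> List[Tuple[str, str]]:
    """rows are (class_label, prompt) pairs; the first occurrence in a class wins."""
    seen_hashes: Dict[str, Set[str]] = defaultdict(set)
    kept_shingles: Dict[str, List[Set[str]]] = defaultdict(list)
    kept: List[Tuple[str, str]] = []
    for label, prompt in rows:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if digest in seen_hashes[label]:
            continue  # pass 1: exact duplicate within the same class
        seen_hashes[label].add(digest)
        sh = shingles(prompt)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles[label]):
            continue  # pass 2: near-duplicate within the same class
        kept_shingles[label].append(sh)
        kept.append((label, prompt))
    return kept


rows = [
    ("class_a", "a red bicycle leaning against a brick wall at sunset"),
    ("class_a", "a red bicycle leaning against a brick wall at sunset"),   # exact dup
    ("class_a", "A red bicycle leaning against a brick wall at sunset."),  # near dup
    ("class_a", "a bowl of oranges on a wooden table"),
    ("class_b", "a red bicycle leaning against a brick wall at sunset"),   # other class
]
print(len(dedup_per_class(rows)))  # 3
```

Because the hash sets and shingle lists are keyed by class, the `class_b` row survives even though an identical prompt was already kept in `class_a`.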
In-flight refusal handling — every teacher response is regex-scanned against multilingual refusal patterns across 14 languages. After 5 consecutive refusal / zero-progress responses on a bucket the generator rotates to the next teacher in the per-class chain. Per-teacher concurrency caps prevent a slow refuser from gating the run; per-bucket exit reasons (done / dedup_stall / refusal_streak / 429_streak / failed) are tracked so abandoned buckets are reported rather than silently truncated.
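The rotation rule can be sketched in a few lines. Everything here is a simplified illustration under stated assumptions: the regex is a tiny stand-in for the project's multilingual pattern set, an empty reply is the only kind of "zero-progress" response modelled, and `fill_bucket` and the teacher callables are hypothetical names. Only the streak limit of 5 and the `done` / `refusal_streak` exit reasons come from the description above.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in patterns; the real multilingual set is not reproduced here.
REFUSAL_RE = re.compile(r"\b(i can(?:'|no)t help|i'm sorry|lo siento)\b", re.IGNORECASE)
MAX_STREAK = 5  # consecutive refusal / zero-progress replies before rotating


def fill_bucket(
    teachers: Sequence[Tuple[str, Callable[[], str]]],
    target: int,
) -> Tuple[List[str], str]:
    """Collect `target` rows, rotating to the next teacher after a refusal streak."""
    rows: List[str] = []
    for _name, generate in teachers:
        streak = 0
        while len(rows) < target and streak < MAX_STREAK:
            reply = generate()
            if not reply.strip() or REFUSAL_RE.search(reply):
                streak += 1  # refusal or empty reply: no progress
            else:
                streak = 0
                rows.append(reply)
        if len(rows) >= target:
            return rows, "done"
    # Every teacher in the chain hit the streak limit before the bucket filled.
    return rows, "refusal_streak"


teachers = [
    ("teacher_a", lambda: "I'm sorry, I can't help with that."),
    ("teacher_b", lambda: "a lighthouse on a cliff at dawn"),
]
rows, reason = fill_bucket(teachers, target=2)
print(len(rows), reason)  # 2 done
```

Returning the exit reason alongside the rows is what lets abandoned buckets be reported instead of silently truncated.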
Post-process corpus QC runs six independent methods over the final splits:
| Method | What it catches |
|---|---|
| (a) Multilingual refusal regex | Refusal phrases that slipped the in-flight scan (stricter pattern set, applied to the dedup'd corpus). |
| (b) HDBSCAN cluster flagging | Embeds every row with Qwen3-Embedding-0.6B → PCA-128 → HDBSCAN; clusters with regex-positive refusals or seed-distance hits are flagged. Catches refusal styles the regex doesn't enumerate. |
| (c) Class-keyword leakage | csam_positive missing minor-age signal → review; adult_sexual w/ minor signal → drop; safe_children w/ sexual vocab → drop; generic_safe drifting to both → drop. |
| (d) Claude judge sampling | Stratified sample (per class × language) scored by claude-sonnet-4-6 for in-class fidelity. |
| (e) Seed-distance | Cosine distance to 25 hand-curated multilingual refusal seeds — flags near-refusals. |
| (f) Statistical outliers | Length percentile cutoffs + meta-word density ("Note:", "Disclaimer:", etc.). |
On the live corpus, methods (a), (b), and (e) detected zero refusals.
Class-keyword leakage (c) dropped 60 + 12 rows (0.09 % of corpus). The
remaining 16 % of flags are review-only and dominated by a known
false-positive in c_csam_no_age (hyphenated N-year-old and Chinese
N岁 gaps in the minor-word regex).
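Method (f) from the table is simple enough to sketch directly. This is a hedged illustration, not the project's QC code: the 1st / 99th percentile cutoffs, the zero-tolerance density limit, and the function names are assumptions, and only the two meta-words quoted in the table are checked.

```python
from typing import List

META_WORDS = ("note:", "disclaimer:")  # the two markers named in the QC table


def meta_density(text: str) -> float:
    """Fraction of whitespace-separated tokens that are meta-words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in META_WORDS for w in words) / len(words)


def percentile(values: List[int], q: float) -> float:
    """Linear-interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_outliers(rows: List[str], low_q: float = 1.0, high_q: float = 99.0,
                  max_density: float = 0.0) -> List[int]:
    """Indices of rows that are unusually short, unusually long, or meta-heavy."""
    lengths = [len(r) for r in rows]
    lo_cut, hi_cut = percentile(lengths, low_q), percentile(lengths, high_q)
    return [
        i for i, r in enumerate(rows)
        if len(r) < lo_cut or len(r) > hi_cut or meta_density(r) > max_density
    ]


sample = ["x" * 30] * 8 + ["x" * 300] + ["Note: " + "x" * 24]
print(flag_outliers(sample))  # [8, 9]
```

Row 8 is flagged for length and row 9 for its leading "Note:"; in a real corpus the flagged rows would go to review rather than being dropped automatically.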
Language fidelity — langdetect on every row; mismatched-language
rows are dropped.
Teacher calibration — 10 candidate teachers generated the same 144
diagnostic prompts (36/class × 6 langs) and were scored by
claude-sonnet-4-6 along five axes (in-class fidelity, realism,
language fidelity, subcategory match, diversity). Only the top 4 by
composite score (DeepSeek-V4-Pro, DeepSeek-V4-Flash, Qwen3-235B-Instruct,
GLM-5.1) entered the production routing chain.
Evaluation
Test split: 3886 prompts held out from training, across 16 languages and 28 sub-categories.
Operating points
The table lists each shippable variant at its suggested threshold (the operating point at which we recommend you ship it) with the resulting recall, precision, and confusion-matrix counts.
| Model variant | Threshold | Recall | Precision | FN | FP |
|---|---|---|---|---|---|
| PyTorch BF16 (training-native) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX FP32 (suggested, ~2.4 GB) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX FP16 (native-FP16 CPUs, ~1.2 GB) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX dynamic INT8 (smallest, ~600 MB) | 0.2346 | 0.9901 | 0.9049 | 11 | 116 |
Threshold rationale
- The BF16 PyTorch and FP32 ONNX paths are numerically identical on this test split (same FN/FP rows, same threshold sweep) because ONNX export is lossless when both run in float32. The `0.50` cutoff is well-calibrated; the threshold sweep shows `0.9995` would still deliver 0.9901 recall at precision = 1.0000 if you wanted to trade recall for zero false positives.
- The FP16 ONNX deployable is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow), so its score distribution is numerically near-identical to FP32. The `0.50` cutoff carries over unchanged with 0.9964 recall / 0.9964 precision.
- The INT8 ONNX deployable sees its probability distribution compress toward the middle (a normal consequence of INT8 weight quantization); the `0.2346` threshold is the point that recovers the 0.99 recall SLO. At the default `0.50` it sits at 0.9848 recall / 0.9425 precision, which is usable for a fail-closed guardrail but does not meet the SLO.
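If you re-quantize the model or evaluate on your own data, a threshold that meets a recall target can be picked with a simple sweep. The sketch below is a minimal version under stated assumptions: a prompt is flagged when `score >= threshold`, label 1 means positive, candidate cutoffs are the observed scores, and the function names and toy scores are illustrative rather than taken from the project's evaluation code.

```python
from typing import Sequence, Tuple


def recall_precision(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision of the positive class at a given cutoff."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return recall, precision


def highest_threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                                 target: float = 0.99) -> float:
    """Strictest observed cutoff whose recall still meets the target."""
    for t in sorted(set(scores), reverse=True):  # try the strictest cutoff first
        if recall_precision(scores, labels, t)[0] >= target:
            return t
    raise ValueError("target recall is unreachable on this data")


# Toy calibration set: three positives, three negatives.
toy_scores = [0.95, 0.80, 0.30, 0.60, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0]
t = highest_threshold_for_recall(toy_scores, toy_labels, target=1.0)
print(t, recall_precision(toy_scores, toy_labels, t))  # 0.3 (1.0, 0.75)
```

Choosing the strictest cutoff that still meets the recall target keeps precision as high as the data allows, which is the same trade-off the INT8 row in the table reflects.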
Overall metrics (threshold 0.50, FP32 baseline)
| Metric | Value |
|---|---|
| Accuracy | 0.9979 |
| Precision (positive) | 0.9964 |
| Recall (positive) | 0.9964 |
| F1 (positive) | 0.9964 |
| ROC-AUC | 0.99992 |
| PR-AUC | 0.99983 |
Per-language recall@0.5 is ≥ 0.96 across all 16 covered languages — see
test_report.json for the full per-language and per-subcategory breakdown.
Intended use
- In scope: pre-call guardrail for text-to-image services to block CSAM prompts before they reach a generation model.
- Out of scope: long-form documents, image/audio classification, and languages outside the 16 listed above. Do not rely on this as the sole CSAM defense — pair with output-side image hashing/scanning (PhotoDNA-class systems) and human review.
Limitations
- The classifier scores prompt intent, not generated imagery.
Loading
ONNX FP32 (suggested):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "urtho/Qwen3-CSAM-Guard-0.6b-v1"
tok = AutoTokenizer.from_pretrained(repo)
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model.onnx",
)
```

ONNX FP16 (use only on CPUs with native FP16; see the capability probes above):

```python
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_fp16.onnx",
)
```

ONNX INT8 (smallest footprint, drifted scores; apply the INT8-row threshold from the Operating Points table, not the default 0.50):

```python
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_quantized.onnx",
)
```

The PyTorch checkpoint uses a custom classifier head, so it can't be loaded with `AutoModelForSequenceClassification` directly. Use `src.model.classifier.CSAMClassifier.from_pretrained` from the project source instead.
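Once a tokenizer and an ONNX model are loaded, scoring a prompt comes down to a softmax over the two logits and a threshold comparison. The sketch below assumes that logit index 1 is the CSAM-positive class and that the exported graph accepts the tokenizer's padded batch as-is; neither is confirmed here, so check `model_config.json` and score prompts one at a time if in doubt. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

MAX_LENGTH = 384   # max sequence length from the model details table
THRESHOLD = 0.50   # FP32 / FP16 operating point; the INT8 row lists 0.2346


def positive_probability(logits: Sequence[float]) -> float:
    """Softmax probability of index 1 (assumed to be the positive class)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return exps[1] / sum(exps)


def score_prompts(prompts: List[str], tok, mdl) -> List[float]:
    """Tokenize, run the ONNX model, and return one positive-class score per prompt.

    `tok` and `mdl` are the tokenizer and ORT model objects loaded beforehand.
    """
    batch = tok(prompts, padding=True, truncation=True,
                max_length=MAX_LENGTH, return_tensors="pt")
    logits = mdl(**batch).logits
    return [positive_probability(row) for row in logits.tolist()]


def should_block(score: float, threshold: float = THRESHOLD) -> bool:
    """Fail closed: block when the score reaches the threshold."""
    return score >= threshold
```

A guardrail would call `score_prompts([prompt], tok, mdl)` before forwarding the request and reject it when `should_block` returns `True`, passing the INT8 threshold explicitly when the quantized graph is in use.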
Attribution
Fine-tuned from Qwen3-Embedding-0.6B by the Qwen team — the entire backbone is theirs; only the MLP head was trained here. Please cite the base model when using this artifact:
```bibtex
@misc{qwen3embedding2025,
  title = {Qwen3-Embedding},
  author = {Qwen Team},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-Embedding-0.6B}}
}
```