Qwen3-CSAM-Guard-0.6b-v1

A multilingual binary text classifier that flags prompts requesting the generation of child sexual abuse material (CSAM), intended as a pre-call guardrail for image-generation services (e.g. behind LiteLLM).

This is not a generic NSFW filter. Legal adult-sexual content and safe content involving children are explicit negative classes in training, so the model is tuned to fire only on CSAM intent.

Model details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-0.6B (Apache-2.0) |
| Architecture | Qwen3-Embedding backbone + 2-layer MLP head (1024 → 256 → 2) |
| Pooling | Last non-pad token (Qwen3 convention) |
| Max sequence length | 384 tokens |
| Parameters | ~600 M |
| Languages | en, es, pt, zh, ja, de, fr, ru, ko, ar, hi, pl, id, tr, vi, th |
| License | Apache-2.0 (matches the base weights) |
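
Last non-pad token pooling selects, for each sequence, the hidden state at the final position whose attention mask is 1. The following is a minimal pure-Python sketch of that index selection, assuming a 0/1 attention mask per sequence. The function names are hypothetical and the project's actual pooling code may differ.

```python
from typing import List


def last_token_index(attention_mask: List[int]) -> int:
    """Return the position of the last real (non-pad) token.

    Scanning from the end handles both right-padded ([1, 1, 1, 0, 0])
    and left-padded ([0, 0, 1, 1, 1]) sequences.
    """
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("sequence contains no non-pad tokens")


def pool_last_token(hidden_states: List[List[float]],
                    attention_mask: List[int]) -> List[float]:
    """Select the hidden vector at the last non-pad position."""
    return hidden_states[last_token_index(attention_mask)]
```

The pooled vector (1024-dimensional for this backbone) is what the MLP head consumes.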

Files

| Path | Format | Use case |
|---|---|---|
| encoder/, head.pt, model_config.json | PyTorch (BF16 encoder + FP32 head) | Training / fine-tuning |
| onnx/model.onnx | ONNX FP32 (~2.4 GB) | Suggested deployable; full precision |
| onnx/model_fp16.onnx | ONNX FP16 (~1.2 GB) | Faster alternative if host CPU has native FP16 (see below) |
| onnx/model_quantized.onnx | ONNX dynamic INT8 (~600 MB) | Smallest footprint; noisier scores, needs a tuned threshold |
| tokenizer.json (+ siblings) at repo root | HF tokenizer | All ONNX variants |
| test_report.json | JSON | Full eval breakdown |

Picking a variant

  • FP32 (onnx/model.onnx) is the suggested deployable. It reproduces the PyTorch reference's decisions on the test split and runs on every CPU.
  • FP16 (onnx/model_fp16.onnx) is a near-lossless dtype cast — same threshold, same accuracy to 3+ decimals — but is only faster than FP32 when the host CPU has hardware FP16 multiply-accumulate. On capable hardware expect roughly 1.5–2× FP32 throughput at half the memory footprint. On older silicon (no native FP16) ORT falls back to cast-up-compute-cast-down and FP16 ends up slower than FP32 — verify before deploying.
  • INT8 (onnx/model_quantized.onnx) is the smallest variant but its score distribution drifts; it needs the high-recall threshold listed in the Operating Points table below to meet the 0.99 recall SLO. Prefer FP32 (or FP16 where it's faster) unless memory pressure forces INT8.

Does my CPU have native FP16?

x86_64: need avx512_fp16 (Intel Sapphire Rapids and later Xeons). AMD Zen 4 and Zen 5 support AVX-512 but not AVX512-FP16, so rely on the probe below rather than on the CPU generation.

grep -o 'avx512_fp16' /proc/cpuinfo | head -1
# prints "avx512_fp16" if present, nothing otherwise

aarch64 — need asimdhp (ARMv8.2-A FP16 / FEAT_FP16). Present on NVIDIA Grace, AWS Graviton 3+, Ampere Altra, Apple Silicon, and every recent Cortex-A / Neoverse core.

grep -o 'asimdhp' /proc/cpuinfo | head -1
# prints "asimdhp" if present, nothing otherwise

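If the probe prints the flag, FP16 should be faster than FP32 at the same threshold and accuracy; run a quick benchmark on the target host to confirm. If it prints nothing, stick with FP32.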
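
The same check can be scripted so that a deployment picks the ONNX file automatically. This is a minimal sketch, assuming a Linux host that exposes /proc/cpuinfo; the helper names `has_native_fp16` and `pick_onnx_file` are hypothetical, and a flag match only indicates capability, so a throughput benchmark is still advisable.

```python
import platform
from pathlib import Path

# CPU feature flag that signals native FP16 arithmetic, per architecture.
FP16_FLAGS = {"x86_64": "avx512_fp16", "aarch64": "asimdhp"}


def has_native_fp16(cpuinfo_text: str, machine: str) -> bool:
    """True if the cpuinfo text lists the FP16 flag for this architecture."""
    flag = FP16_FLAGS.get(machine)
    if flag is None:
        return False  # unknown architecture: assume no native FP16
    # Flags appear as whitespace-separated tokens on "flags"/"Features" lines.
    return flag in cpuinfo_text.split()


def pick_onnx_file(cpuinfo_text: str, machine: str) -> str:
    """Choose the FP16 graph only when the CPU advertises native FP16."""
    if has_native_fp16(cpuinfo_text, machine):
        return "model_fp16.onnx"
    return "model.onnx"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(pick_onnx_file(text, platform.machine()))
```

On hosts without /proc/cpuinfo (macOS, for example) the sketch falls back to the FP32 file.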

How it was fine-tuned

  • Data: ~80 000 synthetic image-gen prompts (50 % English / 50 % across 14 other languages) with a fixed class mix: 30 % CSAM-positive, 25 % safe-children, 25 % legal-adult, 20 % generic-safe. Generated via a multi-teacher pipeline, then deduplicated and stratified-split into train / val / test / calibration.
  • Recipe: 4 epochs, BF16, AdamW (lr 2e-5, weight decay 0.01), cosine schedule with 10 % warmup, per-device batch 64, max seq 384.
  • Loss: class-weighted cross-entropy [1.0, 2.5] to bias toward recall on the positive class.
  • Early stopping: positive-class recall on the val split, patience 2.
  • Hardware: single DGX Spark (Blackwell) node.
  • Export: PyTorch → ONNX FP32 (opset 17). The repo also ships an FP16 dtype cast (LayerNorm kept at FP32) and a full-graph dynamic INT8 quantization (per-channel weights). A make quantize-static target exists for calibrated static quantization, but it isn't the production path: on this model with ORT 1.20 it collapses the probability distribution.
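
To illustrate how the [1.0, 2.5] class weights bias training toward recall, here is a minimal pure-Python sketch of class-weighted cross-entropy. It assumes index 0 is the negative class and index 1 the positive class (the card does not state the order), and it uses the weighted-mean reduction that PyTorch's CrossEntropyLoss applies when given a weight argument. It is not the project's training code.

```python
import math
from typing import List, Sequence

# Assumption: (negative, positive) order for the weights in the recipe.
CLASS_WEIGHTS = (1.0, 2.5)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Sequence[float] = CLASS_WEIGHTS) -> float:
    """Weighted mean of per-example negative log-likelihoods.

    Each example's loss is scaled by the weight of its true class and the
    sum is divided by the total weight of the batch.
    """
    total_loss = 0.0
    total_weight = 0.0
    for logits, y in zip(batch_logits, labels):
        w = weights[y]
        total_loss += w * -log_softmax(logits)[y]
        total_weight += w
    return total_loss / total_weight
```

With these weights, a batch in which the positive example is misclassified costs more than the mirror-image batch in which the negative example is misclassified, which pushes the optimizer to avoid false negatives first.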

Data quality controls

The corpus was deduplicated and refusal-filtered in three independent layers; the final splits ship with zero detected refusals at QC time despite teachers refusing 10–25 % of CSAM-positive requests at generation time.

Deduplication: an exact-hash first pass, then MinHash near-duplicate detection at Jaccard ≥ 0.85, applied per class so that benign and positive prompts cannot knock each other out of the corpus. About 4–5 % of generated rows are dropped here.
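
The two-pass, per-class logic can be sketched as follows. To stay short and deterministic, this sketch computes exact Jaccard similarity over word sets instead of a MinHash estimate (MinHash approximates the same quantity at scale), and the quadratic comparison loop would need an LSH index on a corpus of this size. The function names and the word-level shingling are assumptions, not the project's implementation.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before hashing or shingling."""
    return " ".join(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity; MinHash estimates this same quantity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_per_class(rows: List[Tuple[str, str]],
                    threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Keep the first row of every duplicate group, within each class only.

    rows: (class_label, text) pairs. Pass 1 drops exact duplicates by hash;
    pass 2 drops rows whose word-set Jaccard with an already kept row of the
    same class is >= threshold.
    """
    seen_hashes: Dict[str, Set[str]] = {}
    kept_sets: Dict[str, List[Set[str]]] = {}
    kept: List[Tuple[str, str]] = []
    for label, text in rows:
        norm = normalize(text)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes.setdefault(label, set()):
            continue  # exact duplicate within this class
        words = set(norm.split())
        if any(jaccard(words, other) >= threshold
               for other in kept_sets.setdefault(label, [])):
            continue  # near duplicate within this class
        seen_hashes[label].add(digest)
        kept_sets[label].append(words)
        kept.append((label, text))
    return kept
```

Because the hash sets and kept sets are keyed by class label, an identical string in two different classes survives in both.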

In-flight refusal handling — every teacher response is regex-scanned against multilingual refusal patterns across 14 languages. After 5 consecutive refusal / zero-progress responses on a bucket the generator rotates to the next teacher in the per-class chain. Per-teacher concurrency caps prevent a slow refuser from gating the run; per-bucket exit reasons (done / dedup_stall / refusal_streak / 429_streak / failed) are tracked so abandoned buckets are reported rather than silently truncated.
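
The rotation rule can be sketched as a small loop. Everything here is illustrative: `REFUSAL_RE` is a tiny stand-in for the multilingual pattern set, `fill_bucket` and `max_calls` are hypothetical names, teachers are modeled as zero-argument callables, and zero-progress detection, concurrency caps, and the dedup_stall / 429_streak exit reasons are left out.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the multilingual refusal pattern set.
REFUSAL_RE = re.compile(r"\b(i can't|i cannot|i won't|lo siento)\b", re.IGNORECASE)

MAX_STREAK = 5  # consecutive refusals before rotating to the next teacher


def fill_bucket(teachers: Sequence[Callable[[], str]],
                target: int,
                max_calls: int = 1000) -> Tuple[List[str], str]:
    """Collect `target` non-refusal responses, rotating teachers on streaks.

    Returns (responses, exit_reason) where exit_reason is "done",
    "refusal_streak" (every teacher exhausted), or "failed" (call budget hit).
    """
    responses: List[str] = []
    calls = 0
    for teacher in teachers:
        streak = 0
        while streak < MAX_STREAK:
            if len(responses) >= target:
                return responses, "done"
            if calls >= max_calls:
                return responses, "failed"
            calls += 1
            reply = teacher()
            if REFUSAL_RE.search(reply):
                streak += 1  # refusal: count toward rotation
            else:
                streak = 0  # progress resets the streak
                responses.append(reply)
    reason = "done" if len(responses) >= target else "refusal_streak"
    return responses, reason
```

Returning the exit reason alongside the responses is what lets abandoned buckets be reported instead of silently truncated.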

Post-process corpus QC runs six independent methods over the final splits:

| Method | What it catches |
|---|---|
| (a) Multilingual refusal regex | Refusal phrases that slipped the in-flight scan (stricter pattern set, applied to the dedup'd corpus). |
| (b) HDBSCAN cluster flagging | Embeds every row with Qwen3-Embedding-0.6B → PCA-128 → HDBSCAN; clusters with regex-positive refusals or seed-distance hits are flagged. Catches refusal styles the regex doesn't enumerate. |
| (c) Class-keyword leakage | csam_positive missing minor-age signal → review; adult_sexual with minor signal → drop; safe_children with sexual vocab → drop; generic_safe drifting to both → drop. |
| (d) Claude judge sampling | Stratified sample (per class × language) scored by claude-sonnet-4-6 for in-class fidelity. |
| (e) Seed-distance | Cosine distance to 25 hand-curated multilingual refusal seeds; flags near-refusals. |
| (f) Statistical outliers | Length percentile cutoffs + meta-word density ("Note:", "Disclaimer:", etc.). |

On the live corpus, methods (a), (b), and (e) detected zero refusals. Class-keyword leakage (c) dropped 60 + 12 rows (0.09 % of corpus). The remaining 16 % of flags are review-only and dominated by a known false-positive in c_csam_no_age (hyphenated N-year-old and Chinese N岁 gaps in the minor-word regex).
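
As an illustration of method (f), the sketch below flags rows whose word count falls outside a percentile band or whose share of meta-markers is high. The cutoffs (1st/99th percentile, 5 % density) and the marker list are hypothetical placeholders; the card does not state the values actually used.

```python
from typing import List, Sequence

META_MARKERS = ("note:", "disclaimer:")  # the two examples named for method (f)


def percentile(sorted_vals: Sequence[int], q: float) -> float:
    """Linear-interpolated percentile (q in [0, 100]) of a sorted sequence."""
    if not sorted_vals:
        raise ValueError("empty input")
    k = (len(sorted_vals) - 1) * q / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def meta_density(text: str) -> float:
    """Meta-marker occurrences per whitespace-separated word."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in META_MARKERS)
    return hits / len(words)


def flag_outliers(rows: List[str], low_q: float = 1.0, high_q: float = 99.0,
                  max_meta_density: float = 0.05) -> List[int]:
    """Indices of rows with extreme length or dense meta-commentary."""
    lengths = sorted(len(r.split()) for r in rows)
    lo_cut = percentile(lengths, low_q)
    hi_cut = percentile(lengths, high_q)
    flagged = []
    for i, row in enumerate(rows):
        n = len(row.split())
        if n < lo_cut or n > hi_cut or meta_density(row) > max_meta_density:
            flagged.append(i)
    return flagged
```

Flags from a heuristic like this are best treated as review candidates, not automatic drops.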

Language fidelity: langdetect is run on every row; mismatched-language rows are dropped.

Teacher calibration — 10 candidate teachers generated the same 144 diagnostic prompts (36/class × 6 langs) and were scored by claude-sonnet-4-6 along five axes (in-class fidelity, realism, language fidelity, subcategory match, diversity). Only the top 4 by composite score (DeepSeek-V4-Pro, DeepSeek-V4-Flash, Qwen3-235B-Instruct, GLM-5.1) entered the production routing chain.
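
The selection step amounts to ranking candidates by a composite of the five axis scores and keeping the top four. The card does not say how the composite is formed, so the sketch below assumes an unweighted mean; the axis keys and the function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List

AXES = ("in_class_fidelity", "realism", "language_fidelity",
        "subcategory_match", "diversity")


def top_teachers(scores: Dict[str, Dict[str, float]], k: int = 4) -> List[str]:
    """Rank teachers by unweighted mean over the five axes; return the top k.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    composite = {name: mean(axes[a] for a in AXES)
                 for name, axes in scores.items()}
    return sorted(composite, key=lambda n: (-composite[n], n))[:k]
```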

Evaluation

Test split: 3886 prompts held out from training, across 16 languages and 28 sub-categories.

Operating points

The table lists each shippable variant at its suggested threshold (the operating point at which we recommend you ship it) with the resulting recall, precision, and confusion-matrix counts.

| Model variant | Threshold | Recall | Precision | FN | FP |
|---|---|---|---|---|---|
| PyTorch BF16 (training-native) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX FP32 (suggested, ~2.4 GB) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX FP16 (native-FP16 CPUs, ~1.2 GB) | 0.5000 | 0.9964 | 0.9964 | 4 | 4 |
| ONNX dynamic INT8 (smallest, ~600 MB) | 0.2346 | 0.9901 | 0.9049 | 11 | 116 |

Threshold rationale

  • The PyTorch and FP32 ONNX paths produce identical results on this test split (same FN/FP rows, same threshold sweep), as expected from a lossless ONNX export evaluated in float32. The 0.50 cutoff is well calibrated; the threshold sweep shows that a cutoff of 0.9995 would still deliver 0.9901 recall at precision = 1.0000 if you wanted to trade recall for zero false positives.
  • The FP16 ONNX deployable is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow), so its score distribution is numerically near-identical to FP32 — the 0.50 cutoff carries over unchanged with 0.9964 recall / 0.9964 precision.
  • The INT8 ONNX deployable sees its probability distribution compress toward the middle (a normal consequence of INT8 weight quantization); the 0.2346 threshold is the point that recovers the 0.99 recall SLO. At the default 0.50 it sits at 0.9848 recall / 0.9425 precision, which is usable for a fail-closed guardrail but does not meet the SLO.
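
A threshold that recovers a recall target can be found by sorting the positive-class scores and reading off the score of the last positive that must be caught. This is a minimal sketch of that sweep, assuming `scores` are positive-class probabilities from a held-out split and that a prompt is flagged when its score is greater than or equal to the threshold; it is not the project's calibration code.

```python
import math
from typing import Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float = 0.99) -> float:
    """Highest threshold t such that flagging `score >= t` meets the target."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Smallest number of positives that must be caught to reach the target
    # (the epsilon guards against float noise in the product).
    needed = max(math.ceil(target_recall * len(positives) - 1e-12), 1)
    return positives[needed - 1]


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of the rule `score >= threshold`."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Choosing the threshold on a calibration split and reporting the resulting precision and recall on the test split keeps the published numbers unbiased.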

Overall metrics (threshold 0.50, FP32 baseline)

| Metric | Value |
|---|---|
| Accuracy | 0.9979 |
| Precision (positive) | 0.9964 |
| Recall (positive) | 0.9964 |
| F1 (positive) | 0.9964 |
| ROC-AUC | 0.99992 |
| PR-AUC | 0.99983 |

Per-language recall@0.5 is ≥ 0.96 across all 16 covered languages — see test_report.json for the full per-language and per-subcategory breakdown.

Intended use

  • In scope: pre-call guardrail for text-to-image services to block CSAM prompts before they reach a generation model.
  • Out of scope: long-form documents, image/audio classification, and languages outside the 16 listed above. Do not rely on this as the sole CSAM defense — pair with output-side image hashing/scanning (PhotoDNA-class systems) and human review.
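
A pre-call guardrail is typically wired in fail-closed, so that a classifier outage blocks generation instead of letting prompts through unchecked. The wrapper below is a minimal sketch of that pattern; `guarded_generate`, `classify`, and `generate` are hypothetical names, and the choice of PermissionError is arbitrary.

```python
from typing import Callable


def guarded_generate(prompt: str,
                     classify: Callable[[str], bool],
                     generate: Callable[[str], object]) -> object:
    """Run `generate` only if `classify` returns False; block on any error."""
    try:
        blocked = classify(prompt)
    except Exception:
        blocked = True  # fail closed: classifier failure means no generation
    if blocked:
        raise PermissionError("prompt blocked by guardrail")
    return generate(prompt)
```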

Limitations

  • The classifier scores prompt intent, not generated imagery.

Loading

ONNX FP32 (suggested):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "urtho/Qwen3-CSAM-Guard-0.6b-v1"
tok  = AutoTokenizer.from_pretrained(repo)
mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model.onnx",
)

ONNX FP16 (use only on CPUs with native FP16 — see capability probes above):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_fp16.onnx",
)

ONNX INT8 (smallest footprint, drifted scores — apply the INT8-row threshold from the Operating Points table, not the default 0.50):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_quantized.onnx",
)

The PyTorch checkpoint uses a custom classifier head, so it can't be loaded with AutoModelForSequenceClassification directly — use src.model.classifier.CSAMClassifier.from_pretrained from the project source.
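
Once a variant is loaded, its two output logits still have to be turned into a block/allow decision at the threshold from the Operating points table. The sketch below shows that step in pure Python. It assumes logit index 1 is the positive class and that a score equal to the threshold counts as a block; neither detail is stated in this card, so check model_config.json before relying on it.

```python
import math
from typing import Sequence

# Suggested thresholds from the Operating points table, keyed by file name.
THRESHOLDS = {
    "model.onnx": 0.5000,
    "model_fp16.onnx": 0.5000,
    "model_quantized.onnx": 0.2346,
}

POSITIVE_INDEX = 1  # assumption: logit index 1 is the positive class


def positive_probability(logits: Sequence[float]) -> float:
    """Softmax probability of the positive class from a pair of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[POSITIVE_INDEX] / sum(exps)


def should_block(logits: Sequence[float], file_name: str) -> bool:
    """True when the positive probability reaches the variant's threshold."""
    return positive_probability(logits) >= THRESHOLDS[file_name]
```

Keying the threshold by file name makes it harder to ship the INT8 graph with the default 0.50 cutoff by accident.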

Attribution

Fine-tuned from Qwen3-Embedding-0.6B by the Qwen team — the entire backbone is theirs; only the MLP head was trained here. Please cite the base model when using this artifact:

@misc{qwen3embedding2025,
  title  = {Qwen3-Embedding},
  author = {Qwen Team},
  year   = {2025},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-Embedding-0.6B}}
}