OpenMed Privacy Filter Multilingual v2 - MLX 8-bit

A native MLX port of OpenMed/privacy-filter-multilingual-v2, affine-quantized to 8-bit so that PII detection with OpenMed on Apple Silicon runs faster from a smaller weight file. For the unquantized BF16 reference, see OpenMed/privacy-filter-multilingual-v2-mlx.

Family at a glance:

PyTorch source: OpenMed/privacy-filter-multilingual-v2

MLX BF16: OpenMed/privacy-filter-multilingual-v2-mlx (Apple Silicon, 2.6 GiB weights)

MLX 8-bit (this repo): OpenMed/privacy-filter-multilingual-v2-mlx-8bit (Apple Silicon, 1.4 GiB weights, ~1.8x faster than BF16 in the local golden-sample run)

Why 8-bit?

Metric	BF16 sibling	This repo (Q8)
`weights.safetensors` size	2.6 GiB	1.4 GiB
Average forward pass	15.1 ms	8.4 ms (~1.8x faster)
Average argmax agreement vs. BF16	reference	100.00%
Entity-span preservation	reference	identical on all 10 golden samples

Validation used scripts/export/verify_privacy_filter_nemotron_mlx.py over 10 golden PII samples (email, phone, ssn, credit card, name, ipv4, address, date_of_birth, url, mixed).
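The "argmax agreement" row can be read as the share of tokens whose top-scoring class is the same under both models. The following is a minimal sketch of that metric, assuming logits are available as nested Python lists shaped [tokens][classes]. The function names and the toy numbers are illustrative and are not taken from the verification script.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def argmax_agreement(ref_logits, test_logits):
    """Fraction of token positions where both models pick the same class."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both inputs must cover the same tokens")
    if not ref_logits:
        return 1.0
    same = sum(argmax(r) == argmax(t) for r, t in zip(ref_logits, test_logits))
    return same / len(ref_logits)


# Toy logits for 3 tokens and 4 classes (made-up numbers, not model output).
bf16 = [[0.1, 2.0, 0.3, 0.0], [1.5, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 3.0]]
q8 = [[0.1, 1.9, 0.4, 0.0], [1.4, 0.3, 0.1, 0.0], [0.0, 0.1, 0.3, 2.9]]
print(f"{argmax_agreement(bf16, q8):.2%}")  # 100.00%

# The speedup quoted in the table follows from the two timings.
print(round(15.1 / 8.4, 1))  # 1.8
```

Small logit differences between the two models do not lower this metric as long as the winning class per token stays the same.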

Quantization

Field	Value
Bits	8
Group size	64
Mode	affine MLX weight-only quantization
Quantized modules	embedding, attention projections, MoE router/expert matrices, output head
Kept in BF16	RMSNorm scales and attention sinks
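To make the table concrete, here is a rough pure-Python sketch of group-wise affine quantization with 8 bits and a group size of 64. It illustrates the general technique (one scale and one bias per group, weights reconstructed as scale * code + bias) and is not the MLX kernel or the export script. The helper names are hypothetical, and the assumption that each group stores a 16-bit scale and a 16-bit bias is ours.

```python
import random

GROUP_SIZE = 64
BITS = 8
LEVELS = (1 << BITS) - 1  # 255 steps between the group minimum and maximum


def quantize_group(weights):
    """Map one group of floats to integer codes plus a scale and a bias."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for a flat group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w is roughly scale * code + bias."""
    return [scale * c + bias for c in codes]


random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(f"max abs error {max_err:.2e}, half a quantization step {scale / 2:.2e}")

# Storage estimate under the assumption stated above:
# 8 bits per code plus 32 bits of scale and bias shared by 64 weights.
bits_per_weight = BITS + (16 + 16) / GROUP_SIZE
print(bits_per_weight, bits_per_weight / 16)  # 8.5 0.53125
```

Under that storage assumption, a quantized weight costs about 53% of a BF16 weight, which is in line with the 2.6 GiB to 1.4 GiB reduction reported in this card.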

What it does

This model is an MLX packaging of OpenMed/privacy-filter-multilingual-v2, the second-generation multilingual checkpoint for fine-grained PII extraction across 16 languages. It uses OpenAI's Privacy Filter architecture and predicts 217 BIOES classes: O plus B/I/E/S for each of 54 categories (1 + 4 x 54 = 217). The OpenMed PrivacyFilterMLXPipeline runs BIOES-aware Viterbi decoding, so callers receive grouped spans instead of raw token tags.
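As an illustration of what BIOES-aware Viterbi decoding means, the sketch below finds the highest-scoring tag sequence that obeys the BIOES grammar and then groups it into token spans. It is a simplified stand-in written for this card, not the OpenMed implementation. The `PREFIX-CATEGORY` label format, the function names, and the toy scores are all assumptions.

```python
NEG_INF = float("-inf")


def parse(label):
    """Return (prefix, category); the outside label "O" has no category."""
    return ("O", None) if label == "O" else tuple(label.split("-", 1))


def allowed(prev, cur):
    """BIOES grammar: B and I must be followed by I or E of the same category."""
    (pp, pc), (cp, cc) = parse(prev), parse(cur)
    if pp in ("B", "I"):
        return cp in ("I", "E") and pc == cc
    return cp in ("O", "B", "S")


def viterbi(scores, labels):
    """scores[t][k] is the score of labels[k] at token t; returns the best valid tags."""
    n = len(labels)
    starts = [parse(l)[0] in ("O", "B", "S") for l in labels]
    ends = [parse(l)[0] in ("O", "E", "S") for l in labels]
    best = [s if ok else NEG_INF for s, ok in zip(scores[0], starts)]
    back = []
    for row in scores[1:]:
        new, ptr = [], []
        for k in range(n):
            cands = [best[j] if allowed(labels[j], labels[k]) else NEG_INF
                     for j in range(n)]
            j = max(range(n), key=cands.__getitem__)
            new.append(cands[j] + row[k])
            ptr.append(j)
        best = new
        back.append(ptr)
    best = [b if ok else NEG_INF for b, ok in zip(best, ends)]
    k = max(range(n), key=best.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [labels[i] for i in reversed(path)]


def group_spans(tags):
    """Collapse a valid BIOES sequence into (category, start, end) token spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, cat = parse(tag)
        if prefix == "S":
            spans.append((cat, i, i + 1))
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((cat, start, i + 1))
            start = None
    return spans


labels = ["O", "B-EMAIL", "I-EMAIL", "E-EMAIL", "S-EMAIL"]
# Toy scores: at the third token, greedy argmax would pick "O" right after a "B".
scores = [
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
tags = viterbi(scores, labels)
print(tags)               # ['O', 'B-EMAIL', 'I-EMAIL', 'E-EMAIL']
print(group_spans(tags))  # [('EMAIL', 1, 4)]
```

The toy example shows why the constraint matters: a per-token argmax would emit O directly after B-EMAIL, which is not a valid BIOES sequence, while the constrained search keeps the span intact.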

Label coverage highlights:

Identity: FIRSTNAME, MIDDLENAME, LASTNAME, AGE, GENDER, USERNAME, OCCUPATION, ORGANIZATION
Contact and address: EMAIL, PHONE, URL, STREET, BUILDINGNUMBER, CITY, COUNTY, STATE, ZIPCODE
Financial and crypto: BANKACCOUNT, IBAN, BIC, CREDITCARD, CVV, PIN, BITCOINADDRESS, ETHEREUMADDRESS
Vehicle, digital, and auth: VIN, VRM, IPADDRESS, MACADDRESS, IMEI, PASSWORD
Date and amount: DATE, DATEOFBIRTH, TIME, AMOUNT, CURRENCY, CURRENCYCODE

The full label map is included in id2label.json.
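A short sketch of how the label map might be inspected. It assumes, without having checked the file, that id2label.json is a JSON object mapping stringified class IDs to labels of the form `PREFIX-CATEGORY` plus a single `O`. The inline sample stands in for the real file.

```python
import json


def categories_from_id2label(id2label):
    """Collect the distinct PII categories behind a BIOES label map."""
    cats = set()
    for label in id2label.values():
        if label != "O":
            cats.add(label.split("-", 1)[1])
    return sorted(cats)


# Tiny stand-in for the real file. To read the real one, something like:
#   with open("id2label.json") as f:
#       id2label = json.load(f)
sample = json.loads(
    '{"0": "O",'
    ' "1": "B-EMAIL", "2": "I-EMAIL", "3": "E-EMAIL", "4": "S-EMAIL",'
    ' "5": "B-PHONE", "6": "I-PHONE", "7": "E-PHONE", "8": "S-PHONE"}'
)
cats = categories_from_id2label(sample)
print(cats)  # ['EMAIL', 'PHONE']

# A complete BIOES map has one "O" plus four tags per category.
print(1 + 4 * len(cats) == len(sample))  # True
```

Applied to a full 217-entry map, the same check would be expected to report 54 categories.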

Architecture

Field	Value
Source model type	`openai_privacy_filter`
Source architecture	`OpenAIPrivacyFilterForTokenClassification`
Hidden size	640
Transformer layers	8
Attention	Grouped-query attention (14 query heads / 2 KV heads, head_dim=64) with attention sinks
FFN	Sparse Mixture-of-Experts - 128 experts, top-4 routing, SwiGLU
Position encoding	YaRN-scaled RoPE (`rope_theta=150000`, factor=32)
Context length	131,072 tokens (initial 4,096)
Tokenizer	`o200k_base` / tiktoken-compatible tokenizer assets, vocab 200,064
Output head	Linear(640 -> 217) with bias
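Several of the figures in the table are related by simple arithmetic. The sketch below works them out, assuming the usual conventions that a projection's width is heads times head_dim and that the extended context is the initial context times the RoPE scaling factor. The variable names are ours, not config keys.

```python
hidden_size = 640
num_q_heads, num_kv_heads, head_dim = 14, 2, 64
num_experts, experts_per_token = 128, 4
initial_context, rope_factor = 4096, 32
num_labels = 217

# Grouped-query attention: several query heads share each KV head.
q_width = num_q_heads * head_dim        # 896, wider than the hidden size
kv_width = num_kv_heads * head_dim      # 128
q_heads_per_kv = num_q_heads // num_kv_heads  # 7

# Sparse MoE: only a few experts run for each token.
active_fraction = experts_per_token / num_experts  # 0.03125

# Scaled context window.
max_context = initial_context * rope_factor  # 131072

# Output head Linear(640 -> 217) with bias.
head_params = hidden_size * num_labels + num_labels  # 139097

print(q_width, kv_width, q_heads_per_kv)
print(f"{active_fraction:.2%} of experts active per token")
print(max_context, head_params)
```

The 131,072-token context length in the table matches 4,096 x 32, and top-4 routing over 128 experts means roughly 3% of the expert matrices are used for any one token.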

File set

File	Size	Purpose
`weights.safetensors`	1.4 GiB	MLX weights
`config.json`	17.7 KiB	Model and OpenMed MLX runtime config
`id2label.json`	4.8 KiB	Numeric ID to BIOES label mapping
`openmed-mlx.json`	0.8 KiB	OpenMed MLX artifact manifest
`tokenizer.json`	27 MiB	Tokenizer asset kept with the artifact
`tokenizer_config.json`	0.2 KiB	Tokenizer metadata

The MLX runtime uses the tiktoken-compatible o200k_base tokenizer path. tokenizer.json and tokenizer_config.json are bundled so consumers can inspect the tokenizer assets and keep the artifact self-contained.

Quick start

With OpenMed

pip install -U "openmed[mlx]"

from openmed import extract_pii, deidentify
from openmed.core import OpenMedConfig

model_name = "OpenMed/privacy-filter-multilingual-v2-mlx-8bit"
text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

result = extract_pii(
    text,
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
for ent in result.entities:
    print(ent.label, ent.text, round(ent.confidence, 4))

masked = deidentify(
    text,
    method="mask",
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
print(masked.deidentified_text)

For non-MLX hosts, use the source PyTorch checkpoint OpenMed/privacy-filter-multilingual-v2.
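Callers often want to act only on high-confidence detections. The sketch below filters and sorts entities using the three attributes printed in the quick start (`label`, `text`, `confidence`). `FakeEntity`, `keep_confident`, the threshold, and the scores are hypothetical stand-ins so the example runs without the model. They are not part of the OpenMed API.

```python
from dataclasses import dataclass


@dataclass
class FakeEntity:
    """Stand-in exposing the attributes the quick start reads from each entity."""
    label: str
    text: str
    confidence: float


def keep_confident(entities, threshold=0.5):
    """Return entities at or above the threshold, highest confidence first."""
    kept = [e for e in entities if e.confidence >= threshold]
    return sorted(kept, key=lambda e: e.confidence, reverse=True)


# Made-up detections, not real model output.
entities = [
    FakeEntity("EMAIL", "sarah.johnson@example.com", 0.97),
    FakeEntity("FIRSTNAME", "Sarah", 0.99),
    FakeEntity("DATE", "03/15", 0.41),
]
for ent in keep_confident(entities, threshold=0.9):
    print(ent.label, ent.text, round(ent.confidence, 4))
```

With real results, `result.entities` from `extract_pii` would take the place of the `entities` list, provided its items expose the same attributes.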

Direct MLX usage

from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-multilingual-v2-mlx-8bit")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))

Loading from a local snapshot

from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-multilingual-v2-mlx-8bit")
# Placeholder token IDs for a smoke test; real inputs come from the tokenizer.
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)
logits = model(ids, attention_mask=mask)
print(logits.shape)
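Raw logits still have to be turned into labels. The sketch below does a plain per-token argmax and looks each class index up in a label map, using nested Python lists in place of an MLX array (an `mx.array` can be converted with `logits.tolist()`). It assumes the last axis holds one score per class and that the map uses string keys, and it deliberately skips the BIOES-aware decoding that the pipeline applies.

```python
def greedy_labels(logits, id2label):
    """Pick the highest-scoring label per token for each sequence in the batch."""
    decoded = []
    for sequence in logits:
        tags = []
        for token_scores in sequence:
            best = max(range(len(token_scores)), key=token_scores.__getitem__)
            tags.append(id2label[str(best)])
        decoded.append(tags)
    return decoded


# Two-class toy example (made-up scores); the real model has 217 classes.
id2label = {"0": "O", "1": "S-EMAIL"}
logits = [[[3.0, 0.1], [0.2, 2.5], [1.0, 0.0]]]
print(greedy_labels(logits, id2label))  # [['O', 'S-EMAIL', 'O']]
```

Greedy decoding like this can emit tag sequences that are not valid BIOES, which is one reason to prefer PrivacyFilterMLXPipeline for real use.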

Hardware notes

Designed for Apple Silicon with MLX.
CPU inference may work, but GPU-backed MLX on M-series Macs is the intended runtime.
Install the Python package with pip install -U "openmed[mlx]".
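A script that has to run on mixed hardware could pick the checkpoint at start-up. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the check is a heuristic: a Python interpreter running under Rosetta reports `x86_64` even on an M-series Mac.

```python
import platform

MLX_MODEL = "OpenMed/privacy-filter-multilingual-v2-mlx-8bit"
PYTORCH_MODEL = "OpenMed/privacy-filter-multilingual-v2"


def is_apple_silicon(system=None, machine=None):
    """True when the interpreter reports macOS on an arm64 CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def pick_checkpoint(system=None, machine=None):
    """Use the MLX 8-bit repo on Apple Silicon, the PyTorch source elsewhere."""
    return MLX_MODEL if is_apple_silicon(system, machine) else PYTORCH_MODEL


print(pick_checkpoint())
```

The optional arguments exist only so the function can be exercised on any machine, for example `pick_checkpoint("Linux", "x86_64")`.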

Credits

This artifact builds on:

OpenMed/privacy-filter-multilingual-v2 by OpenMed
openai/privacy-filter and OpenAI's opf training/evaluation tooling
The datasets listed in the model-card metadata
Apple's MLX framework

License

Apache 2.0, matching the source checkpoint metadata.

