OpenMed Privacy Filter Multilingual v2 - MLX 8-bit

A native MLX port of OpenMed/privacy-filter-multilingual-v2, affine-quantized to 8 bits for smaller, faster PII detection on Apple Silicon with OpenMed. For the unquantized BF16 reference, see OpenMed/privacy-filter-multilingual-v2-mlx.

Family at a glance:

Why 8-bit?

| Metric | BF16 sibling | This repo (Q8) |
|---|---|---|
| weights.safetensors size | 2.6 GiB | 1.4 GiB |
| Average forward pass | 15.1 ms | 8.4 ms (~1.8x faster) |
| Average argmax agreement vs. BF16 | reference | 100.00% |
| Entity-span preservation | reference | identical on all 10 golden samples |

Validation used scripts/export/verify_privacy_filter_nemotron_mlx.py over 10 golden PII samples (email, phone, ssn, credit card, name, ipv4, address, date_of_birth, url, mixed).
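The two quality metrics above can be computed with a few lines of plain Python. This is a hypothetical sketch: it assumes you already have per-token logits from both models as nested lists shaped [tokens][labels], plus decoded (label, start, end) spans. The verification script may compute the metrics differently.

```python
"""Hypothetical sketch of the argmax-agreement and span-preservation checks."""
from typing import List, Sequence, Tuple


def argmax(row: Sequence[float]) -> int:
    # Index of the highest-scoring label for one token (first wins on ties).
    return max(range(len(row)), key=row.__getitem__)


def argmax_agreement(ref: List[List[float]], quant: List[List[float]]) -> float:
    """Fraction of tokens whose top label is the same in both models."""
    if len(ref) != len(quant):
        raise ValueError("logit sequences must have the same length")
    if not ref:
        return 1.0
    same = sum(argmax(r) == argmax(q) for r, q in zip(ref, quant))
    return same / len(ref)


def spans_identical(ref_spans: List[Tuple[str, int, int]],
                    quant_spans: List[Tuple[str, int, int]]) -> bool:
    """True if both models produce exactly the same (label, start, end) spans."""
    return sorted(ref_spans) == sorted(quant_spans)
```

Averaging argmax_agreement over the 10 golden samples would give a per-sample mean comparable to the "Average argmax agreement" row.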

Quantization

| Field | Value |
|---|---|
| Bits | 8 |
| Group size | 64 |
| Mode | affine MLX weight-only quantization |
| Quantized modules | embedding, attention projections, MoE router/expert matrices, output head |
| Kept in BF16 | RMSNorm scales and attention sinks |
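With these settings, each run of 64 consecutive weights shares one scale and one bias, and every weight is stored as an integer q in [0, 255] so that w ≈ scale * q + bias. The sketch below uses a simple min/max scheme to show the idea. It is not MLX's implementation: MLX exposes this as mx.quantize(w, group_size=64, bits=8), and its kernel may pick scales and round differently.

```python
"""Simplified sketch of affine, group-wise, weight-only quantization (not MLX's code)."""
from typing import List, Tuple

BITS = 8
GROUP_SIZE = 64
LEVELS = 2 ** BITS - 1  # 255 steps for 8 bits


def quantize_group(group: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of floats to ints in [0, LEVELS] with a shared scale and bias."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0  # avoid dividing by zero
    bias = lo
    q = [min(LEVELS, max(0, round((w - bias) / scale))) for w in group]
    return q, scale, bias


def dequantize_group(q: List[int], scale: float, bias: float) -> List[float]:
    # Reconstruct approximate weights: w ≈ scale * q + bias.
    return [scale * v + bias for v in q]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float, float]]:
    """Quantize one weight row as independent groups of GROUP_SIZE values."""
    if len(row) % GROUP_SIZE:
        raise ValueError("row length must be a multiple of GROUP_SIZE")
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]


def bits_per_weight(scale_bits: int = 16, bias_bits: int = 16) -> float:
    # Each group stores its integers plus one scale and one bias.
    return BITS + (scale_bits + bias_bits) / GROUP_SIZE
```

Assuming scales and biases are stored in BF16, bits_per_weight() gives 8 + 32/64 = 8.5 bits per quantized weight, about 53% of BF16's 16 bits. That is consistent with the 2.6 GiB to 1.4 GiB reduction reported above.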

What it does

This model is an MLX packaging of OpenMed/privacy-filter-multilingual-v2, the second-generation multilingual checkpoint for fine-grained PII extraction across 16 languages. It uses OpenAI's Privacy Filter architecture and predicts 217 BIOES classes: O plus B/I/E/S tags for each of 54 entity categories (1 + 4 × 54 = 217). OpenMed's PrivacyFilterMLXPipeline runs BIOES-aware Viterbi decoding, so callers receive grouped entity spans instead of raw per-token tags.
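BIOES-aware Viterbi decoding can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not OpenMed's code: labels are assumed to look like "O", "B-EMAIL", "I-EMAIL", "E-EMAIL", "S-EMAIL"; scores are per-token log-probabilities; and every legal BIOES transition costs 0 while illegal ones are forbidden. PrivacyFilterMLXPipeline may use different transition scores.

```python
"""Minimal sketch: BIOES-constrained Viterbi decoding plus span grouping."""
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def split(label: str) -> Tuple[str, str]:
    """'B-EMAIL' -> ('B', 'EMAIL'); 'O' -> ('O', '')."""
    if label == "O":
        return ("O", "")
    tag, category = label.split("-", 1)
    return (tag, category)


def allowed(prev: str, cur: str) -> bool:
    """BIOES transition rule between two adjacent token labels."""
    p_tag, p_cat = split(prev)
    c_tag, c_cat = split(cur)
    if p_tag in ("B", "I"):
        # An open span must continue (I) or close (E) with the same category.
        return c_tag in ("I", "E") and c_cat == p_cat
    # After O, E, or S, only O or the start of a new span (B or S) is legal.
    return c_tag in ("O", "B", "S")


def viterbi(scores: Sequence[Sequence[float]], labels: List[str]) -> List[str]:
    """Highest-scoring legal label path; scores[t][k] is a per-token log-prob."""
    n = len(labels)
    tags = [split(lab)[0] for lab in labels]
    # A sequence may not start inside a span.
    best = [s if t in ("O", "B", "S") else NEG_INF for s, t in zip(scores[0], tags)]
    back: List[List[int]] = []
    for row in scores[1:]:
        new, ptr = [], []
        for k in range(n):
            cands = [(best[j], j) for j in range(n) if allowed(labels[j], labels[k])]
            score, j = max(cands, default=(NEG_INF, 0))
            new.append(score + row[k])
            ptr.append(j)
        best = new
        back.append(ptr)
    # A sequence may not end inside an open span.
    final = [b if t in ("O", "E", "S") else NEG_INF for b, t in zip(best, tags)]
    k = max(range(n), key=final.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [labels[i] for i in reversed(path)]


def group_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Turn a legal BIOES path into (category, first_token, last_token) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        t, cat = split(tag)
        if t == "S":
            spans.append((cat, i, i))
        elif t == "B":
            start = i
        elif t == "E" and start is not None:
            spans.append((cat, start, i))
            start = None
    return spans
```

The constraint matters when per-token argmax is illegal: for example, greedy tags O, I-EMAIL, E-EMAIL cannot form a span, whereas Viterbi picks the best legal path such as B-EMAIL, I-EMAIL, E-EMAIL.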

Label coverage highlights:

  • Identity: FIRSTNAME, MIDDLENAME, LASTNAME, AGE, GENDER, USERNAME, OCCUPATION, ORGANIZATION
  • Contact and address: EMAIL, PHONE, URL, STREET, BUILDINGNUMBER, CITY, COUNTY, STATE, ZIPCODE
  • Financial and crypto: BANKACCOUNT, IBAN, BIC, CREDITCARD, CVV, PIN, BITCOINADDRESS, ETHEREUMADDRESS
  • Vehicle, digital, and auth: VIN, VRM, IPADDRESS, MACADDRESS, IMEI, PASSWORD
  • Date, time, and amount: DATE, DATEOFBIRTH, TIME, AMOUNT, CURRENCY, CURRENCYCODE

The full label map is included in id2label.json.
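A quick way to inspect the label map is to load it and list its entity categories. This sketch assumes id2label.json is a JSON object mapping string IDs to labels of the form "O" or "<B|I|E|S>-<CATEGORY>"; check the actual file before relying on that layout.

```python
"""Sketch: summarize a BIOES label map such as id2label.json (assumed layout)."""
import json
from typing import Dict, List


def load_id2label(path: str) -> Dict[int, str]:
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings; convert them back to ints.
    return {int(k): v for k, v in raw.items()}


def categories(id2label: Dict[int, str]) -> List[str]:
    """Sorted entity categories (e.g. EMAIL) found in the label map."""
    return sorted({lab.split("-", 1)[1] for lab in id2label.values() if lab != "O"})


def expected_num_labels(num_categories: int) -> int:
    # One "O" label plus B/I/E/S for each category.
    return 1 + 4 * num_categories
```

For a complete BIOES map, len(id2label) should equal expected_num_labels(len(categories(id2label))); with 217 labels that implies 54 categories.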

Architecture

| Field | Value |
|---|---|
| Source model type | openai_privacy_filter |
| Source architecture | OpenAIPrivacyFilterForTokenClassification |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-query attention (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts: 128 experts, top-4 routing, SwiGLU |
| Position encoding | YaRN-scaled RoPE (rope_theta=150000, factor=32) |
| Context length | 131,072 tokens (4,096 original positions × YaRN factor 32) |
| Tokenizer | o200k_base / tiktoken-compatible tokenizer assets, vocab 200,064 |
| Output head | Linear(640 -> 217) with bias |
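To illustrate the "128 experts, top-4 routing" row, here is a generic top-k routing sketch for a single token. Applying a softmax over only the selected router logits is an assumption for illustration, and the experts are arbitrary callables rather than the model's SwiGLU FFNs; the real router may normalize differently.

```python
"""Generic sketch of top-k Mixture-of-Experts routing for one token."""
import math
from typing import Callable, List, Sequence, Tuple

TOP_K = 4


def route(router_logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their weights to sum to 1."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]  # numerically stable softmax
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: List[float], router_logits: Sequence[float],
                experts: Sequence[Callable[[List[float]], List[float]]]) -> List[float]:
    """Weighted sum of the outputs of only the selected experts."""
    out = [0.0] * len(x)
    for idx, weight in route(router_logits):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out
```

Only 4 of the 128 expert FFNs run per token, so a sparse MoE layer costs far less compute per token than its total parameter count suggests.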

File set

| File | Size | Purpose |
|---|---|---|
| weights.safetensors | 1.4 GiB | MLX weights |
| config.json | 17.7 KiB | Model and OpenMed MLX runtime config |
| id2label.json | 4.8 KiB | Numeric ID to BIOES label mapping |
| openmed-mlx.json | 0.8 KiB | OpenMed MLX artifact manifest |
| tokenizer.json | 27 MiB | Tokenizer asset kept with the artifact |
| tokenizer_config.json | 0.2 KiB | Tokenizer metadata |

The MLX runtime tokenizes through the tiktoken-compatible o200k_base code path. The files tokenizer.json and tokenizer_config.json are bundled so that consumers can inspect the tokenizer assets and the artifact stays self-contained.
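Before loading a snapshot, it can help to confirm that every file in the table is present. The file names below come from the table; the helper itself is hypothetical and not part of OpenMed.

```python
"""Sketch: check that a local snapshot contains the files listed in the table."""
from pathlib import Path
from typing import List

EXPECTED_FILES = [
    "weights.safetensors",
    "config.json",
    "id2label.json",
    "openmed-mlx.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(snapshot_dir: str) -> List[str]:
    """Return the expected artifact files that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, missing_files(snapshot_download("OpenMed/privacy-filter-multilingual-v2-mlx-8bit")) should return an empty list for a complete download.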

Quick start

With OpenMed

```bash
pip install -U "openmed[mlx]"
```

```python
from openmed import extract_pii, deidentify
from openmed.core import OpenMedConfig

model_name = "OpenMed/privacy-filter-multilingual-v2-mlx-8bit"
text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

# Detect PII spans with the MLX backend.
result = extract_pii(
    text,
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
for ent in result.entities:
    print(ent.label, ent.text, round(ent.confidence, 4))

# Replace detected PII with masks.
masked = deidentify(
    text,
    method="mask",
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
print(masked.deidentified_text)
```

For non-MLX hosts, use the source PyTorch checkpoint OpenMed/privacy-filter-multilingual-v2.

Direct MLX usage

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-multilingual-v2-mlx-8bit")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
```

Loading from a local snapshot

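```python
import mlx.core as mx
from openmed.mlx.models import load_model

# Path to a local copy of this repo (e.g. the directory returned by snapshot_download).
model = load_model("/path/to/privacy-filter-multilingual-v2-mlx-8bit")

# Arbitrary placeholder token IDs; real inputs come from the o200k_base tokenizer.
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)  # attend to all 4 tokens

logits = model(ids, attention_mask=mask)
print(logits.shape)  # (batch, sequence_length, num_labels); num_labels is 217 here
```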

Hardware notes

  • Designed for Apple Silicon with MLX.
  • CPU inference may work, but GPU-backed MLX on M-series Macs is the intended runtime.
  • Install the Python package with pip install -U "openmed[mlx]".

Credits

This artifact builds on:

License

Apache 2.0, matching the source checkpoint metadata.
