OpenMed Privacy Filter Multilingual v2 - MLX 8-bit
A native MLX port of OpenMed/privacy-filter-multilingual-v2, affine-quantized to 8-bit for faster inference and a smaller footprint when running PII detection on Apple Silicon with OpenMed. For the unquantized BF16 reference, see OpenMed/privacy-filter-multilingual-v2-mlx.
Family at a glance:
- PyTorch source: `OpenMed/privacy-filter-multilingual-v2`
- MLX BF16: `OpenMed/privacy-filter-multilingual-v2-mlx` - Apple Silicon, 2.6 GiB weights
- MLX 8-bit (this repo): Apple Silicon, 1.4 GiB weights, ~1.8x faster than BF16 in the local golden-sample run
Why 8-bit?
| Metric | BF16 sibling | This repo (Q8) |
|---|---|---|
| `weights.safetensors` size | 2.6 GiB | 1.4 GiB |
| Average forward pass | 15.1 ms | 8.4 ms (~1.8x faster) |
| Average argmax agreement vs. BF16 | reference | 100.00% |
| Entity-span preservation | reference | identical on all 10 golden samples |
Validation used scripts/export/verify_privacy_filter_nemotron_mlx.py over 10 golden PII samples (email, phone, ssn, credit card, name, ipv4, address, date_of_birth, url, mixed).
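As a rough illustration of the "argmax agreement" metric in the table, the sketch below counts how often two sets of per-token logits pick the same class. It is not the verification script named above; the function name `argmax_agreement` and the toy logits are made up for the example.

```python
def argmax_agreement(ref_logits, test_logits):
    """Fraction of token positions where both runs pick the same class."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both runs must cover the same tokens")
    if not ref_logits:
        raise ValueError("need at least one token")
    same = 0
    for ref_row, test_row in zip(ref_logits, test_logits):
        # Index of the highest-scoring class for each run.
        ref_cls = max(range(len(ref_row)), key=ref_row.__getitem__)
        test_cls = max(range(len(test_row)), key=test_row.__getitem__)
        same += ref_cls == test_cls
    return same / len(ref_logits)


# Toy logits for 3 tokens and 3 classes (made-up numbers, not model output).
bf16 = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5], [0.2, 0.1, 4.0]]
q8 = [[1.9, 0.2, -1.1], [0.1, 2.9, 0.4], [0.3, 0.0, 3.8]]
print(f"{argmax_agreement(bf16, q8):.2%}")  # 100.00%
```

Small numeric drift in the logits does not change the metric as long as the top class stays the same, which is why agreement can be 100% even though quantized outputs are not bit-identical.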
Quantization
| Field | Value |
|---|---|
| Bits | 8 |
| Group size | 64 |
| Mode | affine MLX weight-only quantization |
| Quantized modules | embedding, attention projections, MoE router/expert matrices, output head |
| Kept in BF16 | RMSNorm scales and attention sinks |
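
To make "affine, 8 bits, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization, where each group of 64 weights gets its own scale and bias and every weight is approximated as `scale * q + bias`. It is a conceptual illustration only: the helper names are hypothetical, the weights are random, and the actual MLX kernels pack and store values differently.

```python
import random


def quantize_group(values, bits=8):
    """Affine-quantize one group so that w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [scale * q + bias for q in codes]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups, each with its own scale and bias."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(640)]  # 640 matches the hidden size
groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]

# Rounding to the nearest code bounds the error by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(row, restored))
max_scale = max(scale for _, scale, _ in groups)
print(len(groups), max_err <= max_scale / 2 + 1e-12)
```

Assuming each group stores a 16-bit scale and a 16-bit bias, the cost is 8 + 32/64 = 8.5 bits per weight, roughly 53% of BF16, which is in line with the reduction from 2.6 GiB to 1.4 GiB reported above.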
What it does
This model is an MLX packaging of OpenMed/privacy-filter-multilingual-v2, the second-generation multilingual checkpoint for fine-grained PII extraction across 16 languages. It uses OpenAI's Privacy Filter architecture and predicts 217 BIOES classes (O plus B/I/E/S for each category). The OpenMed PrivacyFilterMLXPipeline runs BIOES-aware Viterbi decoding so callers receive grouped spans instead of raw token tags.
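The sketch below shows, under simplifying assumptions, why BIOES-aware Viterbi decoding matters: per-token argmax can produce an invalid tag sequence, while constrained decoding picks the best valid one and then groups it into spans. This is not the `PrivacyFilterMLXPipeline` implementation; the `B-EMAIL`-style label strings, the helper names, and the scores are all illustrative.

```python
import math


def allowed(prev, nxt):
    """BIOES rule: B/I must be followed by I/E of the same category."""
    p_tag, _, p_cat = prev.partition("-")
    n_tag, _, n_cat = nxt.partition("-")
    if p_tag in ("B", "I"):
        return n_tag in ("I", "E") and n_cat == p_cat
    return n_tag in ("O", "B", "S")  # after O, E, or S a span may start


def viterbi(scores, labels):
    """scores[t][k] is the log-score of labels[k] at token t."""
    n = len(labels)
    starts = [lab[0] in "OBS" for lab in labels]  # valid first tags
    ends = [lab[0] in "OES" for lab in labels]    # valid last tags
    best = [s if ok else -math.inf for s, ok in zip(scores[0], starts)]
    back = []
    for row in scores[1:]:
        nxt, ptr = [], []
        for j in range(n):
            score, i = max((best[i], i) for i in range(n)
                           if allowed(labels[i], labels[j]))
            nxt.append(score + row[j])
            ptr.append(i)
        best = nxt
        back.append(ptr)
    last = max((b, k) for k, (b, ok) in enumerate(zip(best, ends)) if ok)[1]
    path = [last]
    for ptr in reversed(back):  # follow back-pointers to the first token
        path.append(ptr[path[-1]])
    return [labels[k] for k in reversed(path)]


def to_spans(tags):
    """Group a valid BIOES sequence into (category, start, end_exclusive)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        kind, _, cat = tag.partition("-")
        if kind == "S":
            spans.append((cat, i, i + 1))
        elif kind == "B":
            start = i
        elif kind == "E" and start is not None:
            spans.append((cat, start, i + 1))
            start = None
    return spans


labels = ["O", "B-EMAIL", "I-EMAIL", "E-EMAIL", "S-EMAIL"]
scores = [  # made-up log-scores for 4 tokens
    [0.0, -3.0, -5.0, -5.0, -3.0],
    [-2.0, -0.1, -4.0, -4.0, -2.5],
    [-0.6, -4.0, -0.9, -3.0, -4.0],  # argmax is O, which is invalid after B
    [-2.0, -4.0, -3.0, -0.2, -2.5],
]
greedy = [labels[max(range(len(r)), key=r.__getitem__)] for r in scores]
decoded = viterbi(scores, labels)
print(greedy)             # ['O', 'B-EMAIL', 'O', 'E-EMAIL']
print(decoded)            # ['O', 'B-EMAIL', 'I-EMAIL', 'E-EMAIL']
print(to_spans(decoded))  # [('EMAIL', 1, 4)]
```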
Label coverage highlights:
- Identity: FIRSTNAME, MIDDLENAME, LASTNAME, AGE, GENDER, USERNAME, OCCUPATION, ORGANIZATION
- Contact and address: EMAIL, PHONE, URL, STREET, BUILDINGNUMBER, CITY, COUNTY, STATE, ZIPCODE
- Financial and crypto: BANKACCOUNT, IBAN, BIC, CREDITCARD, CVV, PIN, BITCOINADDRESS, ETHEREUMADDRESS
- Vehicle, digital, and auth: VIN, VRM, IPADDRESS, MACADDRESS, IMEI, PASSWORD
- Date and amount labels such as DATE, DATEOFBIRTH, TIME, AMOUNT, CURRENCY, and CURRENCYCODE
The full label map is included in id2label.json.
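The 217-class figure follows from the BIOES scheme: one `O` class plus four tags per category, so (217 - 1) / 4 = 54 categories. The short sketch below reproduces that arithmetic. The `B-EMAIL` naming and the ordering are assumptions for illustration; `id2label.json` is the authoritative mapping.

```python
def bioes_labels(categories):
    """Build a BIOES label list: O, then B/I/E/S for each category."""
    return ["O"] + [f"{prefix}-{cat}" for cat in categories for prefix in "BIES"]


# Three category names from the highlights above; the real map has more.
sample = bioes_labels(["EMAIL", "PHONE", "CREDITCARD"])
print(len(sample))  # 13, i.e. 1 + 4 * 3

# Number of categories implied by 217 classes.
n_categories, remainder = divmod(217 - 1, 4)
print(n_categories, remainder)  # 54 0
```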
Architecture
| Field | Value |
|---|---|
| Source model type | openai_privacy_filter |
| Source architecture | OpenAIPrivacyFilterForTokenClassification |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-query attention (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts - 128 experts, top-4 routing, SwiGLU |
| Position encoding | YARN-scaled RoPE (rope_theta=150000, factor=32) |
| Context length | 131,072 tokens (initial 4,096) |
| Tokenizer | o200k_base / tiktoken-compatible tokenizer assets, vocab 200,064 |
| Output head | Linear(640 -> 217) with bias |
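
A few of the figures in this table can be cross-checked with simple arithmetic, as sketched below. The dictionary keys are hypothetical names chosen for the example, not the keys used in `config.json`.

```python
arch = {  # values copied from the architecture table; key names are illustrative
    "hidden_size": 640,
    "query_heads": 14,
    "kv_heads": 2,
    "head_dim": 64,
    "experts": 128,
    "experts_per_token": 4,
    "initial_context": 4096,
    "rope_factor": 32,
    "num_labels": 217,
}

# YaRN scaling: the extended context is the initial context times the factor.
max_context = arch["initial_context"] * arch["rope_factor"]

# Grouped-query attention: how many query heads share each KV head.
queries_per_kv = arch["query_heads"] // arch["kv_heads"]

# Sparse MoE: fraction of experts that run for each token.
active_fraction = arch["experts_per_token"] / arch["experts"]

# Output head Linear(640 -> 217) with bias: weight matrix plus bias vector.
head_params = arch["hidden_size"] * arch["num_labels"] + arch["num_labels"]

print(max_context)      # 131072
print(queries_per_kv)   # 7
print(active_fraction)  # 0.03125
print(head_params)      # 139097
```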
File set
| File | Size | Purpose |
|---|---|---|
| `weights.safetensors` | 1.4 GiB | MLX weights |
| `config.json` | 17.7 KiB | Model and OpenMed MLX runtime config |
| `id2label.json` | 4.8 KiB | Numeric ID to BIOES label mapping |
| `openmed-mlx.json` | 0.8 KiB | OpenMed MLX artifact manifest |
| `tokenizer.json` | 27 MiB | Tokenizer asset kept with the artifact |
| `tokenizer_config.json` | 0.2 KiB | Tokenizer metadata |
The MLX runtime uses the tiktoken-compatible o200k_base tokenizer path. tokenizer.json and tokenizer_config.json are bundled so consumers can inspect the tokenizer assets and keep the artifact self-contained.
Quick start
With OpenMed
```bash
pip install -U "openmed[mlx]"
```

```python
from openmed import extract_pii, deidentify
from openmed.core import OpenMedConfig

model_name = "OpenMed/privacy-filter-multilingual-v2-mlx-8bit"
text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

result = extract_pii(
    text,
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
for ent in result.entities:
    print(ent.label, ent.text, round(ent.confidence, 4))

masked = deidentify(
    text,
    method="mask",
    model_name=model_name,
    config=OpenMedConfig(backend="mlx"),
)
print(masked.deidentified_text)
```
For non-MLX hosts, use the source PyTorch checkpoint OpenMed/privacy-filter-multilingual-v2.
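To show conceptually what masking does with detected spans, here is a small standalone sketch that replaces character spans with a placeholder. It does not reproduce how `deidentify` works internally; the `(label, start, end)` tuple layout and the `[LABEL]` placeholder format are assumptions made for the example.

```python
def mask_spans(text, spans):
    """Replace each (label, start, end) character span with a [LABEL] placeholder."""
    out, cursor = [], 0
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        if start < cursor:
            continue  # skip a span that overlaps one already masked
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])  # keep the text after the last span
    return "".join(out)


text = "Email me at alice.smith@example.com after 5pm."
email = "alice.smith@example.com"
start = text.index(email)
spans = [("EMAIL", start, start + len(email))]
print(mask_spans(text, spans))  # Email me at [EMAIL] after 5pm.
```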
Direct MLX usage
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-multilingual-v2-mlx-8bit")
pipe = PrivacyFilterMLXPipeline(model_path)
print(pipe("Email me at alice.smith@example.com after 5pm."))
```
Loading from a local snapshot
```python
from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-multilingual-v2-mlx-8bit")
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)
logits = model(ids, attention_mask=mask)
print(logits.shape)
```
Hardware notes
- Designed for Apple Silicon with MLX.
- CPU inference may work, but GPU-backed MLX on M-series Macs is the intended runtime.
- The Python package path is `pip install -U "openmed[mlx]"`.
Credits
This artifact builds on:
- `OpenMed/privacy-filter-multilingual-v2` by OpenMed
- `openai/privacy-filter` and OpenAI's `opf` training/evaluation tooling
- The datasets listed in the model-card metadata above
- Apple's MLX framework
License
Apache 2.0, matching the source checkpoint metadata.