Token Classification
Transformers
Safetensors
English
haremb_pii
pii
privacy
bioes
Mixture of Experts
haremb
custom_code
How to use fblgit/haremb-privacy-filter-opennemo with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="fblgit/haremb-privacy-filter-opennemo", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("fblgit/haremb-privacy-filter-opennemo", trust_remote_code=True, dtype="auto")
```
Inference benchmark: A: openmed-base (reference / teacher) vs B: haremb (this checkpoint)

Device: cuda, dtype: torch.bfloat16, ctx: 1024.

| Metric | A: openmed-base | B: haremb | B vs A |
|---|---|---|---|
| load | 0.71 s | 0.10 s | |
| eval (212,909 tokens) | 64.66 s | 33.56 s | |
| throughput | 3293 tok/s | 6343 tok/s | 1.93× faster |
| total params | 1399.61M | 287.11M | 4.87× smaller |
| of which dense | 139.35M | 129.58M | |
| of which MoE-experts | 1260.26M | 157.53M | |
| active params / token | 178.73M | 134.50M | 1.33× less |
| of which embed lookup | 128.04M | 128.04M | |
| of which MoE-active (top 4 of 128 experts) | 39.38M | 4.92M | |
| of which attn/norm/head | 11.31M | 1.54M | |
| compute params / token | 50.69M | 6.46M | 7.85× cheaper |
| GFLOP / token (fwd, MAC×2) | 0.101 | 0.013 | 7.85× cheaper |
| weights size (on disk) | not reported | 547.6 MiB | not reported |
| weights size (in RAM) | 2.61 GiB | 547.6 MiB | 4.87× smaller |
| weights resident (GPU) | 2.61 GiB | 548.3 MiB | |
| peak GPU mem (eval, ctx=1024) | 3.30 GiB | 1.22 GiB | 2.70× less |

Active params / token is the per-token memory footprint: the embedding lookup, the top 4 of 128 experts, and attn/norm/head. Compute params / token counts matmul FLOPs only, with the embedding lookup excluded.
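
The per-token figures in the benchmark can be cross-checked from the reported totals. The sketch below is only arithmetic on the published numbers and rests on two assumptions that the figures happen to satisfy but the benchmark does not state: all experts are the same size (so the active MoE share is 4/128 of the expert parameters), and the dense parameters are the embedding plus attn/norm/head. The names `REPORTED` and `derive` are illustrative.

```python
TOKENS = 212_909             # evaluation tokens reported for both runs
TOP_K, NUM_EXPERTS = 4, 128  # "top_4/128 experts"

# Parameter counts in millions and eval time in seconds, copied from the benchmark.
REPORTED = {
    "A": {"dense": 139.35, "experts": 1260.26, "embed": 128.04, "eval_s": 64.66},
    "B": {"dense": 129.58, "experts": 157.53, "embed": 128.04, "eval_s": 33.56},
}


def derive(m):
    """Recompute per-token figures from the reported totals (params in millions)."""
    moe_active = m["experts"] * TOP_K / NUM_EXPERTS  # assumes equally sized experts
    other = m["dense"] - m["embed"]                  # assumes dense = embed + attn/norm/head
    compute = moe_active + other                     # matmul params, embedding excluded
    return {
        "moe_active_m": moe_active,
        "compute_m": compute,
        "active_m": m["embed"] + compute,            # memory footprint per token
        "gflop_per_token": 2 * compute * 1e6 / 1e9,  # 2 FLOPs per multiply-accumulate
        "tok_per_s": TOKENS / m["eval_s"],
    }


derived = {name: derive(m) for name, m in REPORTED.items()}
a, b = derived["A"], derived["B"]

# "B vs A" ratios, recomputed from the derived values.
ratios = {
    "total params": sum(REPORTED["A"][k] for k in ("dense", "experts"))
    / sum(REPORTED["B"][k] for k in ("dense", "experts")),
    "active params / token": a["active_m"] / b["active_m"],
    "compute params / token": a["compute_m"] / b["compute_m"],
    "throughput": b["tok_per_s"] / a["tok_per_s"],
}

for name, d in derived.items():
    print(name, {k: round(v, 3) for k, v in d.items()})
for name, r in ratios.items():
    print(f"{name}: {r:.2f}x")
```

Because the inputs are already rounded to two decimals, the recomputed ratios can differ from the reported ones in the last digit.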
Sample inference (load → tokenize → forward → viterbi-decode → spans):

Text: 'Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, phone 415-555-0123, email sarah.johnson@example.com, credit card 4111-1111-1111-1111.'

Forward latency: 65.8 ms (53 tokens). Detected 7 spans:

| Span [start, end) | Label | Text |
|---|---|---|
| [1, 2) | first_name | Sarah |
| [2, 3) | last_name | Johnson |
| [6, 12) | date | 03/15/1985 |
| [16, 19) | phone_number | 4872910 |
| [22, 28) | phone_number | 415-555-0123 |
| [30, 37) | email | sarah.johnson@example.com |
| [41, 52) | credit_debit_card | 4111-1111-1111-1111 |
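
The "viterbi-decode → spans" step can be illustrated with a generic constrained decoder over BIOES tags. This is a minimal sketch in pure Python, not the checkpoint's own decoder (which lives in its custom code and is not shown here): the helper names, the toy label set, and the toy scores are all hypothetical, and the transition rule is the standard BIOES one (an entity opened by `B-` must continue with `I-` or close with `E-` of the same type).

```python
import math


def bioes_tags(entity_types):
    """Build the tag list: O plus B/I/E/S for every entity type."""
    tags = ["O"]
    for t in entity_types:
        tags += [f"{p}-{t}" for p in "BIES"]
    return tags


def allowed(prev, curr):
    """BIOES rule: an open entity (B/I) must be followed by I/E of the same type."""
    p_pre, _, p_type = prev.partition("-")
    c_pre, _, c_type = curr.partition("-")
    if p_pre in ("B", "I"):
        return c_pre in ("I", "E") and c_type == p_type
    return c_pre in ("O", "B", "S")


def viterbi(scores, tags):
    """scores[t][k] is the log-score of tag k at token t; returns the best valid path."""
    n = len(tags)
    # A sequence cannot start inside an entity (I/E).
    best = [s if tags[k][0] in "OBS" else -math.inf for k, s in enumerate(scores[0])]
    back = []
    for row in scores[1:]:
        new_best, ptr = [], []
        for k in range(n):
            score, j = max((best[j], j) for j in range(n) if allowed(tags[j], tags[k]))
            new_best.append(score + row[k])
            ptr.append(j)
        best = new_best
        back.append(ptr)
    # A sequence cannot end with an entity still open (B/I).
    k = max((k for k in range(n) if tags[k][0] in "OES"), key=lambda k: best[k])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [tags[k] for k in reversed(path)]


def to_spans(path):
    """Collapse a valid BIOES path into (start, end, label) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(path):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, label))
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1, label))
            start = None
    return spans


def make_row(tags, overrides, default=-5.0):
    """Toy log-scores: every tag gets `default` unless overridden."""
    return [overrides.get(t, default) for t in tags]


TAGS = bioes_tags(["first_name", "date"])  # toy label set, not the model's
scores = [
    make_row(TAGS, {"O": 0.0}),
    make_row(TAGS, {"S-first_name": 0.0}),
    make_row(TAGS, {"B-date": 0.0}),
    make_row(TAGS, {"O": -1.0, "I-date": -1.2}),  # argmax alone would pick O here
    make_row(TAGS, {"E-date": 0.0}),
]

greedy = [TAGS[max(range(len(TAGS)), key=row.__getitem__)] for row in scores]
path = viterbi(scores, TAGS)
print(greedy)          # per-token argmax, invalid: B-date followed by O
print(path)            # constrained path keeps the date entity intact
print(to_spans(path))  # [(1, 2, 'first_name'), (2, 5, 'date')]
```

The point of the constraint is visible at token 3: per-token argmax prefers `O`, which would leave `B-date` unclosed, while the constrained path pays a small score penalty to produce a well-formed span.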