Person Name Match Likelihood (v6)
Author: Elad Laor · LinkedIn
A scoring head over `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli`, trained to predict whether two strings refer to the same person. Useful for record linkage, deduplication, and KYC-style identity matching where the only signal is a pair of name strings.
Quick start
```python
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

BASE = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
ADAPTER = "LessLM/person-name-match-likelihood-v6"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)
model = PeftModel.from_pretrained(base, ADAPTER).eval()

def score(name_a: str, name_b: str) -> float:
    """Return P(same person) in [0, 1]."""
    inputs = tokenizer(name_a, name_b, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(score("John A. Smith", "J. Smith"))       # ~0.95 (initial expansion)
print(score("Yitzhak Cohen", "Itzhak Cohen"))   # ~0.99 (transliteration)
print(score("John Smith", "John Smyth"))        # ~0.90 (typo)
print(score("Robert Adams", "Roberta Adams"))   # ~0.05 (similar but different)
```
The model returns a 2-way softmax over `[no_match, match]`. The match probability is interpretable as a likelihood score. A temperature scaler (`calibration.pt` in this repo) is fitted on a held-out set; if you want calibrated probabilities, load it and apply it to the logits before the softmax for a slightly tighter Expected Calibration Error.
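To make the calibration step concrete, here is a pure-Python sketch of what temperature scaling does: both logits are divided by T before the softmax (T=0.95 is the value reported in this repo; how `calibration.pt` stores T is an assumption, so check the repo's loading code rather than relying on this sketch).

```python
import math

def match_prob(logit_no_match: float, logit_match: float, temperature: float = 1.0) -> float:
    """2-way softmax over [no_match, match]; temperature < 1 sharpens, > 1 softens."""
    a = logit_no_match / temperature
    b = logit_match / temperature
    m = max(a, b)  # subtract the max for numerical stability
    e_no, e_match = math.exp(a - m), math.exp(b - m)
    return e_match / (e_no + e_match)

raw = match_prob(-1.2, 2.3)        # uncalibrated probability
cal = match_prob(-1.2, 2.3, 0.95)  # with a T=0.95 scaler applied
```

With T below 1, confident predictions are pushed slightly further from 0.5; with T above 1 they are pulled toward it.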
Headline metrics
Evaluated on a held-out test set of 2,510 name pairs drawn from real public entity data (OpenSanctions) and a curated synthetic edge-case set.
| Metric | Score |
|---|---|
| F1 | 0.9682 |
| Precision | 0.9568 |
| Recall | 0.9798 |
| Accuracy | 0.9733 |
| Expected Calibration Error | 0.0162 |
| Latency (p95, CPU) | 0.42 ms |
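For reference, Expected Calibration Error can be computed with a simple binning scheme. The sketch below bins predictions by match probability and compares each bin's mean confidence to its empirical match rate; this is a common reliability-diagram variant and not necessarily the exact binning used to produce the number above.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: weighted mean |empirical accuracy - mean confidence| over bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)  # mean predicted probability
            acc = sum(y for _, y in b) / len(b)   # empirical match rate
            ece += len(b) / total * abs(acc - conf)
    return ece
```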
Performance by edge case
| Edge case | Accuracy | n |
|---|---|---|
| nickname (Bob ↔ Robert) | 100.0% | 121 |
| name_order (Last, First ↔ First Last) | 100.0% | 112 |
| transliteration (Yitzhak ↔ Itzhak) | 100.0% | 93 |
| initial (J. Smith ↔ John Smith) | 100.0% | 112 |
| middle_name add/drop | 100.0% | 50 |
| title_suffix (Dr., Jr.) | 100.0% | 112 |
| hyphenation, case_variation, combined, unrelated | 100.0% | 86 |
| tricky_non_match (similar non-matches) | 97.9% | 331 |
| partial_overlap | 97.7% | 353 |
| unknown (real-world, no curated label) | 97.5% | 682 |
| similar_name (Robert ↔ Roberta) | 95.0% | 341 |
| typo | 84.6% | 117 |
The model is strongest on canonical edge cases (nicknames, initials, transliteration) and weakest on character-level typos, whose surface forms overlap with the similar_name distribution.
How it was trained
- Base: `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli` (184M params, 12-layer encoder with disentangled attention).
- Adapter: LoRA (rank=16, alpha=32, dropout=0.1), targeting `query_proj`, `key_proj`, `value_proj` in every attention block; ~600K trainable parameters (0.3% of base).
- Loss: focal loss (γ=2.0), which down-weights easy examples and lets the model focus on hard pairs (similar names, typos).
- Optimizer: AdamW, LR=2e-4, cosine schedule, 10% warmup, weight decay 0.01.
- Schedule: 10 epochs, batch size 32, max sequence length 128 tokens, BF16 mixed precision.
- Seed: 42.
- Data: ~174K balanced match/no-match pairs (after 2.5× augmentation): half drawn from OpenSanctions entities (real public records), half from a synthetic generator covering the 15 edge cases in the table above. Train/validation/calibration splits are entity-level (no person appears in more than one split), using a deterministic MD5-based hash so the splits reproduce bit-for-bit.
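To make the focal-loss bullet concrete: for an example where the model assigns probability p to the correct class, focal loss multiplies the usual cross-entropy term by (1 − p)^γ, so confident-and-correct examples contribute almost nothing. A minimal scalar sketch (the actual training code presumably uses a batched tensor version):

```python
import math

def focal_loss(p_correct: float, gamma: float = 2.0) -> float:
    """Focal loss for one example: cross-entropy scaled by (1 - p)^gamma."""
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

def cross_entropy(p_correct: float) -> float:
    return -math.log(p_correct)
```

At γ=2.0, an easy example (p=0.95) keeps only 0.25% of its cross-entropy, while a hard one (p=0.3) keeps 49% of it.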
Bias, Risks, and Limitations
- Latin script only. The model was trained on Latin-script names. It will not work well on non-Latin scripts (Hebrew, Arabic, Chinese, Cyrillic, etc.) unless names are first transliterated.
- OpenSanctions skew. The real-world half of the training data is drawn from a public sanctions/PEP entity database. Names in that distribution skew toward political, business, and criminal figures, with heavy representation of Russian, Ukrainian, Iranian, Chinese, and Latin American transliterations and a long tail of titles and honorifics. The model may behave differently on, say, US consumer-database names than on this distribution.
- Pair-level only. This is a pairwise matcher: given two name strings, score their likelihood of being the same person. It does not do blocking, clustering, or one-to-many matching. For dedup over a large list, pair it with a blocking layer (cheap pre-filter on first-letter, soundex, etc.) before invoking the model.
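A minimal sketch of such a blocking layer, using a simplified Soundex-style key on the last name token (this helper is illustrative, not part of the repo): only pairs that share a key are sent to the pairwise model, turning all-pairs scoring into scoring within small buckets.

```python
from collections import defaultdict
from itertools import combinations

def block_key(name: str) -> str:
    """Simplified Soundex on the last token: first letter + consonant-class digits."""
    last = name.strip().split()[-1].upper()
    classes = {c: str(d) for d, group in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1) for c in group}
    digits = []
    for ch in last[1:]:
        d = classes.get(ch)
        if d and (not digits or digits[-1] != d):  # drop vowels, collapse repeats
            digits.append(d)
    return (last[0] + "".join(digits))[:4].ljust(4, "0")

def candidate_pairs(names):
    """Group by blocking key; only same-key pairs go to the pairwise model."""
    buckets = defaultdict(list)
    for n in names:
        buckets[block_key(n)].append(n)
    for bucket in buckets.values():
        yield from combinations(bucket, 2)
```

Note this omits full Soundex's H/W rules; for production blocking, a library implementation of Soundex or a vector-similarity pre-filter is a safer choice.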
- Names alone. No surrounding context (DOB, email, address). Two real-world people with the same name will score as a match. Use this as one signal among several in a real identity-matching pipeline, not as the sole decision.
- Typo accuracy is the weakest cell. 84.6% on character-level typos. If your input is OCR output or hand-transcribed names, expect more errors in this category and consider a separate spell-correction step before scoring.
- No production guarantees. This is a research/portfolio artifact. Performance on your distribution may differ. Evaluate on a sample of your own data before relying on it.
Intended use
- Record linkage and dedup of person-name fields in datasets where you have only name strings to work with.
- KYC and identity-matching workflows as one feature among several.
- Benchmarking and research on encoder-based entity matching.
Out-of-scope use
- Non-Latin scripts (Hebrew, Arabic, Chinese, etc.) without prior transliteration.
- Surveillance, social scoring, or any use that would single out individuals for adverse treatment based on a name-match score alone.
- High-stakes one-shot identity decisions (eligibility, denial, arrest, eviction) β the model gives a likelihood, not a verdict.
License
MIT. You are free to use, modify, and redistribute, including commercially, provided you keep the attribution and license notice.
Citation
If you use this model, a backlink to this repo or the author's profile is appreciated.
@misc{laor2026_person_name_match_v6,
author = {Elad Laor},
title = {Person Name Match Likelihood (v6): a LoRA adapter on DeBERTa-v3 for pairwise person-name matching},
year = {2026},
url = {https://huggingface.co/LessLM/person-name-match-likelihood-v6}
}
Reproducibility & technical details
- Framework versions: `peft==0.18.1`, `transformers>=4.40`, `torch>=2.0`.
- Training environment: RunPod RTX 4090, ~3h wall-clock, BF16. Original v6 trained on an RTX 3090 Ti (cross-GPU F1 delta: -0.0025).
- Seed: All randomness controlled by seed=42 (numpy, torch, transformers, dataloader generators). Re-running the training script with this seed and dataset version produces F1 within Β±0.005 across BF16-capable GPUs.
- Calibration: `calibration.pt` is a single-parameter temperature scaler (T=0.95) fitted on a 1.5K held-out set. Apply it to the logits before the final softmax to slightly reduce Expected Calibration Error, from 0.016 to ~0.012.
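The entity-level, MD5-hashed split described in the training section can be sketched like this (the split fractions here are illustrative assumptions, not the repo's actual boundaries):

```python
import hashlib

def assign_split(entity_id: str, val_frac: float = 0.10, cal_frac: float = 0.05) -> str:
    """Deterministic split: the same entity id always lands in the same split."""
    digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()
    u = int(digest, 16) / 16**32  # map the 128-bit digest to [0, 1)
    if u < val_frac:
        return "validation"
    if u < val_frac + cal_frac:
        return "calibration"
    return "train"
```

Because the bucket depends only on the entity id, re-running the split on the same dataset version reproduces it exactly, and all pairs involving a given person stay in one split.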