---
library_name: transformers
license: mit
base_model: microsoft/mdeberta-v3-base
tags:
- generated_from_trainer
- name
- person
- company
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: mdeberta-v3-base-name-classifier-v2
  results: []
datasets:
- ele-sage/person-company-names-classification
language:
- fr
- en
---

# mdeberta-v3-base-name-classifier-v2

This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on [ele-sage/person-company-names-classification](https://huggingface.co/ele-sage/person-company-names-classification).

It achieves the following results on the evaluation set:
- Loss: 0.0732
- Accuracy: 0.9946
- Precision: 0.9989
- Recall: 0.9913
- F1: 0.9951
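
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall. This is only a sketch using the rounded figures listed above. The card does not state which class is treated as the positive one, so the check says nothing about that choice.

```python
# Rounded evaluation metrics as reported in this card.
precision = 0.9989
recall = 0.9913

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")  # prints 0.9951
```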

## Model description

This model is a binary text classifier fine-tuned from `mdeberta-v3-base`.
It distinguishes a **person's name** from a **company/organization name**, reaching 0.9946 accuracy on the evaluation set.

### Direct Use

This model is intended for text classification. Given a string, it returns a label indicating whether the string is a `Person` or a `Company`.

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="ele-sage/mdeberta-v3-base-name-classifier-v2")

texts = [
    "Satya Nadella",
    "Global Innovations Inc.",
    "Martinez, Alonso",
]
results = classifier(texts)

# Each result holds only `label` and `score`, so pair it with its input text.
for text, result in zip(texts, results):
    print(f"Text: '{text}', Prediction: {result['label']}, Score: {result['score']:.4f}")
```

### Downstream Use

This model is the first stage of a two-stage name-processing pipeline. It acts as a fast "gatekeeper" that identifies person names before they are passed to a more complex parsing model, such as `ele-sage/distilbert-base-uncased-name-splitter`.
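
The gatekeeper step could be sketched roughly as follows. The helper name `split_by_label` is hypothetical, and the label string `"Person"` is an assumption based on the labels named in this card; the exact strings should be checked against the model's `id2label` config. The classifier is passed in as a plain callable so the routing logic stays independent of how the model is loaded.

```python
from typing import Callable, Dict, List, Tuple

# One prediction, shaped like the `label`/`score` dicts that a
# text-classification pipeline returns.
Prediction = Dict[str, object]


def split_by_label(
    names: List[str],
    classify: Callable[[List[str]], List[Prediction]],
    person_label: str = "Person",  # assumed label string, verify before use
) -> Tuple[List[str], List[str]]:
    """Return (persons, companies) so that only persons reach the second stage."""
    persons: List[str] = []
    companies: List[str] = []
    for name, prediction in zip(names, classify(names)):
        if prediction["label"] == person_label:
            persons.append(name)
        else:
            companies.append(name)
    return persons, companies
```

A `transformers` text-classification pipeline accepts a list of strings and returns one dict per input, so it can be passed directly as `classify`. The resulting `persons` list is what would then be handed to the parsing model.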

### Out-of-Scope Use

- This model is not a general-purpose classifier. It is highly specialized for distinguishing persons from companies and will not perform well on other classification tasks (e.g., sentiment analysis).

## Bias, Risks, and Limitations

- **Geographic & Cultural Bias:** The training data is heavily biased towards North American (Canadian) person names and Quebec-based company names. The model will be less accurate when classifying names from other cultural or geographic origins.
- **Ambiguity:** Certain names can legitimately be both a person's name and a company's name (e.g., "Ford"). In these cases, the model makes a statistical guess based on its training data, which may not always align with the specific context.
- **Data Source:** The person name data is derived from a Facebook data leak and contains noise. While a rigorous cleaning process was applied, the model may have learned from some spurious data.
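
One possible way to handle ambiguous inputs is to set aside low-confidence predictions for manual review instead of trusting the label. The sketch below assumes that the pipeline score is a usable confidence signal. The helper name `flag_uncertain` and the `0.90` threshold are hypothetical, and any real threshold would need tuning on held-out data.

```python
from typing import Dict, List


def flag_uncertain(
    texts: List[str],
    predictions: List[Dict[str, object]],
    min_score: float = 0.90,  # hypothetical threshold, not taken from this card
) -> List[str]:
    """Return the inputs whose top-label score falls below min_score."""
    return [
        text
        for text, prediction in zip(texts, predictions)
        if prediction["score"] < min_score
    ]
```

Here `predictions` is expected to have the same shape as the pipeline output, one `label`/`score` dict per input text.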

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 128
- eval_batch_size: 512
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH_FUSED`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
- label_smoothing_factor: 0.02
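
For anyone trying to reproduce a similar run, the list above maps roughly onto `transformers.TrainingArguments` keyword arguments as sketched below. Mapping the batch sizes to the `per_device_*` arguments is an assumption (single device, no gradient accumulation), and the original training script is not part of this card.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 128,  # assumes one device, no gradient accumulation
    "per_device_eval_batch_size": 512,   # same single-device assumption
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
    "label_smoothing_factor": 0.02,
}

# With transformers installed, these could be passed as
# TrainingArguments(output_dir="out", **training_kwargs),
# where "out" is a placeholder directory name.
```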

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.0914        | 0.0359 | 2000  | 0.0889          | 0.9907   | 0.9952    | 0.9882 | 0.9917 |
| 0.0796        | 0.0718 | 4000  | 0.0864          | 0.9907   | 0.9991    | 0.9843 | 0.9916 |
| 0.0808        | 0.1077 | 6000  | 0.0809          | 0.9919   | 0.9944    | 0.9910 | 0.9927 |
| 0.0828        | 0.1436 | 8000  | 0.0774          | 0.9930   | 0.9976    | 0.9899 | 0.9937 |
| 0.0787        | 0.1795 | 10000 | 0.0771          | 0.9931   | 0.9989    | 0.9886 | 0.9938 |
| 0.0761        | 0.2154 | 12000 | 0.0774          | 0.9935   | 0.9984    | 0.9899 | 0.9942 |
| 0.0779        | 0.2513 | 14000 | 0.0771          | 0.9935   | 0.9991    | 0.9892 | 0.9941 |
| 0.0833        | 0.2872 | 16000 | 0.0751          | 0.9937   | 0.9985    | 0.9903 | 0.9944 |
| 0.0812        | 0.3231 | 18000 | 0.0764          | 0.9935   | 0.9967    | 0.9915 | 0.9941 |
| 0.0763        | 0.3590 | 20000 | 0.0753          | 0.9940   | 0.9990    | 0.9902 | 0.9946 |
| 0.0753        | 0.3949 | 22000 | 0.0759          | 0.9936   | 0.9968    | 0.9917 | 0.9942 |
| 0.0749        | 0.4308 | 24000 | 0.0750          | 0.9940   | 0.9980    | 0.9912 | 0.9946 |
| 0.0755        | 0.4667 | 26000 | 0.0746          | 0.9939   | 0.9974    | 0.9917 | 0.9945 |
| 0.0755        | 0.5026 | 28000 | 0.0756          | 0.9937   | 0.9967    | 0.9919 | 0.9943 |
| 0.0753        | 0.5385 | 30000 | 0.0745          | 0.9942   | 0.9979    | 0.9916 | 0.9948 |
| 0.0791        | 0.5744 | 32000 | 0.0735          | 0.9943   | 0.9991    | 0.9908 | 0.9949 |
| 0.0789        | 0.6103 | 34000 | 0.0743          | 0.9939   | 0.9972    | 0.9918 | 0.9945 |
| 0.073         | 0.6462 | 36000 | 0.0741          | 0.9943   | 0.9985    | 0.9913 | 0.9949 |
| 0.0714        | 0.6821 | 38000 | 0.0738          | 0.9944   | 0.9989    | 0.9911 | 0.9950 |
| 0.0738        | 0.7180 | 40000 | 0.0733          | 0.9945   | 0.9989    | 0.9912 | 0.9950 |
| 0.0796        | 0.7539 | 42000 | 0.0732          | 0.9945   | 0.9987    | 0.9915 | 0.9951 |
| 0.0726        | 0.7898 | 44000 | 0.0734          | 0.9945   | 0.9988    | 0.9914 | 0.9951 |
| 0.0778        | 0.8257 | 46000 | 0.0733          | 0.9945   | 0.9988    | 0.9913 | 0.9951 |
| 0.0734        | 0.8616 | 48000 | 0.0733          | 0.9945   | 0.9989    | 0.9914 | 0.9951 |
| 0.0735        | 0.8975 | 50000 | 0.0732          | 0.9945   | 0.9988    | 0.9914 | 0.9951 |
| 0.0696        | 0.9334 | 52000 | 0.0732          | 0.9945   | 0.9989    | 0.9913 | 0.9951 |
| 0.0754        | 0.9693 | 54000 | 0.0732          | 0.9946   | 0.9989    | 0.9913 | 0.9951 |
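
The Epoch and Step columns allow a rough estimate of the size of the run. The sketch below is back-of-the-envelope arithmetic only. It assumes that `train_batch_size` is the effective batch size (no gradient accumulation, single device), and the result is approximate because the Epoch column is rounded to four decimals.

```python
# Last logged row of the results table.
step, epoch = 54000, 0.9693
train_batch_size = 128  # from the hyperparameter list

# Optimizer steps in one full epoch, estimated from the logged fraction.
steps_per_epoch = round(step / epoch)

# Approximate number of training examples, assuming no gradient accumulation.
approx_train_examples = steps_per_epoch * train_batch_size

print(steps_per_epoch, approx_train_examples)
```

Under these assumptions, one epoch is roughly 55,700 optimizer steps, which corresponds to about 7.1 million training examples.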

### Framework versions

- Transformers 4.57.1
- PyTorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1