---
license: apache-2.0
language:
- it
- en
pipeline_tag: token-classification
tags:
- legal
- finance
- medical
- privacy
- named-entity-recognition
---
---
**💡 Found this resource helpful?** Creating and maintaining open source AI models and datasets requires significant computational resources. If this work has been valuable to you, consider [supporting my research](https://buymeacoffee.com/michele.montebovi) to help me continue building tools that benefit the entire AI community. Every contribution directly funds more open source innovation! ☕
---
# Italian_NER_XXL_v2
## 🚀 Model Overview
Welcome to the second generation of our Named Entity Recognition model for Italian text. Building on the previous version, Italian_NER_XXL_v2 reaches an **accuracy of 87.5%** and an **F1 score of 89.2%**, an accuracy improvement of more than 8 percentage points over our previous model.
## 💡 Key Improvements
- **Enhanced Accuracy**: From 79% to 87.5%
- **Better Context Understanding**: Improved recognition of entities in complex sentences
- **Reduced False Positives**: More precise identification of sensitive information
- **Expanded Training Data**: Trained on a more diverse corpus of Italian text
## 🏆 Market Leadership
To our knowledge, Italian_NER_XXL_v2 remains the only model for Italian capable of identifying a comprehensive range of **52** different entity categories. This breadth of entity recognition makes it a strong choice for privacy, legal, and financial applications.
## 📋 Recognized Categories
Our model identifies an extensive range of entities across multiple domains:
### Personal Information
- **NOME**: First name of a person
- **COGNOME**: Last name of a person
- **DATA_NASCITA**: Date of birth
- **DATA_MORTE**: Date of death
- **ETA**: Age of a person
- **CODICE_FISCALE**: Italian tax code
- **PROFESSIONE**: Occupation or profession
- **STATO_CIVILE**: Civil status
### Contact Information
- **INDIRIZZO**: Physical address
- **NUMERO_TELEFONO**: Phone number
- **EMAIL**: Email address
- **CODICE_POSTALE**: Postal code
### Financial Information
- **VALUTA**: Currency
- **IMPORTO**: Monetary amount
- **NUMERO_CARTA**: Credit/debit card number
- **CVV**: Card security code
- **NUMERO_CONTO**: Bank account number
- **IBAN**: International bank account number
- **BIC**: Bank identifier code
- **P_IVA**: VAT number
- **TASSO_MUTUO**: Mortgage rate
- **NUM_ASSEGNO_BANCARIO**: Bank check number
- **BANCA**: Bank name
### Legal Entities
- **RAGIONE_SOCIALE**: Company legal name
- **TRIBUNALE**: Court identifier
- **LEGGE**: Law reference
- **N_SENTENZA**: Court judgment (ruling) number
- **N_LICENZA**: License number
- **AVV_NOTAIO**: Lawyer or notary reference
- **REGIME_PATRIMONIALE**: Property regime
### Medical Information
- **CARTELLA_CLINICA**: Medical record
- **MALATTIA**: Disease or medical condition
- **MEDICINA**: Medicine or medical treatment
- **STORIA_CLINICA**: Clinical history
- **STRENGTH**: Medicine strength
- **FREQUENZA**: Treatment frequency
- **DURATION**: Duration of treatment
- **DOSAGGIO**: Medicine dosage
- **FORM**: Medicine form (e.g., tablet)
### Technical Information
- **IP**: IP address
- **IPV6_1**: IPv6 address
- **MAC**: MAC address
- **USER_AGENT**: Browser user agent
- **IMEI**: Mobile device identifier
### Geographic and Temporal Data
- **STATO**: Country or nation
- **LUOGO**: Geographic location
- **ORARIO**: Specific time
- **DATA**: Generic date
### Document and Vehicle Information
- **NUMERO_DOCUMENTO**: Document number
- **TARGA_VEICOLO**: Vehicle license plate
- **FOGLIO**: Document sheet reference
- **PARTICELLA**: Land registry parcel
- **MAPPALE**: Land registry map reference
- **SUBALTERNO**: Land registry sub-unit reference
### Web and Security
- **URL**: Web address
- **PASSWORD**: Password
- **PIN**: Personal identification number
- **BRAND**: Commercial brand or trademark
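
The groupings above can be mirrored in code, for example to keep only one domain's entities from a prediction. The following is a minimal sketch, not part of the model: `LABEL_DOMAINS`, `domain_of`, and `keep_domain` are hypothetical names, the table covers only three of the groups, and it assumes the labels emitted by the model match the names listed here.

```python
# Hypothetical lookup table: a subset of the labels listed above, grouped
# under the same headings. Extend it with the remaining labels as needed.
LABEL_DOMAINS = {
    "contact": {"INDIRIZZO", "NUMERO_TELEFONO", "EMAIL", "CODICE_POSTALE"},
    "financial": {
        "VALUTA", "IMPORTO", "NUMERO_CARTA", "CVV", "NUMERO_CONTO", "IBAN",
        "BIC", "P_IVA", "TASSO_MUTUO", "NUM_ASSEGNO_BANCARIO", "BANCA",
    },
    "technical": {"IP", "IPV6_1", "MAC", "USER_AGENT", "IMEI"},
}


def domain_of(label):
    """Return the domain name for a label, or None if it is not in the table."""
    for domain, labels in LABEL_DOMAINS.items():
        if label in labels:
            return domain
    return None


def keep_domain(entities, domain):
    """Keep only entities whose 'entity_group' belongs to the given domain."""
    return [e for e in entities if domain_of(e["entity_group"]) == domain]


# Hand-written sample in the shape of aggregated pipeline output (made-up values).
sample = [
    {"entity_group": "IBAN", "word": "IT60X0542811101000000123456", "score": 0.99},
    {"entity_group": "EMAIL", "word": "mario.rossi@example.com", "score": 0.98},
]
print([e["entity_group"] for e in keep_domain(sample, "financial")])
```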
## 💻 Implementation
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
model = AutoModelForTokenClassification.from_pretrained("DeepMount00/Italian_NER_XXL_v2")
# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
# Example text
example = """Il commendatore Gianluigi Alberico De Laurentis-Ponti, con residenza legale in Corso Imperatrice 67,
Torino, avente codice fiscale DLNGGL60B01L219P, è amministratore delegato della "De Laurentis Advanced Engineering
Group S.p.A.", che si trova in Piazza Affari 32, Milano (MI); con una partita IVA di 09876543210, la società è stata
recentemente incaricata di sviluppare una nuova linea di componenti aerospaziali per il progetto internazionale
di esplorazione di Marte."""
# Run NER
ner_results = nlp(example)
# Process results
for entity in ner_results:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
```
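
With `aggregation_strategy="simple"`, each result is a dictionary that includes `entity_group`, `word`, and `score`. A common next step is to drop low-confidence predictions and group the rest by category. The sketch below illustrates this on hand-written data so it runs without downloading the model. `group_entities` is a hypothetical helper, and the `0.5` threshold is an arbitrary assumption that should be tuned on your own data.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group entity texts by label, skipping predictions below min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for `ner_results` (labels and scores are made up).
sample_results = [
    {"entity_group": "NOME", "word": "Gianluigi Alberico", "score": 0.97},
    {"entity_group": "COGNOME", "word": "De Laurentis-Ponti", "score": 0.95},
    {"entity_group": "LUOGO", "word": "Torino", "score": 0.93},
    {"entity_group": "LUOGO", "word": "Marte", "score": 0.41},
]

for label, words in group_entities(sample_results).items():
    print(f"{label}: {words}")
```

In real use, pass the `ner_results` list returned by the pipeline instead of `sample_results`.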
## 🚀 Use Cases
- **Privacy Compliance**: GDPR data mapping and PII detection
- **Document Anonymization**: Automated redaction of sensitive information
- **Legal Document Analysis**: Extraction of key entities from contracts and legal texts
- **Financial Monitoring**: Detection of financial entities for compliance and fraud prevention
- **Medical Record Processing**: Structured extraction from clinical notes and reports
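
As an illustration of the document anonymization use case, detected spans can be replaced with placeholders using the character offsets in the pipeline output. This is a minimal sketch rather than a production redaction tool: `redact` is a hypothetical helper, the `0.5` threshold is arbitrary, and it assumes the entities carry `start` and `end` offsets (as aggregated pipeline output does with a fast tokenizer) and do not overlap.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder."""
    # Process spans from the end of the text so earlier offsets stay valid.
    spans = sorted(
        (e for e in entities if e["score"] >= min_score),
        key=lambda e: e["start"],
        reverse=True,
    )
    for ent in spans:
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


text = "Mario Rossi vive a Torino."
# Hand-written stand-in for the pipeline output; offsets index into `text`.
entities = [
    {"entity_group": "NOME", "start": 0, "end": 5, "score": 0.99},
    {"entity_group": "COGNOME", "start": 6, "end": 11, "score": 0.98},
    {"entity_group": "LUOGO", "start": 19, "end": 25, "score": 0.97},
]
print(redact(text, entities))  # prints "[NOME] [COGNOME] vive a [LUOGO]."
```

Since no model is perfectly accurate, redacted output for sensitive documents should still be reviewed before release.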
## 🔮 Future Development
We're committed to continuous improvement of the model:
- Quarterly updates with further accuracy enhancements
- Expansion to include new entity types based on user feedback
- Development of domain-specific variants for specialized applications
- Integration of contextual entity linking capabilities
## 👥 Contribution and Contact
Your feedback is essential to improving this model. If you're interested in contributing, have suggestions, or need a customized NER solution, please contact:
Michele Montebovi
Email: [montebovi.michele@gmail.com](mailto:montebovi.michele@gmail.com)
We welcome collaboration from the Italian NLP community to further enhance this tool and expand its applications across industries.
## 📝 Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{montebovi2025italiannerxxl,
  author = {Montebovi, Michele},
  title = {Italian\_NER\_XXL\_v2: A Comprehensive Named Entity Recognition Model for Italian},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/DeepMount00/Italian_NER_XXL_v2}}
}
```