NER Model for Contact Management Assistant Bot
This model is a fine-tuned RoBERTa-base model for Named Entity Recognition (NER) in contact management tasks.
Model Description
- Developed by: Mykyta Kotenko
- Base Model: roberta-base by Facebook AI
- Task: Token Classification (Named Entity Recognition)
- Language: English
- License: MIT
- Accuracy: 95.1%
- Entity Accuracy: 93.7%
- F1 Score: 94.6%
Supported Entities
This model extracts the following entity types:
- NAME: Person's name (first name only or full name)
- PHONE: Phone numbers in various formats
- EMAIL: Email addresses
- ADDRESS: Full street addresses (including building numbers, street names, apartments, cities, states, ZIP codes)
- BIRTHDAY: Dates of birth
- TAG: Contact tags
- NOTE_TEXT: Note content
- ID: Contact/note identifiers
- DAYS: Time periods
Usage
Basic Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("kms-engineer/assistant-bot-ner-model")
model = AutoModelForTokenClassification.from_pretrained("kms-engineer/assistant-bot-ner-model")
# Create NER pipeline
ner_pipeline = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # Merge B-/I- tokens
)
# Extract entities
text = "Add contact John Smith 212-555-0123 john@example.com 123 Broadway, New York"
results = ner_pipeline(text)
for result in results:
    print(f"{result['entity_group']}: {result['word']}")
Output:
NAME: John Smith
PHONE: 212-555-0123
EMAIL: john@example.com
ADDRESS: 123 Broadway, New York
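Downstream code usually wants one record per contact rather than a flat list of entities. A minimal sketch of that grouping step follows; the helper name `to_contact` and the `sample` list are illustrative assumptions (the list is hand-written to mirror the output above, not real model output).

```python
from collections import defaultdict


def to_contact(entities):
    """Group aggregated pipeline output into {ENTITY_TYPE: [values]}."""
    contact = defaultdict(list)
    for ent in entities:
        # "entity_group" and "word" are the keys used in the loop above.
        contact[ent["entity_group"]].append(ent["word"].strip())
    return dict(contact)


# Hand-written stand-in for ner_pipeline(text); not real model output.
sample = [
    {"entity_group": "NAME", "word": "John Smith"},
    {"entity_group": "PHONE", "word": "212-555-0123"},
    {"entity_group": "EMAIL", "word": "john@example.com"},
    {"entity_group": "ADDRESS", "word": "123 Broadway, New York"},
]
print(to_contact(sample))
```

Values are kept in lists because a single message may mention more than one phone number or tag.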
Advanced Usage with Address Recognition
# Example with full address including building number
text = "Add contact Alon 212-555-0123 alon@example.com 45, 5 Ave, unit 34, New York"
results = ner_pipeline(text)
for result in results:
    print(f"{result['entity_group']}: {result['word']}")
Output:
NAME: Alon
PHONE: 212-555-0123
EMAIL: alon@example.com
ADDRESS: 45, 5 Ave, unit 34, New York
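For addresses, the `word` field is rebuilt from tokens and its spacing or punctuation may not match the input exactly. With a fast tokenizer, each aggregated entity also carries `start` and `end` character offsets, so the original substring can be sliced out directly. The sketch below assumes those offsets are present; `entity_text` and `fake_entity` are hypothetical names, and the entity is built by hand for illustration.

```python
def entity_text(text, entity):
    """Return the exact source substring covered by one pipeline entity."""
    return text[entity["start"]:entity["end"]]


text = "Add contact Alon 212-555-0123 alon@example.com 45, 5 Ave, unit 34, New York"
address = "45, 5 Ave, unit 34, New York"
start = text.index(address)

# Hand-built entity for illustration; real offsets come from ner_pipeline(text).
fake_entity = {"entity_group": "ADDRESS", "start": start, "end": start + len(address)}
print(entity_text(text, fake_entity))
```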
Batch Processing
texts = [
    "Add contact Sarah 718-555-4567 sarah@email.com lives at 123 Broadway, Apt 5B, NY 10001",
    "Create contact Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901",
    "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028",
]
for text in texts:
    results = ner_pipeline(text)
    print(f"\nText: {text}")
    for result in results:
        print(f" - {result['entity_group']}: {result['word']}")
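The loop above calls the pipeline once per text. Transformers pipelines also accept a list of inputs and a `batch_size` argument, which lets the model process several texts per forward pass. A sketch of that variant, where `extract_batch` is a hypothetical wrapper and the batch size of 8 is an arbitrary assumption:

```python
def extract_batch(ner, texts, batch_size=8):
    """Run the pipeline once over all texts; returns (text, entities) pairs."""
    # With a list input, the pipeline returns one entity list per text.
    all_results = ner(texts, batch_size=batch_size)
    return list(zip(texts, all_results))


# Usage with the pipeline created earlier:
# for text, entities in extract_batch(ner_pipeline, texts):
#     print(text, [(e["entity_group"], e["word"]) for e in entities])
```

Whether batching is faster depends on hardware and text lengths, so it is worth measuring before adopting it.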
Training Details
Dataset
- Size: 2,185 training examples
- ADDRESS entities: 543 occurrences (including full street addresses with building numbers)
- NAME entities: 1,897 occurrences
- PHONE entities: 564 occurrences
- EMAIL entities: 415 occurrences
- BIRTHDAY entities: 252 occurrences
Training Configuration
- Base Model: roberta-base
- Learning Rate: 3e-5
- Batch Size: 16
- Max Length: 128 tokens
- Epochs: 5
- Optimizer: AdamW
- Training Framework: Hugging Face Transformers
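As a rough illustration, the settings above map onto Hugging Face `TrainingArguments` as sketched below. This is not the author's training script: the output directory is a hypothetical path, the batch size is assumed to be per device on a single device, and AdamW is assumed to come from the Trainer's default optimizer rather than an explicit setting.

```python
# Max length is applied at tokenization time, not in TrainingArguments, e.g.
# tokenizer(words, is_split_into_words=True, truncation=True, max_length=MAX_LENGTH)
MAX_LENGTH = 128

TRAINING_KWARGS = {
    "output_dir": "ner-output",         # hypothetical path
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "num_train_epochs": 5,
}


def build_training_args():
    """Build TrainingArguments from the values listed above."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```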
Performance Metrics
| Metric | Value |
|---|---|
| Accuracy | 95.1% |
| Entity Accuracy | 93.7% |
| Precision | 94.9% |
| Recall | 95.1% |
| F1 Score | 94.6% |
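When precision, recall, and F1 are computed over the same pooled predictions, F1 is the harmonic mean of the other two. A quick check on the table's values:

```python
precision, recall = 0.949, 0.951

# Harmonic mean of the aggregate precision and recall from the table.
f1_from_aggregates = 2 * precision * recall / (precision + recall)
print(f"{f1_from_aggregates:.3f}")  # prints 0.950
```

The result, about 95.0%, is slightly above the reported 94.6%. That gap would be consistent with the figures being averaged per entity type or computed at different granularities, but the averaging method is not stated here.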
Key Features
Full Address Recognition
Many general-purpose NER models tag only location names such as cities. This model recognizes complete street addresses, including:
- Building numbers (45, 123, 1234, etc.)
- Street names (Broadway, 5 Ave, Sunset Boulevard, etc.)
- Unit/Apartment numbers (unit 34, Apt 5B, Suite 12, Floor 3)
- Cities and states (New York, NY, Los Angeles, CA, etc.)
- ZIP codes (10001, 90028, 77002, etc.)
Example: Full Address Recognition
Before (typical NER models):
Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "New York" (only the city)
After (this model):
Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "45, 5 ave, unit 34, New York" (full address, including the building number)
Example Predictions
Example 1: Complete Contact
text = "Add contact John Smith 212-555-0123 john@example.com 45, 5 Ave, unit 34, New York"
Extracted Entities:
- NAME: John Smith
- PHONE: 212-555-0123
- EMAIL: john@example.com
- ADDRESS: 45, 5 Ave, unit 34, New York
Example 2: Address with ZIP Code
text = "Create contact Sarah at 123 Broadway, Apt 5B, New York, NY 10001"
Extracted Entities:
- NAME: Sarah
- ADDRESS: 123 Broadway, Apt 5B, New York, NY 10001
Example 3: Complex Address
text = "Save contact for Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901"
Extracted Entities:
- NAME: Michael
- PHONE: 917-555-8901
- ADDRESS: 789 Park Avenue, Suite 12, Manhattan, NY 10021
Example 4: Different City
text = "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028"
Extracted Entities:
- NAME: David Martinez
- ADDRESS: 1234 Sunset Boulevard, Los Angeles, CA 90028
Intended Use
This model is designed for:
- Contact management applications
- Personal assistant bots
- CRM systems with natural language interface
- Address extraction from text
- Contact information parsing
Limitations
- US-style addresses: The model is optimized for US address formats; international addresses are not yet in the training data.
- English only: Performance is best on English text; other languages are not supported.
- Contact management domain: The model may not generalize well to other domains without further fine-tuning.
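Given these limits, a caller may want to discard low-confidence predictions on unfamiliar input. Aggregated pipeline results include a `score` field, so a simple filter is possible. In the sketch below, `filter_confident` is a hypothetical helper, the 0.5 threshold is an arbitrary assumption that should be tuned on real data, and the sample entities are invented for illustration.

```python
def filter_confident(entities, min_score=0.5):
    """Keep only entities whose confidence score meets the threshold."""
    return [ent for ent in entities if ent["score"] >= min_score]


# Invented scores for illustration; real values come from ner_pipeline(text).
sample_entities = [
    {"entity_group": "NAME", "word": "Sarah", "score": 0.98},
    {"entity_group": "ADDRESS", "word": "10 Downing St", "score": 0.41},
]
print(filter_confident(sample_entities))
```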
Model Architecture
Based on RoBERTa (Robustly Optimized BERT Pretraining Approach):
- Layers: 12 transformer layers
- Hidden size: 768
- Attention heads: 12
- Parameters: ~125M
- Task: Token Classification with IOB2 tagging scheme
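The ~125M figure can be roughly reproduced from the dimensions above. The sketch below assumes the published roberta-base configuration (vocabulary of 50,265, 514 position embeddings, feed-forward size 3,072) and a classification head with 19 labels, that is, B- and I- tags for 9 entity types plus O. The label count is an assumption; the model's actual label set may differ.

```python
VOCAB, MAX_POS, HIDDEN, FFN, LAYERS = 50265, 514, 768, 3072, 12
NUM_LABELS = 2 * 9 + 1  # assumption: B-/I- for 9 entity types, plus O


def linear(n_in, n_out):
    """Parameter count of a dense layer: weights plus biases."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and bias vectors

# Word, position, and token-type embeddings, followed by a LayerNorm.
embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN + 1 * HIDDEN + layer_norm

# Per layer: Q, K, V, and output projections, two feed-forward layers, two LayerNorms.
per_layer = (
    4 * linear(HIDDEN, HIDDEN)
    + linear(HIDDEN, FFN)
    + linear(FFN, HIDDEN)
    + 2 * layer_norm
)

classifier = linear(HIDDEN, NUM_LABELS)
total = embeddings + LAYERS * per_layer + classifier
print(f"{total:,}")  # roughly 124 million
```

About two thirds of the parameters sit in the encoder layers; the token classification head itself is tiny by comparison.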
Entity Label Format
The model uses IOB2 (Inside-Outside-Beginning) format:
- B-{ENTITY}: Beginning of entity
- I-{ENTITY}: Inside/continuation of entity
- O: Outside any entity
Example:
Tokens: ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
Labels: ["O", "O", "B-NAME", "I-NAME", "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]
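Turning IOB2 labels back into entity spans is a small state machine: a B- tag opens a span, a matching I- tag extends it, and anything else closes it. A minimal sketch, using a hypothetical helper `decode_iob2` applied to the example above (the pipeline's `aggregation_strategy` does a similar merge internally, though not necessarily with this exact logic):

```python
def decode_iob2(tokens, labels):
    """Merge IOB2-labelled tokens into (entity_type, [tokens]) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, current_tokens))
    return spans


tokens = ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
labels = ["O", "O", "B-NAME", "I-NAME", "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]
print(decode_iob2(tokens, labels))
```

In this sketch a stray I- tag with no matching open span is dropped rather than promoted to a new entity; other decoders make the opposite choice.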
Citation
If you use this model, please cite:
@misc{kotenko2025nermodel,
author = {Kotenko, Mykyta},
title = {NER Model for Contact Management Assistant Bot},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/kms-engineer/assistant-bot-ner-model}},
note = {Based on RoBERTa by Facebook AI. Achieves 95.1\% accuracy with full address recognition including building numbers.}
}
Acknowledgments
- Base Model: RoBERTa by Facebook AI Research
- Framework: Hugging Face Transformers
- Training: Fine-tuned on custom contact management dataset with 2,185 examples
- Special Feature: Enhanced address recognition with building numbers, apartments, and full street addresses
Technical Improvements
This model includes several technical improvements over standard NER models:
- Enhanced Tokenization: Improved handling of addresses using a fuzzy matching algorithm
- Rich Training Data: 115+ real-world address examples from major US cities
- Address Variations: Multiple formats including "address-first" patterns
- High Accuracy: 95.1% overall accuracy, 93.7% entity-level accuracy
Updates
- v1.0.0 (2025-01-18): Initial release
  - 95.1% accuracy
  - Full address recognition with building numbers
  - 2,185 training examples
  - Support for 9 entity types
License
MIT License. See the LICENSE file for details.
This model is a derivative work based on RoBERTa, which is licensed under the MIT License by Facebook, Inc.
Contact
- Author: Mykyta Kotenko
- Repository: assistant-bot
- Issues: Please report issues on GitHub
- Hugging Face: kms-engineer
Related Models
- Intent Classifier: kms-engineer/assistant-bot-intent-classifier
- Dataset: kms-engineer/assistant-bot-ner-dataset