NER Model for Contact Management Assistant Bot

This is a RoBERTa-base model fine-tuned for Named Entity Recognition (NER) in contact management tasks.

Model Description

  • Developed by: Mykyta Kotenko
  • Base Model: roberta-base by Facebook AI
  • Task: Token Classification (Named Entity Recognition)
  • Language: English
  • License: MIT
  • Accuracy: 95.1%
  • Entity Accuracy: 93.7%
  • F1 Score: 94.6%

Supported Entities

This model extracts the following entity types:

  • NAME: Person's full name
  • PHONE: Phone numbers in various formats
  • EMAIL: Email addresses
  • ADDRESS: Full street addresses (including building numbers, street names, apartments, cities, states, ZIP codes)
  • BIRTHDAY: Dates of birth
  • TAG: Contact tags
  • NOTE_TEXT: Note content
  • ID: Contact/note identifiers
  • DAYS: Time periods
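Because the model tags tokens with the IOB2 scheme (see Entity Label Format), each entity type above expands into a B- and an I- label, plus a single O label. The sketch below builds that label set from the list above. The ordering of the ids is an assumption made for illustration; the mapping actually used by the released checkpoint is the one stored in `model.config.id2label`.

```python
# Entity types listed in this model card.
ENTITY_TYPES = [
    "NAME", "PHONE", "EMAIL", "ADDRESS", "BIRTHDAY",
    "TAG", "NOTE_TEXT", "ID", "DAYS",
]


def build_iob2_labels(entity_types):
    """Return "O" followed by a B-/I- pair for every entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


LABELS = build_iob2_labels(ENTITY_TYPES)

# Hypothetical id ordering; check model.config.id2label for the real one.
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

print(len(LABELS))  # 9 entity types * 2 + "O" = 19
```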

Usage

Basic Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("kms-engineer/assistant-bot-ner-model")
model = AutoModelForTokenClassification.from_pretrained("kms-engineer/assistant-bot-ner-model")

# Create NER pipeline
ner_pipeline = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"  # Merge B-/I- tokens
)

# Extract entities
text = "Add contact John Smith 212-555-0123 john@example.com 123 Broadway, New York"
results = ner_pipeline(text)

for result in results:
    print(f"{result['entity_group']}: {result['word']}")

Output:

NAME: John Smith
PHONE: 212-555-0123
EMAIL: john@example.com
ADDRESS: 123 Broadway, New York
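In an application it is often more convenient to collect the pipeline output into one dictionary per message. The sketch below groups results by entity type and slices the original text with the `start` and `end` character offsets that the pipeline returns alongside `entity_group`, `word`, and `score`, which avoids any spacing differences in the reconstructed `word`. The helper `group_entities` and its `min_score` threshold are hypothetical names, and `sample_results` is hand-written for illustration rather than real model output.

```python
from collections import defaultdict


def group_entities(text, results, min_score=0.0):
    """Group aggregated pipeline results by entity type.

    Entities scoring below min_score are skipped. The surface string is
    taken from the original text via the character offsets.
    """
    grouped = defaultdict(list)
    for result in results:
        if result["score"] < min_score:
            continue
        surface = text[result["start"]:result["end"]]
        grouped[result["entity_group"]].append(surface)
    return dict(grouped)


# Hand-written stand-in for ner_pipeline(sample_text); scores are made up.
sample_text = "Add contact John Smith 212-555-0123"
sample_results = [
    {"entity_group": "NAME", "score": 0.99, "word": " John Smith", "start": 12, "end": 22},
    {"entity_group": "PHONE", "score": 0.50, "word": " 212-555-0123", "start": 23, "end": 35},
]

print(group_entities(sample_text, sample_results))
print(group_entities(sample_text, sample_results, min_score=0.9))
```

With the real pipeline, the call would be `group_entities(text, ner_pipeline(text))`.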

Advanced Usage with Address Recognition

# Example with full address including building number
text = "Add contact Alon 212-555-0123 alon@example.com 45, 5 Ave, unit 34, New York"
results = ner_pipeline(text)

for result in results:
    print(f"{result['entity_group']}: {result['word']}")

Output:

NAME: Alon
PHONE: 212-555-0123
EMAIL: alon@example.com
ADDRESS: 45, 5 Ave, unit 34, New York

Batch Processing

texts = [
    "Add contact Sarah 718-555-4567 sarah@email.com lives at 123 Broadway, Apt 5B, NY 10001",
    "Create contact Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901",
    "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028"
]

for text in texts:
    results = ner_pipeline(text)
    print(f"\nText: {text}")
    for result in results:
        print(f"  - {result['entity_group']}: {result['word']}")
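The loop above calls the pipeline once per text. A Transformers pipeline also accepts a list of strings and a `batch_size` argument, returning one result list per input, which lets the library batch the forward passes. A minimal sketch, assuming `ner` is the `ner_pipeline` object created earlier; the helper name `extract_batch` and the default batch size of 8 are illustrative choices.

```python
def extract_batch(ner, texts, batch_size=8):
    """Run the pipeline once over all texts and pair each text with its entities."""
    all_results = ner(texts, batch_size=batch_size)
    return [
        (text, [(r["entity_group"], r["word"]) for r in results])
        for text, results in zip(texts, all_results)
    ]


# Usage with the real pipeline:
#   for text, entities in extract_batch(ner_pipeline, texts):
#       print(text, entities)
```

Whether batching is faster depends on hardware and input lengths, so it is worth measuring on your own data.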

Training Details

Dataset

  • Size: 2,185 training examples
  • ADDRESS entities: 543 occurrences (including full street addresses with building numbers)
  • NAME entities: 1,897 occurrences
  • PHONE entities: 564 occurrences
  • EMAIL entities: 415 occurrences
  • BIRTHDAY entities: 252 occurrences

Training Configuration

  • Base Model: roberta-base
  • Learning Rate: 3e-5
  • Batch Size: 16
  • Max Length: 128 tokens
  • Epochs: 5
  • Optimizer: AdamW
  • Training Framework: Hugging Face Transformers
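For anyone reproducing a similar fine-tuning run, the settings above map onto Hugging Face `TrainingArguments` and tokenizer options roughly as follows. This is a sketch, not the author's training script: the output directory is a placeholder, and treating the batch size as per-device and AdamW as the `adamw_torch` variant are assumptions.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
TRAINING_ARGS = {
    "output_dir": "ner-checkpoints",    # hypothetical path
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # assumes the batch size is per device
    "num_train_epochs": 5,
    "optim": "adamw_torch",             # AdamW; the exact variant is an assumption
}

# Tokenizer options matching the 128-token maximum length.
TOKENIZER_ARGS = {"truncation": True, "max_length": 128}

# Usage sketch (requires transformers and torch):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**TRAINING_ARGS)
#   encoded = tokenizer(text, **TOKENIZER_ARGS)
```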

Performance Metrics

Metric            Value
Accuracy          95.1%
Entity Accuracy   93.7%
Precision         94.9%
Recall            95.1%
F1 Score          94.6%
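For a single set of predictions, F1 is the harmonic mean of precision and recall. Applying that formula to the precision and recall reported here gives about 95.0%, slightly above the reported 94.6%. The card does not say how the figures were averaged; one possible explanation (an assumption) is that they are averaged per entity type, in which case the mean of the per-type F1 scores need not equal the harmonic mean of the averaged precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.949, 0.951), 3))  # 0.95
```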

Key Features

✅ Full Address Recognition

Unlike many NER models that only recognize city names, this model recognizes complete street addresses including:

  • Building numbers (45, 123, 1234, etc.)
  • Street names (Broadway, 5 Ave, Sunset Boulevard, etc.)
  • Unit/Apartment numbers (unit 34, Apt 5B, Suite 12, Floor 3)
  • Cities and states (New York, NY, Los Angeles, CA, etc.)
  • ZIP codes (10001, 90028, 77002, etc.)

Example: Full Address Recognition

Before (typical NER models):

Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "New York" ❌ (only city)

After (this model):

Input: "add address for Alon 45, 5 ave, unit 34, New York"
ADDRESS: "45, 5 ave, unit 34, New York" ✅ (full address with building number)

Example Predictions

Example 1: Complete Contact

text = "Add contact John Smith 212-555-0123 john@example.com 45, 5 Ave, unit 34, New York"

Extracted Entities:

  • NAME: John Smith
  • PHONE: 212-555-0123
  • EMAIL: john@example.com
  • ADDRESS: 45, 5 Ave, unit 34, New York

Example 2: Address with ZIP Code

text = "Create contact Sarah at 123 Broadway, Apt 5B, New York, NY 10001"

Extracted Entities:

  • NAME: Sarah
  • ADDRESS: 123 Broadway, Apt 5B, New York, NY 10001

Example 3: Complex Address

text = "Save contact for Michael at 789 Park Avenue, Suite 12, Manhattan, NY 10021 phone 917-555-8901"

Extracted Entities:

  • NAME: Michael
  • PHONE: 917-555-8901
  • ADDRESS: 789 Park Avenue, Suite 12, Manhattan, NY 10021

Example 4: Different City

text = "Register David Martinez 1234 Sunset Boulevard, Los Angeles, CA 90028"

Extracted Entities:

  • NAME: David Martinez
  • ADDRESS: 1234 Sunset Boulevard, Los Angeles, CA 90028

Intended Use

This model is designed for:

  • Contact management applications
  • Personal assistant bots
  • CRM systems with natural language interface
  • Address extraction from text
  • Contact information parsing

Limitations

  • Optimized for US-style addresses: international addresses are not yet in the training data
  • Best performance on English text: other languages are not supported
  • Contact management domain: may not generalize well to other domains without further fine-tuning

Model Architecture

Based on RoBERTa (Robustly Optimized BERT Pretraining Approach):

  • Layers: 12 transformer layers
  • Hidden size: 768
  • Attention heads: 12
  • Parameters: ~125M
  • Task: Token Classification with IOB2 tagging scheme

Entity Label Format

The model uses IOB2 (Inside-Outside-Beginning) format:

  • B-{ENTITY}: Beginning of entity
  • I-{ENTITY}: Inside/continuation of entity
  • O: Outside any entity

Example:

Tokens:  ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
Labels:  ["O",   "O",       "B-NAME", "I-NAME", "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]
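Turning a tagged sequence back into entities means starting a new span at every B- label and extending it while matching I- labels follow. The pipeline's `aggregation_strategy` does this for you; the sketch below only illustrates the decoding on word-level tokens like those in the example above. The function name `iob2_to_spans` is hypothetical, and dropping an I- label that has no matching open span is a design choice made here, not necessarily what the library does.

```python
def iob2_to_spans(tokens, labels):
    """Decode parallel token/label lists into (entity_type, tokens) spans."""
    spans = []
    current_type, current_tokens = None, []

    def close_span():
        # Store the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, current_tokens))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            close_span()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an I- label that does not continue the open span.
            close_span()
            current_type, current_tokens = None, []
    close_span()
    return spans


tokens = ["Add", "contact", "John", "Smith", "212", "-", "555", "-", "0123"]
labels = ["O", "O", "B-NAME", "I-NAME",
          "B-PHONE", "I-PHONE", "I-PHONE", "I-PHONE", "I-PHONE"]

for entity_type, span_tokens in iob2_to_spans(tokens, labels):
    print(entity_type, span_tokens)
```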

Citation

If you use this model, please cite:

@misc{kotenko2025nermodel,
  author = {Kotenko, Mykyta},
  title = {NER Model for Contact Management Assistant Bot},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/kms-engineer/assistant-bot-ner-model}},
  note = {Based on RoBERTa by Facebook AI. Achieves 95.1\% accuracy with full address recognition including building numbers.}
}

Acknowledgments

  • Base Model: RoBERTa by Facebook AI Research
  • Framework: Hugging Face Transformers
  • Training: Fine-tuned on custom contact management dataset with 2,185 examples
  • Special Feature: Enhanced address recognition with building numbers, apartments, and full street addresses

Technical Improvements

This model includes several technical improvements over standard NER models:

  1. Enhanced Tokenization: Improved handling of addresses using a fuzzy matching algorithm
  2. Rich Training Data: 115+ real-world address examples from major US cities
  3. Address Variations: Multiple formats including "address-first" patterns
  4. High Accuracy: 95.1% overall accuracy, 93.7% entity-level accuracy

Updates

  • v1.0.0 (2025-01-18): Initial release
    • 95.1% accuracy
    • Full address recognition with building numbers
    • 2,185 training examples
    • Support for 9 entity types

License

MIT License - See LICENSE file for details.

This model is a derivative work based on RoBERTa, which is released under the MIT License by Facebook, Inc.
