How to use tmwstw7/spanmarker-bert-base-cased-custom-small with SpanMarker:
from span_marker import SpanMarkerModel
model = SpanMarkerModel.from_pretrained("tmwstw7/spanmarker-bert-base-cased-custom-small")
SpanMarker
This is a SpanMarker model that can be used for Named Entity Recognition.
Model Details
Model Description
- Model Type: SpanMarker
- Maximum Sequence Length: 512 tokens
- Maximum Entity Length: 12 words
Model Sources
- Repository: SpanMarker on GitHub
- Thesis: SpanMarker For Named Entity Recognition
Model Labels
| Label | Examples |
|---|---|
| action | "Remind", "scheduled", "review" |
| app_data_type | "items", "images", "videos" |
| app_name | "Camera", "phone", "Slack" |
| contact_info | "sarah . lee @ company . org", "123 Maple Street , Springfield", "home address" |
| date | "20 . 10 . 1999", "before", "January 18 - June 15" |
| event_title | "team sync", "Marketing Strategy Meeting", "Budget Planning" |
| file_name | "notes", "budget_overview . xlsx", "project_plan . docx" |
| file_size | "under 500 kb", "smaller than 50 kb", "exceeding 100 mb" |
| file_type | "documents", "document", "image" |
| folder_name | "Projects", "Work", "Photos" |
| in_file_data | "appendix section", "page 10", "section 5" |
| limits | "top 8", "all", "every" |
| location | "Room 204", "server room", "library" |
| person_name | "Jonathan Kim", "Mr . Osei", "Lucas Müller" |
| relationship | "manager", "brother", "cousin" |
| setting | "brightness", "airplane mode", "notifications" |
| system_command | "disable", "move", "switch on" |
| time | "9 : 00 AM", "10 : 45", "10 : 00 AM" |
Evaluation
Metrics
| Label | Precision | Recall | F1 |
|---|---|---|---|
| all | 0.8559 | 0.8813 | 0.8684 |
| action | 0.8173 | 0.9245 | 0.8676 |
| app_data_type | 0.7960 | 0.6828 | 0.7351 |
| app_name | 0.9432 | 0.9432 | 0.9432 |
| contact_info | 0.8722 | 0.9091 | 0.8903 |
| date | 0.9160 | 0.8993 | 0.9076 |
| event_title | 0.8659 | 0.9107 | 0.8877 |
| file_name | 0.9371 | 0.9280 | 0.9326 |
| file_size | 0.7810 | 0.7810 | 0.7810 |
| file_type | 0.7731 | 0.8786 | 0.8225 |
| folder_name | 0.9618 | 0.8968 | 0.9282 |
| in_file_data | 0.7486 | 0.7867 | 0.7672 |
| limits | 0.9048 | 0.6786 | 0.7755 |
| location | 0.8917 | 0.8571 | 0.8741 |
| person_name | 0.9885 | 0.9885 | 0.9885 |
| relationship | 0.9505 | 0.9541 | 0.9523 |
| setting | 0.8974 | 0.9255 | 0.9112 |
| system_command | 0.7889 | 0.7441 | 0.7659 |
| time | 0.9076 | 0.8587 | 0.8825 |
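The F1 column is the harmonic mean of precision and recall, so any row can be checked by hand. A minimal sketch, using three rows copied from the table above (the helper name `f1_score` is illustrative and not part of SpanMarker; small differences in the last digit are possible because the table values are already rounded):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the metrics table
rows = {
    "all": (0.8559, 0.8813, 0.8684),
    "action": (0.8173, 0.9245, 0.8676),
    "limits": (0.9048, 0.6786, 0.7755),
}

for label, (p, r, reported) in rows.items():
    print(f"{label}: computed={f1_score(p, r):.4f} reported={reported:.4f}")
```

The `limits` row shows why F1 is worth reading alongside its inputs: precision is high, but the low recall pulls the harmonic mean down.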
Uses
Direct Use for Inference
from span_marker import SpanMarkerModel
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tmwstw7/spanmarker-bert-base-cased-custom-small")
# Run inference
entities = model.predict("Text my mother at + 44 7911 123456 the summary from paragraph 4, and then enable bluetooth")
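For a single string, `model.predict` is expected to return a list of dictionaries, one per detected span. The sketch below assumes each dictionary carries `"span"`, `"label"`, and `"score"` keys and groups the spans by label. The helper `group_by_label` and the sample entities are hypothetical, hand-written for illustration, and are not actual output of this model.

```python
from collections import defaultdict

def group_by_label(entities, min_score=0.0):
    """Group predicted spans by label, keeping spans scored at or above min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)

# Hypothetical predictions in the assumed output shape (not real model output)
example_entities = [
    {"span": "mother", "label": "relationship", "score": 0.98},
    {"span": "+ 44 7911 123456", "label": "contact_info", "score": 0.95},
    {"span": "paragraph 4", "label": "in_file_data", "score": 0.41},
    {"span": "enable", "label": "system_command", "score": 0.90},
    {"span": "bluetooth", "label": "setting", "score": 0.97},
]

print(group_by_label(example_entities, min_score=0.5))
```

In practice, the list returned by `model.predict(...)` would be passed in place of `example_entities`. A score threshold is one simple way to trade recall for precision on the weaker labels.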
Downstream Use
You can finetune this model on your own dataset.
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tmwstw7/spanmarker-bert-base-cased-custom-small")
# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003
# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")
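The training dataset needs one row per sentence, with a list of tokens and a parallel list of tag ids. A minimal sketch of building such a row from word-level span annotations, assuming an IOB2 tagging scheme; the helpers `build_label_list` and `spans_to_iob2` and the example sentence are hypothetical, with entity types borrowed from the label table:

```python
def build_label_list(entity_types):
    """Return ["O", "B-x", "I-x", ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels

def spans_to_iob2(tokens, spans):
    """Convert (start, end, label) word spans, end exclusive, to IOB2 tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

labels = build_label_list(["action", "person_name", "time"])
label2id = {label: i for i, label in enumerate(labels)}

# Hypothetical annotated sentence
tokens = ["Remind", "Jonathan", "Kim", "at", "9", ":", "00", "AM"]
spans = [(0, 1, "action"), (1, 3, "person_name"), (4, 8, "time")]

tags = spans_to_iob2(tokens, spans)
record = {"tokens": tokens, "ner_tags": [label2id[tag] for tag in tags]}
print(record)
```

A list of such records could then be turned into a dataset, for example with `datasets.Dataset.from_list`, and the label list would need to match the labels the model is configured with.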
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|---|---|---|---|
| Sentence length | 3 | 19.0206 | 53 |
| Entities per sentence | 1 | 5.7015 | 13 |
Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
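A linear scheduler with a warmup ratio of 0.1 raises the learning rate from 0 to its peak over the first 10% of training steps, then decays it linearly back to 0. The sketch below reproduces that shape in plain Python. The function `linear_schedule_lr` is illustrative, not a library call, and the total of 2695 steps is an estimate inferred from the training results (step 1000 at epoch 1.8553 implies roughly 539 steps per epoch, times 5 epochs), not a logged value.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 2695  # estimated, see the assumption above

for step in (0, 135, 270, 1000, 2000, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak rate of 5e-05 is reached around step 270, so both logged evaluations (steps 1000 and 2000) fall in the decay phase.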
Training Results
| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|---|---|---|---|---|---|---|
| 1.8553 | 1000 | 0.0344 | 0.8301 | 0.8650 | 0.8472 | 0.9204 |
| 3.7106 | 2000 | 0.0271 | 0.8524 | 0.8804 | 0.8662 | 0.9316 |
Framework Versions
- Python: 3.12.12
- SpanMarker: 1.7.0
- Transformers: 4.51.3
- PyTorch: 2.8.0+cu126
- Datasets: 3.6.0
- Tokenizers: 0.21.4
Citation
BibTeX
@software{Aarsen_SpanMarker,
author = {Aarsen, Tom},
license = {Apache-2.0},
title = {{SpanMarker for Named Entity Recognition}},
url = {https://github.com/tomaarsen/SpanMarkerNER}
}
Evaluation results
- F1 on Unknown (self-reported): 0.868
- Precision on Unknown (self-reported): 0.856
- Recall on Unknown (self-reported): 0.881