FinSight NER — Financial Named Entity Recognition
A financial-domain NER model fine-tuned from bert-base-uncased on the FiNER-ORD dataset (Shah et al., 2024), a manually annotated corpus of financial news articles.
Recognizes three entity types in BIO format:
- PER — persons (executives, board members, individuals mentioned in news)
- ORG — organizations (companies, banks, regulatory bodies, agencies)
- LOC — locations (cities, states, countries, regions)
Part of the FinSight project.
Performance (test split, entity-level)
Micro-averaged across all entity types:
| metric | value |
|---|---|
| precision | 0.7876 |
| recall | 0.8464 |
| f1 | 0.8159 |
Per-class breakdown:
| type | precision | recall | f1 | support |
|---|---|---|---|---|
| LOC | 0.7896 | 0.8633 | 0.8248 | 300 |
| ORG | 0.7222 | 0.7993 | 0.7588 | 553 |
| PER | 0.9261 | 0.9196 | 0.9228 | 286 |
| micro | 0.7876 | 0.8464 | 0.8159 | |
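As a sanity check, the micro-averaged row can be re-derived from the per-class rows. The sketch below is illustrative only: the true-positive and predicted-entity counts are not reported in this card, so they are inferred here by rounding `recall * support` and `tp / precision`, which is an assumption.

```python
# Per-class figures copied from the table: (precision, recall, support).
rows = {
    "LOC": (0.7896, 0.8633, 300),
    "ORG": (0.7222, 0.7993, 553),
    "PER": (0.9261, 0.9196, 286),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


tp = pred = gold = 0
for label, (p, r, support) in rows.items():
    class_tp = round(r * support)     # inferred true positives (assumption)
    class_pred = round(class_tp / p)  # inferred predicted entities (assumption)
    tp, pred, gold = tp + class_tp, pred + class_pred, gold + support
    print(f"{label}: f1={f1(p, r):.4f} tp={class_tp} predicted={class_pred}")

# Micro-averaging pools the counts across classes before dividing.
micro_p, micro_r = tp / pred, tp / gold
micro_f1 = f1(micro_p, micro_r)
print(f"micro: p={micro_p:.4f} r={micro_r:.4f} f1={micro_f1:.4f}")
```

Because ORG contributes almost half of the gold entities, its lower precision pulls the micro average down more than the other two classes do.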
Training Setup
| Setting | Value |
|---|---|
| Base model | bert-base-uncased |
| Dataset | gtfintechlab/finer-ord-bio |
| Train / Val / Test | 3,261 / 402 / 1,075 sentences |
| Epochs | 4 |
| Batch size | 16 |
| Learning rate | 3e-5 (linear warmup over 10% of steps) |
| Weight decay | 0.01 |
| Max sequence length | 192 |
| Optimizer | AdamW (default) |
| Mixed precision | fp16 |
| Seed | 42 |
| Hardware | NVIDIA Tesla T4 (Kaggle) |
| Training runtime | ~2.5 minutes |
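To make the learning-rate setting concrete, here is a minimal sketch of what the schedule might look like in optimizer steps. It assumes a single device, no gradient accumulation, a final partial batch that is kept, and linear decay to zero after warmup (the table only states the warmup part), so the step counts are estimates rather than logged values.

```python
import math

# Values taken from the table above.
TRAIN_SENTENCES = 3261
BATCH_SIZE = 16
EPOCHS = 4
PEAK_LR = 3e-5
WARMUP_RATIO = 0.10

# Assumption: one optimizer step per batch, last partial batch included.
steps_per_epoch = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then (assumed) linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(f"steps/epoch={steps_per_epoch} total={total_steps} warmup={warmup_steps}")
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(f"step {step:>3}: lr={learning_rate(step):.2e}")
```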
Label mapping
| ID | Label |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-LOC |
| 4 | I-LOC |
| 5 | B-ORG |
| 6 | I-ORG |
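The mapping can be written as a pair of dictionaries for decoding raw class IDs by hand. This is a simplified sketch: the token list and predicted IDs are made up for illustration, and a real forward pass predicts one ID per wordpiece rather than per word.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-LOC", 4: "I-LOC",
    5: "B-ORG", 6: "I-ORG",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(ids):
    """Turn a sequence of class IDs into BIO tag strings."""
    return [ID2LABEL[i] for i in ids]


# Hypothetical word-level predictions, for illustration only.
tokens = ["Jamie", "Dimon", ",", "CEO", "of", "JPMorgan", "Chase"]
pred_ids = [1, 2, 0, 0, 0, 5, 6]

for token, tag in zip(tokens, decode(pred_ids)):
    print(f"{token:<10}{tag}")
```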
Usage
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="musk1209/finsight-ner",
    aggregation_strategy="simple",
)

ner("Jamie Dimon, CEO of JPMorgan Chase, addressed shareholders in London.")
# [{'entity_group': 'PER', 'word': 'jamie dimon', 'score': 0.99, ...},
#  {'entity_group': 'ORG', 'word': 'jpmorgan chase', 'score': 1.00, ...},
#  {'entity_group': 'LOC', 'word': 'london', 'score': 0.99, ...}]
```
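The `word` field comes back lowercased because the base model is uncased. One way to recover the original surface form is to slice the input text with the `start` and `end` character offsets that the pipeline also returns. The helper below is a hypothetical sketch, not part of this repository; `group_entities` and `min_score` are illustrative names, and the sample results are hand-built to mimic the pipeline's output shape so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(text, results, min_score=0.5):
    """Group entity surface forms by type, keeping the original casing."""
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] < min_score:
            continue  # drop low-confidence predictions
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Jamie Dimon, CEO of JPMorgan Chase, addressed shareholders in London."


def fake_result(group, surface, score):
    """Build a dict shaped like one aggregated pipeline result (illustrative)."""
    start = text.index(surface)
    return {
        "entity_group": group,
        "word": surface.lower(),
        "score": score,
        "start": start,
        "end": start + len(surface),
    }


sample = [
    fake_result("PER", "Jamie Dimon", 0.99),
    fake_result("ORG", "JPMorgan Chase", 1.00),
    fake_result("LOC", "London", 0.99),
]
print(group_entities(text, sample))
```

With the real model, `sample` would be replaced by the list returned from `ner(text)`.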
Scope and limitations
- Domain: Trained on Bloomberg-style financial news from 2015. Generalizes well to modern news-style prose (including SEC filings' narrative sections) but is not tuned for structured legal or contract language.
- Coverage: Only 3 entity types. Money amounts, percentages, and dates are intentionally not covered — those are better handled by regex given their rigid patterns in financial text.
- Text style: Best on well-formed sentences. Note that the uncased base model lowercases all input, so capitalization carries no signal for the model; terse headline-style text may still underperform.
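For the money amounts and percentages mentioned under Coverage, a regex pass might look like the sketch below. The patterns are illustrative assumptions, not part of this project: they cover only a few common formats (currency symbol plus number with an optional magnitude word, and `%` or `percent`) and would need extending for real filings. Dates are left out here.

```python
import re

# Currency symbol, digits with optional thousands separators and decimals,
# and an optional magnitude word. Illustrative only.
MONEY = re.compile(
    r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?"
)

# A number followed by a percent sign or the word "percent".
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?(?:%|percent\b)")

text = (
    "Revenue rose 12.5% to $4.5 billion, "
    "while costs fell 3 percent to €1,200.50."
)

money = MONEY.findall(text)
percents = PERCENT.findall(text)
print(money)
print(percents)
```

These matches could then be merged with the model's PER, ORG, and LOC spans by character offset, using `re.finditer` instead of `findall` to obtain positions.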
Custom evaluation code
The seqeval library, the standard package for NER metrics, has a broken pyproject.toml that prevents installation on Python 3.12. This model was therefore evaluated with a custom entity-level scorer using strict-match semantics, where a predicted entity counts as correct only if both its type and its exact boundaries match the gold annotation; see src/fine_tuning/ner_metrics.py in the project repo.
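For readers without access to the repo, strict-match entity-level scoring can be sketched in a few lines. This is not the contents of `ner_metrics.py`; it is an independent illustration, and it assumes that an `I-` tag with no preceding `B-` of the same type starts a new entity.

```python
from typing import List, Set, Tuple

# (sentence index, entity type, start token, end token exclusive)
Span = Tuple[int, str, int, int]


def extract_spans(tags: List[str], sent_idx: int = 0) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((sent_idx, etype, start, i))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def score(gold_seqs: List[List[str]], pred_seqs: List[List[str]]):
    """Micro precision, recall, F1 where only exact type+boundary matches count."""
    gold: Set[Span] = set()
    pred: Set[Span] = set()
    for idx, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold |= extract_spans(g, idx)
        pred |= extract_spans(p, idx)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]
pred = [["B-PER", "I-PER", "O", "B-ORG", "O"]]  # ORG boundary is wrong
print(score(gold, pred))
```

In the example, the truncated ORG prediction earns no partial credit, which is what distinguishes strict matching from overlap-based scoring.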
Citation
```bibtex
@article{shah2024finerord,
  title   = {FiNER-ORD: Financial Named Entity Recognition Open Research Dataset},
  author  = {Shah, Agam and Gullapalli, Abhinav and Vithani, Ruchit and Galarnyk, Michael and Chava, Sudheer},
  journal = {arXiv preprint arXiv:2302.11157},
  year    = {2024}
}
```