FinSight NER — Financial Named Entity Recognition

A financial-domain NER model fine-tuned from bert-base-uncased on the FiNER-ORD dataset (Shah et al., 2024), a manually annotated corpus of financial news articles.

Recognizes three entity types in BIO format:

  • PER — persons (executives, board members, individuals mentioned in news)
  • ORG — organizations (companies, banks, regulatory bodies, agencies)
  • LOC — locations (cities, states, countries, regions)

Part of the FinSight project.

Performance (test split, entity-level)

Micro-averaged across all entity types:

   metric      value
   precision   0.7876
   recall      0.8464
   f1          0.8159

Per-class breakdown:

  type   precision   recall       f1   support

   LOC      0.7896   0.8633   0.8248       300
   ORG      0.7222   0.7993   0.7588       553
   PER      0.9261   0.9196   0.9228       286

 micro      0.7876   0.8464   0.8159
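The micro row can be cross-checked against the per-class rows. The sketch below is illustrative arithmetic only: this card does not report true-positive or predicted-entity counts, so they are back-computed here from the rounded per-class precision, recall, and support, and should be read as assumptions.

```python
# Per-class (precision, recall, support) copied from the table above.
per_class = {
    "LOC": (0.7896, 0.8633, 300),
    "ORG": (0.7222, 0.7993, 553),
    "PER": (0.9261, 0.9196, 286),
}

tp_total = pred_total = gold_total = 0
for precision, recall, support in per_class.values():
    tp = round(recall * support)       # assumed count: recall = tp / gold
    predicted = round(tp / precision)  # assumed count: precision = tp / predicted
    tp_total += tp
    pred_total += predicted
    gold_total += support

# Micro-averaging pools the counts across classes before dividing.
micro_precision = tp_total / pred_total
micro_recall = tp_total / gold_total
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"{micro_precision:.4f} {micro_recall:.4f} {micro_f1:.4f}")
# prints: 0.7876 0.8464 0.8159
```

Because ORG accounts for roughly half of the gold entities, its lower scores pull the micro average below the LOC and PER figures.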

Training Setup

   Setting               Value
   Base model            bert-base-uncased
   Dataset               gtfintechlab/finer-ord-bio
   Train / Val / Test    3,261 / 402 / 1,075 sentences
   Epochs                4
   Batch size            16
   Learning rate         3e-5 (linear warmup over 10% of steps)
   Weight decay          0.01
   Max sequence length   192
   Optimizer             AdamW (default)
   Mixed precision       fp16
   Seed                  42
   Hardware              NVIDIA Tesla T4 (Kaggle)
   Training runtime      ~2.5 minutes
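To make the schedule concrete, the following sketch derives step counts from the settings above. It assumes one optimizer step per batch (single GPU, no gradient accumulation), rounds the warmup length up, and assumes linear decay to zero after warmup; none of these details are stated in this card.

```python
import math

# Values copied from the table above.
train_sentences = 3261
batch_size = 16
epochs = 4
peak_lr = 3e-5
warmup_fraction = 0.10

# Assumption: one optimizer step per batch (single GPU, no gradient accumulation).
steps_per_epoch = math.ceil(train_sentences / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_fraction * total_steps)

def learning_rate(step):
    """Linear warmup to peak_lr, then linear decay to zero (decay shape is an assumption)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
# prints: 204 816 82
```

With only about 800 optimizer steps in total, the short training runtime on a single T4 is plausible.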

Label mapping

   ID   Label
   0    O
   1    B-PER
   2    I-PER
   3    B-LOC
   4    I-LOC
   5    B-ORG
   6    I-ORG
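As an illustration of how these IDs map back to entities, here is a minimal BIO decoder. The function name `decode_spans` and its handling of a stray I- tag (treated as the start of a new span) are choices made for this sketch, not necessarily what the project code does.

```python
# Label IDs copied from the mapping above.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC", 5: "B-ORG", 6: "I-ORG"}

def decode_spans(label_ids):
    """Turn a sequence of label IDs into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current_type, start = None, None
    for i, label_id in enumerate(label_ids):
        prefix, _, entity_type = ID2LABEL[label_id].partition("-")
        # An I- tag continues the open span only if the type matches.
        if prefix == "I" and entity_type == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type, start = None, None
        # B- opens a new span; a stray I- is also treated as a span start (lenient choice).
        if prefix in ("B", "I"):
            current_type, start = entity_type, i
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans

# Word-level tags for: Jamie Dimon , CEO of JPMorgan Chase
example = [1, 2, 0, 0, 0, 5, 6]
print(decode_spans(example))
# prints: [('PER', 0, 2), ('ORG', 5, 7)]
```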

Usage

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="musk1209/finsight-ner",
    aggregation_strategy="simple",
)
ner("Jamie Dimon, CEO of JPMorgan Chase, addressed shareholders in London.")
# [{'entity_group': 'PER', 'word': 'jamie dimon', 'score': 0.99, ...},
#  {'entity_group': 'ORG', 'word': 'jpmorgan chase', 'score': 1.00, ...},
#  {'entity_group': 'LOC', 'word': 'london',       'score': 0.99, ...}]
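Downstream code often wants the predictions grouped by type. The helper below is a small sketch of that step; `group_entities` and the `min_score` cutoff of 0.5 are hypothetical names and values chosen for illustration, and the sample list is a hand-written stand-in for the pipeline output so the snippet runs without downloading the model.

```python
from collections import defaultdict

def group_entities(predictions, min_score=0.5):
    """Group aggregated pipeline predictions by entity type, dropping low-confidence ones."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)

# Hand-written stand-in mirroring the output shown above.
sample = [
    {"entity_group": "PER", "word": "jamie dimon", "score": 0.99},
    {"entity_group": "ORG", "word": "jpmorgan chase", "score": 1.00},
    {"entity_group": "LOC", "word": "london", "score": 0.99},
]
print(group_entities(sample))
# prints: {'PER': ['jamie dimon'], 'ORG': ['jpmorgan chase'], 'LOC': ['london']}
```

The `word` field is lowercased because the base model is uncased. When a fast tokenizer is in use, each prediction should also carry `start` and `end` character offsets, so slicing the input text with them recovers the original casing.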

Scope and limitations

  • Domain: Trained on Bloomberg-style financial news from 2015. Generalizes well to modern news-style prose (including SEC filings' narrative sections) but is not tuned for structured legal or contract language.
  • Coverage: Only 3 entity types. Money amounts, percentages, and dates are intentionally not covered — those are better handled by regex given their rigid patterns in financial text.
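  • Text style: Best on well-formed, complete sentences. Because the base model is uncased, the tokenizer lowercases all input, so capitalization carries no signal: the model separates names from common words by context alone, and returned entity strings are lowercased. Headlines and other terse fragments may underperform.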
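For the numeric types left out of the model, a regex pass can run alongside it. The patterns below are hypothetical and deliberately narrow (currency-symbol amounts, simple percentages, and "Month D, YYYY" dates); they illustrate the approach and are not part of the project.

```python
import re

# Hypothetical patterns for illustration; real financial text needs broader coverage.
MONEY = re.compile(
    r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion|trillion))?",
    re.IGNORECASE,
)
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?(?:%|percent\b)", re.IGNORECASE)
DATE = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s\d{1,2},\s\d{4}"
)

def extract_numeric_entities(text):
    """Return regex matches for the entity types the NER model does not cover."""
    return {
        "MONEY": MONEY.findall(text),
        "PERCENT": PERCENT.findall(text),
        "DATE": DATE.findall(text),
    }

text = "Revenue rose 4.5% to $1.2 billion in the quarter ended March 31, 2015."
print(extract_numeric_entities(text))
# prints: {'MONEY': ['$1.2 billion'], 'PERCENT': ['4.5%'], 'DATE': ['March 31, 2015']}
```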

Custom evaluation code

The seqeval library (the standard NER metric library) has a broken pyproject.toml that prevents installation on Python 3.12. This model was evaluated using a custom entity-level scorer with strict-match semantics; see src/fine_tuning/ner_metrics.py in the project repo.
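Strict-match scoring counts a prediction as correct only when both the entity type and the exact boundaries agree with the gold annotation. The sketch below shows that idea on span tuples; it is not the contents of src/fine_tuning/ner_metrics.py, and the function name and tuple layout are assumptions made for this example.

```python
def strict_match_scores(gold_spans, pred_spans):
    """Entity-level precision, recall, F1 where a hit needs the exact type and boundaries."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Spans are (sentence_index, entity_type, start_token, end_token), end exclusive.
gold = [(0, "PER", 0, 2), (0, "ORG", 5, 7), (0, "LOC", 10, 11)]
pred = [(0, "PER", 0, 2), (0, "ORG", 5, 6), (0, "LOC", 10, 11)]  # ORG boundary is off by one

precision, recall, f1 = strict_match_scores(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
# prints: 0.6667 0.6667 0.6667
```

The truncated ORG span earns no partial credit: it counts as both a false positive and a missed gold entity, which is what makes strict matching harsher than token-level accuracy.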

Citation

@article{shah2024finerord,
  title   = {FiNER-ORD: Financial Named Entity Recognition Open Research Dataset},
  author  = {Shah, Agam and Gullapalli, Abhinav and Vithani, Ruchit and Galarnyk, Michael and Chava, Sudheer},
  journal = {arXiv preprint arXiv:2302.11157},
  year    = {2024}
}
