FinSight NER — Financial Named Entity Recognition

A financial-domain NER model fine-tuned from bert-base-uncased on the FiNER-ORD dataset (Shah et al., 2024), a manually annotated corpus of financial news articles.

Recognizes three entity types in BIO format:

PER — persons (executives, board members, individuals mentioned in news)
ORG — organizations (companies, banks, regulatory bodies, agencies)
LOC — locations (cities, states, countries, regions)

Part of the FinSight project.

Performance (test split, entity-level)

Micro-averaged across all entity types:

metric	value
precision	0.7876
recall	0.8464
f1	0.8159

Per-class breakdown:

type	precision	recall	f1	support
LOC	0.7896	0.8633	0.8248	300
ORG	0.7222	0.7993	0.7588	553
PER	0.9261	0.9196	0.9228	286
micro	0.7876	0.8464	0.8159
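
The micro-averaged figures can be cross-checked against the per-class rows. The sketch below is a sanity check, not the project's scorer: it recovers integer true-positive and predicted-entity counts from the rounded per-class precision, recall, and support (an assumption that works here only because the rounding error is small), then pools the counts before computing the metrics once.

```python
# Per-class (precision, recall, support) copied from the per-class breakdown.
per_class = {
    "LOC": (0.7896, 0.8633, 300),
    "ORG": (0.7222, 0.7993, 553),
    "PER": (0.9261, 0.9196, 286),
}


def micro_average(rows):
    """Pool counts across classes, then compute precision, recall, and F1 once."""
    tp_total = pred_total = gold_total = 0
    for precision, recall, support in rows.values():
        tp = round(recall * support)        # true positives = recall x gold entities
        predicted = round(tp / precision)   # predicted entities = TP / precision
        tp_total += tp
        pred_total += predicted
        gold_total += support
    p = tp_total / pred_total
    r = tp_total / gold_total
    f1 = 2 * p * r / (p + r)
    return p, r, f1


p, r, f1 = micro_average(per_class)
print(f"{p:.4f} {r:.4f} {f1:.4f}")
```

The pooled values agree with the reported micro row to four decimal places. Because ORG has the largest support and the lowest scores, it pulls the micro average below the PER and LOC figures.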

Training Setup

Setting	Value
Base model	`bert-base-uncased`
Dataset	`gtfintechlab/finer-ord-bio`
Train / Val / Test	3,261 / 402 / 1,075 sentences
Epochs	4
Batch size	16
Learning rate	3e-5 (linear warmup over 10% of steps)
Weight decay	0.01
Max sequence length	192
Optimizer	AdamW (default)
Mixed precision	fp16
Seed	42
Hardware	NVIDIA Tesla T4 (Kaggle)
Training runtime	~2.5 minutes
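
For reference, the step counts implied by these settings can be worked out by hand. The sketch assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the card, and rounding the warmup length up to a whole step is also an assumption.

```python
import math

train_sentences = 3261    # training split size from the table
batch_size = 16
epochs = 4
warmup_fraction = 0.10    # "linear warmup over 10% of steps"

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_sentences / batch_size)
total_steps = steps_per_epoch * epochs

# Assumption: warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * warmup_fraction)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under those assumptions the run is short: 816 optimizer steps in total, of which about the first 82 are warmup.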

Label mapping

ID	Label
0	O
1	B-PER
2	I-PER
3	B-LOC
4	I-LOC
5	B-ORG
6	I-ORG
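
To turn per-token label IDs into entity spans, the mapping above can be decoded as follows. This is a minimal sketch, not code from the project: the function name `bio_to_spans` is hypothetical, spans are reported as token indices with an exclusive end, and treating a stray I- tag as the start of a new span is one possible convention among several.

```python
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC", 5: "B-ORG", 6: "I-ORG"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def bio_to_spans(label_ids):
    """Convert a sequence of label IDs into (type, start, end) spans, end exclusive."""
    spans = []
    current_type, start = None, None
    for i, label_id in enumerate(label_ids):
        prefix, _, ent_type = ID2LABEL[label_id].partition("-")
        # A span continues only on an I- tag of the same type.
        if prefix == "I" and ent_type == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type, start = None, None
        # Assumption: a stray I- tag (no matching B-) opens a new span.
        if prefix in ("B", "I"):
            current_type, start = ent_type, i
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans


# Tokens: Jamie Dimon , CEO of JPMorgan Chase
print(bio_to_spans([1, 2, 0, 0, 0, 5, 6]))
```

In practice the label IDs come from word-level predictions; subword pieces need to be merged first, which the pipeline's aggregation_strategy handles.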

Usage

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="musk1209/finsight-ner",
    aggregation_strategy="simple",
)
ner("Jamie Dimon, CEO of JPMorgan Chase, addressed shareholders in London.")
# [{'entity_group': 'PER', 'word': 'jamie dimon', 'score': 0.99, ...},
#  {'entity_group': 'ORG', 'word': 'jpmorgan chase', 'score': 1.00, ...},
#  {'entity_group': 'LOC', 'word': 'london',       'score': 0.99, ...}]
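
The `word` field comes back lowercased because the tokenizer is uncased. Aggregated pipeline results also carry `start` and `end` character offsets, which can be used to recover the original surface form. The sketch below shows one way to do that; `surface_forms`, `stand_in`, and the `min_score` cutoff are hypothetical names, and the predictions are stand-ins built by hand so the snippet runs without downloading the model.

```python
from collections import defaultdict


def surface_forms(text, predictions, min_score=0.5):
    """Group predictions by entity type, recovering original casing from offsets."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] < min_score:  # min_score is a hypothetical cutoff
            continue
        grouped[pred["entity_group"]].append(text[pred["start"]:pred["end"]])
    return dict(grouped)


text = "Jamie Dimon, CEO of JPMorgan Chase, addressed shareholders in London."


def stand_in(group, surface, score):
    """Build a dict shaped like one pipeline prediction (not real model output)."""
    start = text.index(surface)
    return {
        "entity_group": group,
        "word": surface.lower(),
        "score": score,
        "start": start,
        "end": start + len(surface),
    }


predictions = [
    stand_in("PER", "Jamie Dimon", 0.99),
    stand_in("ORG", "JPMorgan Chase", 1.00),
    stand_in("LOC", "London", 0.99),
]
print(surface_forms(text, predictions))
```

With real output, `predictions` would simply be the list returned by `ner(text)`.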

Scope and limitations

Domain: Trained on Bloomberg-style financial news from 2015. Generalizes well to modern news-style prose (including SEC filings' narrative sections) but is not tuned for structured legal or contract language.
Coverage: Only 3 entity types. Money amounts, percentages, and dates are intentionally not covered — those are better handled by regex given their rigid patterns in financial text.
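Text style: Best on well-formed sentences. Because the base model is uncased, the tokenizer lowercases all input, so capitalization (including all-caps headlines) is not a signal the model can use; headline-style fragments may underperform.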
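
As an illustration of the regex route for the entity types this model does not cover, the sketch below matches a few common patterns. The patterns and the MONEY, PERCENT, and DATE names are illustrative assumptions, not part of the project; real financial text needs broader coverage (other currencies, written-out numbers, more date formats).

```python
import re

# Illustrative patterns only: US-dollar amounts, percentages, and
# "Month D, YYYY" dates.
MONEY = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?",
    re.IGNORECASE,
)
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?(?:%|percent)", re.IGNORECASE)
DATE = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s\d{1,2},\s\d{4}"
)


def extract_numeric_entities(text):
    """Return every match of each pattern, keyed by a hypothetical label name."""
    return {
        "MONEY": MONEY.findall(text),
        "PERCENT": PERCENT.findall(text),
        "DATE": DATE.findall(text),
    }


example = "Revenue rose 4.5% to $1.2 billion in the quarter ended March 31, 2015."
print(extract_numeric_entities(example))
```

The regex output can be merged with the model's PER, ORG, and LOC spans by character offset if a single annotation layer is needed.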

Custom evaluation code

The seqeval library (the standard NER metric library) has a broken pyproject.toml that prevents installation on Python 3.12. This model was evaluated using a custom entity-level scorer with strict-match semantics; see src/fine_tuning/ner_metrics.py in the project repo.
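
Strict matching is usually taken to mean that a predicted entity counts as correct only if its type and both boundaries agree exactly with a gold entity. The sketch below implements that definition under stated assumptions; it is not the contents of src/fine_tuning/ner_metrics.py, and the function name and the (type, start, end) span representation are hypothetical.

```python
def strict_match_scores(gold_spans, pred_spans):
    """Entity-level precision, recall, and F1 with exact type-and-boundary matching.

    gold_spans and pred_spans each hold one collection of (type, start, end)
    tuples per sentence, aligned by position.
    """
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_spans, pred_spans):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # identical tuples are the only matches
        n_pred += len(pred)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{("PER", 0, 2), ("ORG", 5, 7)}]
pred = [{("PER", 0, 2), ("ORG", 5, 6)}]  # ORG boundary is off by one token
print(strict_match_scores(gold, pred))
```

In the example the truncated ORG span earns no partial credit: it counts as both a false positive and a missed gold entity, so precision and recall are each 0.5.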

Citation

@article{shah2024finerord,
  title   = {FiNER-ORD: Financial Named Entity Recognition Open Research Dataset},
  author  = {Shah, Agam and Gullapalli, Abhinav and Vithani, Ruchit and Galarnyk, Michael and Chava, Sudheer},
  journal = {arXiv preprint arXiv:2302.11157},
  year    = {2024}
}

Model weights: Safetensors, 0.1B params, F32 tensors

Paper: FiNER: Financial Named Entity Recognition Dataset and Weak-Supervision Model (arXiv:2302.11157, published Feb 22, 2023)