Model Card for DistilBERT NER Model

Model Details

Model Description

This model is a fine-tuned version of distilbert-base-uncased for Named Entity Recognition (NER). It was fine-tuned on a domain-specific dataset to classify tokens into entities related to finance and compensation, as well as general non-entity tokens.

  • Developed by: Burak Kilic
  • Model type: Token Classification (NER)
  • Language(s): English
  • Finetuned from model: DistilBERT (uncased)

Model Sources

Uses

Direct Use

This model is intended for Named Entity Recognition and can be used directly to identify entities in financial texts, such as:

  • B-DebtInstrumentInterestRateStatedPercentage
  • B-LineOfCreditFacilityMaximumBorrowingCapacity
  • B-DebtInstrumentBasisSpreadOnVariableRate1
  • B-AllocatedShareBasedCompensationExpense
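The B- prefix follows the BIO tagging convention, where each token gets exactly one tag. The sketch below shows how per-token tags are typically turned into entity spans. It assumes "O" marks non-entity tokens and "I-" continues an entity; the card itself lists only B- labels, so the I- handling is a hypothetical generalization, and `bio_to_spans` is an illustrative helper, not part of the model.

```python
from typing import List, Optional, Tuple

# Hypothetical decoder: one BIO tag per token -> (label, start, end) spans, end exclusive.
# Assumes "O" = non-entity and "I-" = continuation (the card only lists B- labels).
def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the currently open entity keeps going
        # Any other tag closes the open entity, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- without a matching open entity starts a new one.
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tags = ["O", "O", "O", "B-DebtInstrumentInterestRateStatedPercentage", "O"]
print(bio_to_spans(tags))
# [('DebtInstrumentInterestRateStatedPercentage', 3, 4)]
```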

Out-of-Scope Use

This model is not suitable for tasks outside Named Entity Recognition or for domains unrelated to finance.

Training Details

Training Procedure

  • Dataset: Custom annotated dataset with ~20,000 training examples.
  • Layers Fine-Tuned: Fully connected classification layer added for NER.
  • Training Regime: Mixed precision (fp16) with AdamW optimizer.
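The card does not say how word-level annotations were mapped onto DistilBERT's WordPiece subwords before training the classification layer. A common recipe, assumed here rather than confirmed by the author, keeps the label only on the first subword of each word and sets every other position to -100, which PyTorch's `CrossEntropyLoss` ignores by default. The `word_ids` list has the shape returned by a fast tokenizer's `word_ids()` method; the specific ids and label numbers below are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label ids onto subword tokens (hypothetical helper).

    word_ids holds, per subword token, the index of its source word, or None
    for special tokens such as [CLS] and [SEP]. Only the first subword of each
    word keeps the word's label; all other positions are ignored in the loss.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)        # special token
        elif wid != previous:
            aligned.append(word_labels[wid])    # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)        # continuation subword
        previous = wid
    return aligned


# Hypothetical: 5 words, the 4th split into three subwords; 0 = O, 1 = an entity label.
word_ids = [None, 0, 1, 2, 3, 3, 3, 4, None]
word_labels = [0, 0, 0, 1, 0]
print(align_labels(word_ids, word_labels))
# [-100, 0, 0, 0, 1, -100, -100, 0, -100]
```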

Training Hyperparameters

Parameter                   Value
Learning rate               2e-5
Warmup ratio                0.1
Batch size                  128
Epochs                      8
Weight decay                0.01
Mixed-precision training    True
Evaluation metric           F1
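To make the warmup ratio concrete, the sketch below converts the table's values into step counts and a learning-rate curve. It assumes roughly 20,000 examples, a single device with no gradient accumulation, a kept final partial batch, and a linear warmup followed by linear decay to zero (the default schedule of the Hugging Face Trainer, with warmup steps rounded up). The card does not state the scheduler, so treat these numbers as an estimate.

```python
import math

# Values from the hyperparameter table; the dataset size is the card's approximate figure.
NUM_EXAMPLES = 20_000
BATCH_SIZE = 128
EPOCHS = 8
WARMUP_RATIO = 0.1
PEAK_LR = 2e-5

# Assumes one device, no gradient accumulation, and the last partial batch is kept.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)   # 157
total_steps = steps_per_epoch * EPOCHS                    # 1256
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)      # 126

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed scheduler)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)  # 157 1256 126
print(lr_at(warmup_steps), lr_at(total_steps))     # 2e-05 0.0
```

Under these assumptions, the learning rate peaks at 2e-5 after about 126 of roughly 1,256 optimizer steps, that is, during the first epoch.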

Evaluation

Testing Data

The model was evaluated on a test set of ~1,600 examples, balanced across multiple entity types.

Results

  • Precision: 96%
  • Recall: 98%
  • F1 Score: 97%
  • Accuracy: 99.77%
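The reported F1 is consistent with precision and recall, since F1 is their harmonic mean: 2 · 0.96 · 0.98 / (0.96 + 0.98) ≈ 0.97. The sketch below computes exact-match entity scores overall and per entity type, a breakdown that is also useful for checking the rare-entity limitation listed below. It is an illustrative implementation; the card does not say which evaluation library was used. If the accuracy figure is token-level (as with seqeval's `accuracy_score`), every correctly tagged non-entity token counts toward it, which is one likely reason it sits above the entity-level F1.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1(precision, recall)

def entity_scores(pred: Set[Span], gold: Set[Span]) -> Dict[str, Tuple[float, float, float]]:
    """Exact match: a predicted span counts only if its type and both boundaries match."""
    scores = {"overall": prf(len(pred & gold), len(pred), len(gold))}
    for t in sorted({s[0] for s in pred | gold}):
        p = {s for s in pred if s[0] == t}
        g = {s for s in gold if s[0] == t}
        scores[t] = prf(len(p & g), len(p), len(g))
    return scores


print(round(f1(0.96, 0.98), 2))  # 0.97, matching the reported F1

# Toy example with hypothetical types A and B: the B span has a boundary error.
gold = {("A", 0, 1), ("B", 3, 5)}
pred = {("A", 0, 1), ("B", 3, 4)}
print(entity_scores(pred, gold))
```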

How to Get Started with the Model

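from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
# aggregation_strategy="simple" merges subword pieces into whole entities;
# without it, the pipeline returns one prediction per subword token.
ner_pipeline = pipeline(
    "ner",
    model="sojimanatsu/finer-selected-4-labels",
    aggregation_strategy="simple",
)

# Example text
text = "The bond yields 4.5% annually."
entities = ner_pipeline(text)
print(entities)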
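When the pipeline is called with an aggregation strategy such as `aggregation_strategy="simple"`, each result is a dict with `entity_group`, `score`, `word`, `start`, and `end` keys. Because the model is uncased, the `word` field is lowercased and rebuilt from subwords, so slicing the original text with `start` and `end` is a more reliable way to recover the exact surface string. The helper below is a hypothetical post-processing sketch; the 0.5 score threshold and the sample predictions are made up for illustration and are not real model output.

```python
from typing import Dict, List

def extract_entities(text: str, entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep confident predictions and attach the exact source substring.

    Expects the aggregated pipeline output shape: entity_group, score, word, start, end.
    min_score is a hypothetical threshold; tune it on held-out data.
    """
    results = []
    for ent in entities:
        if ent["score"] >= min_score:
            results.append({
                "label": ent["entity_group"],
                "text": text[ent["start"]:ent["end"]],  # original casing preserved
                "score": round(float(ent["score"]), 3),
            })
    return results


text = "The bond yields 4.5% annually."
# Hand-written stand-in for pipeline output (NOT a real model prediction).
sample = [
    {"entity_group": "DebtInstrumentInterestRateStatedPercentage",
     "score": 0.98, "word": "4.5", "start": 16, "end": 19},
    {"entity_group": "AllocatedShareBasedCompensationExpense",
     "score": 0.21, "word": "bond", "start": 4, "end": 8},
]
print(extract_entities(text, sample))
# [{'label': 'DebtInstrumentInterestRateStatedPercentage', 'text': '4.5', 'score': 0.98}]
```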

Limitations

  • Performance may degrade for texts outside the finance domain.
  • Rare entities may have lower recognition rates.

Model Files

  • Format: Safetensors
  • Model size: 66.4M parameters
  • Tensor type: F32