# BiLSTM-CRF for NER (OntoNotes 5.0)

This is a custom BiLSTM-CRF model trained on the English subset of the OntoNotes 5.0 (CoNLL-2012) dataset. Unlike Transformer-based models, this architecture combines the sequential feature extraction of a bidirectional LSTM with the structured inference of a Conditional Random Field (CRF) decoding layer. The embedding layer is initialized with pre-trained GloVe 300d word vectors.

## 📊 Performance

The model was evaluated on the OntoNotes 5.0 (v12) official test set using seqeval:

| Entity | Precision | Recall | F1-Score | Support |
|:---|---:|---:|---:|---:|
| CARDINAL | 0.7310 | 0.7572 | 0.7439 | 1005 |
| DATE | 0.7970 | 0.8309 | 0.8136 | 1786 |
| EVENT | 0.6180 | 0.6471 | 0.6322 | 85 |
| FAC | 0.5678 | 0.4497 | 0.5019 | 149 |
| GPE | 0.8621 | 0.8818 | 0.8718 | 2546 |
| LOC | 0.6491 | 0.6884 | 0.6682 | 215 |
| MONEY | 0.8575 | 0.8648 | 0.8612 | 355 |
| NORP | 0.8734 | 0.8778 | 0.8756 | 990 |
| ORG | 0.8195 | 0.8232 | 0.8213 | 2002 |
| PERSON | 0.8707 | 0.8454 | 0.8578 | 2134 |
| micro avg | 0.8099 | 0.8201 | 0.8150 | 12585 |
| macro avg | 0.7040 | 0.7073 | 0.7046 | 12585 |
| weighted avg | 0.8103 | 0.8201 | 0.8148 | 12585 |
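
seqeval scores entities at the span level: a prediction counts as correct only if both the boundaries and the type of an entity match exactly. A minimal pure-Python sketch of that entity-level scoring (illustrative only, not seqeval's actual implementation):

```python
from typing import List, Set, Tuple

def bio_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        # Close the open span on O, on a new B-, or on an I- of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:  # ill-formed I- treated as a start
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a span is a true positive only on an exact match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

So `["B-PER", "I-PER"]` versus a prediction of `["B-PER", "O"]` scores zero for that entity: the truncated span does not match, even though one token overlaps.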

## 🛠 Model Architecture

- **Embedding Layer:** GloVe 300d (wiki-gigaword), fine-tuned during training.
- **Encoder:** 2-layer bidirectional LSTM with 512 hidden units.
- **Decoder:** Linear-chain CRF for optimal tag sequence decoding.
- **Dropout:** 0.5, applied to embeddings and LSTM outputs.
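
The CRF decoder picks the globally best tag sequence with the Viterbi algorithm, balancing the BiLSTM's per-token emission scores against a learned tag-transition matrix. A minimal pure-Python sketch of that decoding step (single sequence, plain lists; the real layer learns the transition matrix and operates on batched log-scores):

```python
from typing import List

def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """emissions[t][k]: score of tag k at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    Returns the highest-scoring tag index sequence."""
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag at t=0
    back = []                   # backpointers per step
    for t in range(1, T):
        new_score, ptrs = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        back.append(ptrs)
    # Follow backpointers from the best final tag.
    best = max(range(K), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(back):
        best = ptrs[best]
        path.append(best)
    return path[::-1]
```

With zero transitions this reduces to a per-token argmax; a strongly negative transition score can veto a locally attractive tag, which is exactly what lets the CRF rule out ill-formed sequences such as `O → I-PER`.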

## 📂 Project Assets

| Asset | File | Description |
|:---|:---|:---|
| Model Weights | `bilstm_crf_model.bin` | PyTorch state dictionary (~85.8 MB). |
| Vocabulary | `vocab.pth` | Pickled word-to-index mapping. |
| Label List | `label_list.pth` | Pickled NER tag list (BIO format). |
| Documentation | `README.md` | Model card and usage instructions. |

## 📂 Training Infrastructure

- **Framework:** PyTorch with DistributedDataParallel (DDP).
- **Hardware:** Multi-GPU (NVIDIA V100) setup with the NCCL backend.
- **Hyperparameters:**
  - Optimizer: AdamW (lr=1e-3, weight_decay=0.01)
  - Scheduler: Linear warmup with linear decay (warmup_ratio=0.1)
  - Epochs: 20
  - Batch size: 32 per GPU (effective batch size 64)
  - Max length: 128 tokens
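
The warmup-with-decay schedule can be written as a per-step learning-rate function. A sketch under the hyperparameters above (the function name is ours; libraries such as Hugging Face `transformers` ship an equivalent `get_linear_schedule_with_warmup`):

```python
def linear_warmup_decay_lr(step: int, total_steps: int,
                           base_lr: float = 1e-3,
                           warmup_ratio: float = 0.1) -> float:
    """LR ramps linearly up to base_lr over the warmup phase,
    then decays linearly to 0 by the final step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

The warmup phase keeps early AdamW updates small while the randomly initialized LSTM and CRF parameters are still unstable; the linear decay then shrinks step sizes as training converges.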

## 🚀 Usage

```python
import torch
from model import BiLSTM_CRF  # ensure the class definition is importable

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load mappings (pickled Python objects: on PyTorch >= 2.6,
#    torch.load may require weights_only=False to deserialize them)
vocab = torch.load("vocab.pth")
label_list = torch.load("label_list.pth")

# 2. Initialize and load weights
model = BiLSTM_CRF(
    v_size=len(vocab),
    t_size=len(label_list),
    e_dim=300,
    h_dim=512,
    w_matrix=torch.zeros(len(vocab), 300),  # placeholder; real embeddings come from the checkpoint
)

state_dict = torch.load("best_bilstm_crf_ddp.pth", map_location=device)
# Strip the 'module.' prefix added by DDP training
new_state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
model.load_state_dict(new_state_dict)
model.to(device).eval()
```
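
Before calling the model, input tokens must be mapped to indices with the loaded vocabulary. A hypothetical helper sketching that step (the `"<unk>"` key, the lowercasing, and the truncation to the 128-token max length are assumptions; match them to how the vocabulary was actually built):

```python
from typing import Dict, List

def encode(tokens: List[str], vocab: Dict[str, int],
           unk: str = "<unk>", max_len: int = 128) -> List[int]:
    """Map tokens to vocabulary ids, lowercasing to match GloVe conventions
    and falling back to the unknown-token id for out-of-vocabulary words."""
    return [vocab.get(t.lower(), vocab[unk]) for t in tokens[:max_len]]
```

The resulting ids can then be batched with `torch.tensor([ids]).to(device)`; how the predicted tag ids are obtained and mapped back through `label_list` depends on the `BiLSTM_CRF` class's decode interface.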