Overview

  • This is a BERT-based multi-label token classification model fine-tuned on the OntoNotes5 dataset.
  • The entity labels are encoded using the BIES (Begin/Inside/End/Single) scheme. As this is a multi-label model, there is no "Outside" label; for classically outside tokens, no class is predicted.
  • The model comes with a pipeline to extract named entities from the model predictions.
  • For a short overview of the adaptations for multi-label token classification, see the non-finetuned parent model jvaquet/multilabel-classification-bert.
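To illustrate how BIES tags map to entity spans, here is a minimal sketch of a decoder for one entity class. The helper `decode_bies` and the example sentence are hypothetical, for illustration only, and are not part of the released pipeline:

```python
def decode_bies(tags):
    """Decode one class's per-token BIES tags into (start, end) token spans.

    tags[i] is 'B', 'I', 'E', 'S', or None (no prediction, i.e. the
    token is outside any entity of this class -- there is no 'O' tag).
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == 'S':                          # single-token entity
            spans.append((i, i))
            start = None
        elif tag == 'B':                        # entity begins here
            start = i
        elif tag == 'E' and start is not None:  # entity ends here
            spans.append((start, i))
            start = None
        elif tag != 'I':                        # no tag predicted: outside
            start = None
    return spans

# Illustrative tags for a single class (e.g. PERSON):
tokens = ['Angela', 'Merkel', 'met', 'Obama']
tags = ['B', 'E', None, 'S']
spans = decode_bies(tags)
print(spans)                                          # [(0, 1), (3, 3)]
print([' '.join(tokens[s:e + 1]) for s, e in spans])  # ['Angela Merkel', 'Obama']
```

In the multi-label setting, each class has its own independent tag sequence, so overlapping entities of different classes pose no problem.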

Pipeline Usage

Using the NER pipeline is rather simple:

from transformers import pipeline

pipe = pipeline(
    model='jvaquet/multilabel-classification-bert-ontonotes5',
    stride=128,
    threshold=0.5,
    use_hierarchy_heuristic=False,
    trust_remote_code=True,
)

entities = pipe(my_text)

The parameters are:

  • stride - int: Stride for the tokenizer. When the text length exceeds tokenizer.model_max_length, the input is split into overlapping windows with the specified stride.
  • threshold - float: Threshold for entity detection, applied to the sigmoid of the logits.
  • use_hierarchy_heuristic - bool: Apply a heuristic to suppress additional entities when entities of the same class overlap hierarchically.
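Since the model is multi-label, the threshold is compared against the sigmoid of each class logit independently, so zero, one, or several classes may fire for the same token. A minimal sketch of this decision (the logit values below are purely illustrative):

```python
import math

def predicted_classes(logits, threshold=0.5):
    """Return indices of classes whose sigmoid probability exceeds threshold.

    Multi-label decoding: each class is decided independently, so a
    token can receive no class at all (the implicit "Outside" case).
    """
    return [i for i, z in enumerate(logits)
            if 1.0 / (1.0 + math.exp(-z)) > threshold]

# Example logits for one token over three hypothetical classes;
# sigmoids are roughly 0.88, 0.27, and 0.52:
print(predicted_classes([2.0, -1.0, 0.1]))  # [0, 2]
print(predicted_classes([-3.0, -2.0]))      # [] -> token is outside
```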