
BiodivBERT

Model description

  • BiodivBERT is a domain-specific, BERT-based cased model for the biodiversity literature.
  • It uses the tokenizer of the BERT base cased model.
  • BiodivBERT is pre-trained on abstracts and full texts from the biodiversity literature.
  • BiodivBERT is fine-tuned on two downstream tasks in the biodiversity domain: Named Entity Recognition (NER) and Relation Extraction (RE).
  • Please see our GitHub repository for more details.

How to use

  • You can use BiodivBERT via the Hugging Face Transformers library as follows:
  1. Masked Language Model
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForMaskedLM.from_pretrained("NoYo25/BiodivBERT")
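For a quick sanity check of the masked-language-model head, the Transformers `fill-mask` pipeline can be pointed at the same checkpoint. This is a minimal sketch, not part of the original card: the helper names `load_fill_mask` and `top_tokens` are hypothetical, and the model is only downloaded when `load_fill_mask` is called.

```python
from typing import Dict, List, Tuple


def load_fill_mask(model_id: str = "NoYo25/BiodivBERT"):
    """Build a fill-mask pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so top_tokens() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output.

    Each prediction is a dict as returned by the fill-mask pipeline for a
    single [MASK], with keys such as "score", "token", "token_str", "sequence".
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], p["score"]) for p in ranked[:k]]
```

Usage: `fill = load_fill_mask()` followed by `top_tokens(fill(f"Bees pollinate {fill.tokenizer.mask_token} plants."))` lists the three most likely fillers for the masked position.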
  2. Token Classification - Named Entity Recognition
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForTokenClassification.from_pretrained("NoYo25/BiodivBERT")
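Assuming the hub checkpoint holds only the pre-trained weights (the masked-LM snippet suggests its architecture is a masked LM), `AutoModelForTokenClassification` attaches a freshly initialized classification head, so the model must be fine-tuned on labelled NER data before its predictions are meaningful. The sketch below shows one common preparation step used in Transformers token-classification fine-tuning: aligning word-level BIO tags to sub-word tokens. The label set and the helper names `load_ner_model` and `align_labels` are hypothetical, not the authors' NER setup.

```python
from typing import Dict, List, Optional

# Hypothetical BIO label set; replace with the labels of your own NER corpus.
LABELS = ["O", "B-Organism", "I-Organism", "B-Habitat", "I-Habitat"]
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
ID2LABEL: Dict[int, str] = {i: label for label, i in LABEL2ID.items()}
IGNORE_INDEX = -100  # PyTorch cross-entropy skips targets with this value by default


def load_ner_model(model_id: str = "NoYo25/BiodivBERT"):
    """Load the tokenizer and a token-classification model sized for LABELS."""
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id, num_labels=len(LABELS), id2label=ID2LABEL, label2id=LABEL2ID
    )
    return tokenizer, model


def align_labels(word_ids: List[Optional[int]], word_labels: List[str]) -> List[int]:
    """Map word-level BIO tags onto sub-word tokens.

    Special tokens (word id None) and every sub-token after the first one of
    a word get IGNORE_INDEX, so each word contributes exactly one label.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(LABEL2ID[word_labels[word_id]])
        previous = word_id
    return aligned
```

With a fast tokenizer, `enc = tokenizer(words, is_split_into_words=True)` followed by `align_labels(enc.word_ids(), tags)` yields the `labels` list for one training sentence.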
  3. Sequence Classification - Relation Extraction
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForSequenceClassification.from_pretrained("NoYo25/BiodivBERT")
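Relation extraction is framed here as sequence classification. One common way to present a sentence with two candidate entities to such a classifier is to wrap the entities in marker strings. The card does not say which input format the authors used, so the markers, the relation labels, and the helper names `load_re_model` and `mark_entities` below are all hypothetical; as with NER, the classification head is newly initialized and needs fine-tuning.

```python
from typing import List, Tuple

# Hypothetical relation labels; replace with those of your RE dataset.
RELATIONS = ["no_relation", "occurs_in", "feeds_on"]

Span = Tuple[int, int]  # half-open [start, end) word indices


def load_re_model(model_id: str = "NoYo25/BiodivBERT"):
    """Load the tokenizer and a sequence-classification model sized for RELATIONS."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(RELATIONS),
        id2label=dict(enumerate(RELATIONS)),
        label2id={r: i for i, r in enumerate(RELATIONS)},
    )
    return tokenizer, model


def mark_entities(words: List[str], head: Span, tail: Span) -> str:
    """Wrap the head entity in [[ ]] and the tail entity in << >>.

    Markers are plain punctuation, which the cased BERT tokenizer splits into
    existing single-character tokens, so no new vocabulary entries are needed.
    """
    for start, end in (head, tail):
        if not 0 <= start < end <= len(words):
            raise ValueError("invalid entity span")
    if not (head[1] <= tail[0] or tail[1] <= head[0]):
        raise ValueError("entity spans overlap")
    out: List[str] = []
    for i, word in enumerate(words):
        if i == head[0]:
            out.append("[[")
        if i == tail[0]:
            out.append("<<")
        out.append(word)
        if i == head[1] - 1:
            out.append("]]")
        if i == tail[1] - 1:
            out.append(">>")
    return " ".join(out)
```

The marked string is then tokenized like any other sentence, e.g. `tokenizer(mark_entities(words, (0, 2), (3, 4)), return_tensors="pt")`, and the model's `logits` are read against `RELATIONS`.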

Training data

  • BiodivBERT is pre-trained on abstracts and full texts of biodiversity-related publications.
  • We used the Elsevier and Springer APIs to crawl this data.
  • The corpus covers publications from 1990 to 2020.

Evaluation results

BiodivBERT outperformed BERT_base_cased, biobert_v1.1, and a BiLSTM baseline on the downstream tasks.
