Related paper: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (arXiv:1910.01108)
How to use bioformers/bioformer-8L-squad1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("question-answering", model="bioformers/bioformer-8L-squad1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("bioformers/bioformer-8L-squad1")
model = AutoModelForQuestionAnswering.from_pretrained("bioformers/bioformer-8L-squad1")

This model is bioformer-8L fine-tuned on the SQuAD1 dataset for 3 epochs.
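Once loaded, the pipeline takes a question and a context passage and returns a dict describing the predicted answer span. A minimal sketch, where the biomedical question and context are made-up examples for illustration:

```python
from transformers import pipeline

# Download and load the fine-tuned QA model (cached after first use)
pipe = pipeline("question-answering", model="bioformers/bioformer-8L-squad1")

# Illustrative inputs; any biomedical passage can serve as context
result = pipe(
    question="What does the BRCA1 protein help repair?",
    context="BRCA1 is a human tumor suppressor gene. "
            "The BRCA1 protein helps repair damaged DNA.",
)
print(result)  # dict with keys: "score", "start", "end", "answer"
```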
The fine-tuning was performed on a single P100 GPU (16 GB). The hyperparameters were:
max_seq_length=512
per_device_train_batch_size=16
gradient_accumulation_steps=1
total train batch size (w. parallel, distributed & accumulation) = 16
learning_rate=3e-5
num_train_epochs=3
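These hyperparameters map onto the Hugging Face Trainer API roughly as follows. This is a sketch of the configuration, not the exact training script; the output_dir is a placeholder, and max_seq_length is applied at tokenization time rather than through TrainingArguments:

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration listed above
args = TrainingArguments(
    output_dir="bioformer-8L-squad1",  # placeholder path
    per_device_train_batch_size=16,    # total train batch size 16
    gradient_accumulation_steps=1,     # no accumulation
    learning_rate=3e-5,
    num_train_epochs=3,
)
# max_seq_length=512 would be passed to the tokenizer when preprocessing SQuAD1
```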
Evaluation results:
"eval_exact_match": 78.55250709555345
"eval_f1": 85.91482799690257
Bioformer's performance is on par with DistilBERT's (EM/F1: 77.7/85.8), even though Bioformer was pretrained only on biomedical text.
In our experiments, inference with Bioformer is 3x as fast as BERT-base/BioBERT/PubMedBERT, and 40% faster than DistilBERT.
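One way to sanity-check such speed comparisons is plain wall-clock timing. The helper below is a minimal sketch under assumed conditions; the original benchmark's batch size and exact setup are not specified beyond the P100 GPU:

```python
import time

def mean_latency(fn, n_runs=20):
    # Average wall-clock seconds per call of fn over n_runs calls
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Example (assumes `pipe` is the QA pipeline loaded as shown earlier):
# print(mean_latency(lambda: pipe(question="...", context="...")))
```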