---
language: en
license: apache-2.0
---

# BiomedBERT Small

This is a `22.7M` parameter [BERT](https://arxiv.org/abs/1810.04805) encoder-only model trained on data from [PubMed](https://pubmed.ncbi.nlm.nih.gov/). The raw data was transformed using [PaperETL](https://github.com/neuml/paperetl) with the results stored as a local dataset via the [Hugging Face Datasets library](https://huggingface.co/docs/datasets/en/index).

This model is designed as a solid-performing small model that sits between the [110M parameter BiomedBERT Base model](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) and the [tiny BiomedBERT Hash series of models](https://huggingface.co/blog/NeuML/biomedbert-hash-nano).

## Usage

`biomedbert-small` can be loaded using [Hugging Face Transformers](https://huggingface.co/docs/transformers/en/index) as follows.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("neuml/biomedbert-small")
```

The model is intended to be further fine-tuned for a specific task such as Text Classification, Entity Extraction, Sentence Embeddings and so on.
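As a starting point for fine-tuning, the following is a minimal sketch of loading the encoder with a fresh classification head. The helper names `build_label_maps` and `load_classifier` are hypothetical and not part of this model or of Transformers, and it assumes the repository ships a tokenizer alongside the weights. The class names are supplied by the caller.

```python
def build_label_maps(label_names):
    """Return (id2label, label2id) dictionaries for a list of class names."""
    id2label = dict(enumerate(label_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(label_names, model_id="neuml/biomedbert-small"):
    """Load the tokenizer and the encoder with a new classification head."""
    # Imported here so build_label_maps works without transformers installed
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(label_names)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The classification head is randomly initialized and must be trained
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(label_names),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `load_classifier([...])` with the class names of the target dataset downloads the model and returns objects that can be passed to a training loop or to the Transformers `Trainer`.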

## Evaluation Results

The [Medical Abstracts Text Classification Dataset](https://huggingface.co/datasets/TimSchopf/medical_abstracts) was used to evaluate the model's performance. A handful of biomedical models and general models were selected for comparison.

Metrics were generated using Hugging Face's standard [run_glue script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py) as shown below.

```bash
python run_glue.py --model_name_or_path neuml/biomedbert-small --dataset_name medclassify --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 32 --learning_rate 1e-4 --num_train_epochs 4 --output_dir outputs --trust_remote_code True
```

_Note: The original dataset was saved locally as `medclassify` with the `condition_label` column renamed to `label` to work more easily with the glue script._
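This card does not say how the local copy was produced, so the following is only one possible sketch of the preparation step described in the note. It assumes the source dataset has `train` and `test` splits, and it writes the `test` split as `validation` because `run_glue.py` requires a split with that name for `--do_eval`. The helper names `split_filename` and `prepare` are hypothetical.

```python
from pathlib import Path


def split_filename(split):
    """Map a source split name to the parquet file name written locally."""
    # Assumption: the held-out "test" split is reused as "validation",
    # because run_glue.py evaluates on a split named "validation"
    target = "validation" if split == "test" else split
    return f"{target}.parquet"


def prepare(output_dir="medclassify"):
    """Download the dataset, rename the label column and save it locally."""
    # Imported here so split_filename has no third-party dependency
    from datasets import load_dataset

    dataset = load_dataset("TimSchopf/medical_abstracts")
    dataset = dataset.rename_column("condition_label", "label")

    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    for split, data in dataset.items():
        # One parquet file per split; the file name determines the split name
        data.to_parquet(str(path / split_filename(split)))
```

Running `prepare()` from the directory that contains `run_glue.py` creates a `medclassify` folder that `--dataset_name medclassify` can resolve as a local dataset.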

| Model | Parameters | Accuracy        | Loss             | 
| ----- | ---------- | --------------- | ---------------- |
| [biomedbert-hash-nano](https://hf.co/neuml/biomedbert-hash-nano) | 0.969M | 0.6195 | 0.9464 |
| [**biomedbert-small**](https://hf.co/neuml/biomedbert-small) | **22.7M** | **0.6274** | **0.8647** |
| [bert-base-uncased](https://hf.co/google-bert/bert-base-uncased) | 110M | 0.6118 | 0.9712 |
| [biomedbert-base](https://hf.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) | 110M | 0.6195 | 0.9037 |
| [ModernBERT-base](https://hf.co/answerdotai/ModernBERT-base) | 149M | 0.5672 | 1.1079 |
| [BioClinical-ModernBERT-base](https://hf.co/thomas-sounack/BioClinical-ModernBERT-base) | 149M | 0.5679 | 1.0915 |

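`biomedbert-small` posts the highest accuracy and lowest loss in this comparison, ahead of models with roughly five to seven times as many parameters. No model exceeds 63% accuracy, which shows how challenging this dataset is.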
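To make the size comparison concrete, the short sketch below recomputes the parameter ratios from the table. The numbers are copied from the table above; the `size_ratio` helper is only illustrative.

```python
# (model, parameters in millions, accuracy) copied from the table above
results = [
    ("biomedbert-hash-nano", 0.969, 0.6195),
    ("biomedbert-small", 22.7, 0.6274),
    ("bert-base-uncased", 110, 0.6118),
    ("biomedbert-base", 110, 0.6195),
    ("ModernBERT-base", 149, 0.5672),
    ("BioClinical-ModernBERT-base", 149, 0.5679),
]


def size_ratio(name, baseline="biomedbert-small"):
    """Return how many times larger `name` is than `baseline`."""
    params = {model: size for model, size, _ in results}
    return params[name] / params[baseline]


# Model with the highest accuracy in the comparison
best = max(results, key=lambda row: row[2])

for model, size, accuracy in results:
    print(f"{model}: {size_ratio(model):.2f}x parameters, accuracy {accuracy:.4f}")
```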

## More Information

Read more about the model in [this article](https://hf.co/blog/NeuML/biomedbert-small).