---
language: en
license: apache-2.0
---

# BiomedBERT Small

This is a `22.7M` parameter [BERT](https://arxiv.org/abs/1810.04805) encoder-only model trained on data from [PubMed](https://pubmed.ncbi.nlm.nih.gov/). The raw data was transformed using [PaperETL](https://github.com/neuml/paperetl) with the results stored as a local dataset via the [Hugging Face Datasets library](https://huggingface.co/docs/datasets/en/index).

This model is designed to be a solid-performing small model fitting in between the [110M parameter BiomedBERT Base model](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) and the [tiny BiomedBERT Hash series of models](https://huggingface.co/blog/NeuML/biomedbert-hash-nano).

## Usage

`biomedbert-small` can be loaded using [Hugging Face Transformers](https://huggingface.co/docs/transformers/en/index) as follows.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("neuml/biomedbert-small")
```

The model is intended to be further fine-tuned for a specific task such as Text Classification, Entity Extraction, Sentence Embeddings and so on.

## Evaluation Results

This [Medical Abstracts Text Classification Dataset](https://huggingface.co/datasets/TimSchopf/medical_abstracts) was used to evaluate the model's performance. A handful of biomedical models and general models were selected for comparison. Metrics were generated using Hugging Face's standard [run_glue script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py) as shown below.
```bash
python run_glue.py \
  --model_name_or_path neuml/biomedbert-small \
  --dataset-name medclassify \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 1e-4 \
  --num_train_epochs 4 \
  --output_dir outputs \
  --trust-remote-code True
```

_Note: The original dataset was saved locally as `medclassify` with the `condition_label` column renamed to `label` to work more easily with the glue script._

| Model | Parameters | Accuracy | Loss |
| ----- | ---------- | --------------- | ---------------- |
| [biomedbert-hash-nano](https://hf.co/neuml/biomedbert-hash-nano) | 0.969M | 0.6195 | 0.9464 |
| [**biomedbert-small**](https://hf.co/neuml/biomedbert-small) | **22.7M** | **0.6274** | **0.8647** |
| [bert-base-uncased](https://hf.co/google-bert/bert-base-uncased) | 110M | 0.6118 | 0.9712 |
| [biomedbert-base](https://hf.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) | 110M | 0.6195 | 0.9037 |
| [ModernBERT-base](https://hf.co/answerdotai/ModernBERT-base) | 149M | 0.5672 | 1.1079 |
| [BioClinical-ModernBERT-base](https://hf.co/thomas-sounack/BioClinical-ModernBERT-base) | 149M | 0.5679 | 1.0915 |

As we can see, this model performs very well against models much larger in size: it has the highest accuracy and the lowest loss in the table. This dataset is a challenging one!

## More Information

Read more about the model in [this article](https://hf.co/blog/NeuML/biomedbert-small).
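To put the size comparison from the Evaluation Results table in concrete terms, the short sketch below recomputes each model's parameter count and accuracy relative to `biomedbert-small`. The figures are copied from the table; the `RESULTS` list and the `compare` helper are illustrative names introduced here and are not part of the model or the evaluation script.

```python
# Figures copied from the results table: (model, parameters in millions, accuracy).
RESULTS = [
    ("biomedbert-hash-nano", 0.969, 0.6195),
    ("biomedbert-small", 22.7, 0.6274),
    ("bert-base-uncased", 110.0, 0.6118),
    ("biomedbert-base", 110.0, 0.6195),
    ("ModernBERT-base", 149.0, 0.5672),
    ("BioClinical-ModernBERT-base", 149.0, 0.5679),
]


def compare(results, reference="biomedbert-small"):
    """Return (model, size ratio, accuracy delta) rows relative to the reference model."""
    # Look up the reference model's parameter count and accuracy.
    ref_params, ref_acc = next((p, a) for name, p, a in results if name == reference)
    rows = []
    for name, params, acc in results:
        # Ratio > 1 means the model is larger than the reference;
        # delta < 0 means it scored lower accuracy than the reference.
        rows.append((name, params / ref_params, acc - ref_acc))
    return rows


if __name__ == "__main__":
    for name, ratio, delta in compare(RESULTS):
        print(f"{name:<30} {ratio:6.2f}x params  {delta * 100:+.2f} accuracy points")
```

By this calculation, the 110M models are about 4.8 times the size of `biomedbert-small` and the 149M models about 6.6 times, and none of them matches its accuracy on this dataset.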