---
license: mit
language:
- en
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: token-classification
---
[![Paper](https://img.shields.io/badge/Paper-View%20on%20bioRxiv-orange?logo=biorxiv&logoColor=white)](https://www.biorxiv.org/content/10.1101/2025.08.29.671515v1)
[![GitHub](https://img.shields.io/badge/GitHub-omicsNLP%2FmicrobELP-blue?logo=github)](https://github.com/omicsNLP/microbELP)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/omicsNLP/microbELP/blob/main/LICENSE)
# 🦠 MicrobELP – Microbiome Entity Recognition and Normalisation
MicrobELP is a deep learning model for Microbiome Entity Recognition and Normalisation, identifying microbial entities (bacteria, archaea, fungi) in biomedical and scientific text.
It is part of the [microbELP](https://github.com/omicsNLP/microbELP) toolkit and has been optimised for CPU and GPU inference.
This model enables automated extraction of microbiome names from unstructured text, facilitating microbiome-related text mining and literature curation.
We also provide a Named Entity Normalisation model on Hugging Face:
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-microbELP_NEN-FFD21E)](https://huggingface.co/omicsNLP/microbELP_NEN)
---
## 🚀 Quick Start (Hugging Face)
You can directly load and run the model with the Hugging Face `transformers` pipeline:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("omicsNLP/microbELP_NER")
model = AutoModelForTokenClassification.from_pretrained("omicsNLP/microbELP_NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "The first microbiome I learned about is called Helicobacter pylori."
ner_results = nlp(example)
print(ner_results)
```
Output:
```
[
{'entity': 'LABEL_0', 'score': 0.9954, 'index': 1, 'word': 'the', 'start': 0, 'end': 3},
...
{'entity': 'LABEL_1', 'score': 0.9889, 'index': 11, 'word': 'he', 'start': 47, 'end': 49},
{'entity': 'LABEL_2', 'score': 0.9710, 'index': 16, 'word': 'p', 'start': 60, 'end': 61},
...
]
```
where:
- LABEL_0 β†’ Outside (O)
- LABEL_1 β†’ Begin-microbiome (B-microbiome)
- LABEL_2 β†’ Inside-microbiome (I-microbiome)
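Since the raw pipeline output uses the generic `LABEL_*` names, downstream code typically remaps them to the BIO tags above before further processing. A minimal sketch (the `LABEL_MAP` dictionary and `to_bio` helper are illustrative, not part of the package):

```python
# Map the model's generic label names to their BIO meanings.
LABEL_MAP = {"LABEL_0": "O", "LABEL_1": "B-microbiome", "LABEL_2": "I-microbiome"}

def to_bio(ner_results):
    """Return a copy of the pipeline output with human-readable BIO tags."""
    return [{**r, "entity": LABEL_MAP.get(r["entity"], r["entity"])} for r in ner_results]

# Applied to one record of the output shown above.
sample = [{"entity": "LABEL_1", "score": 0.9889, "index": 11,
           "word": "he", "start": 47, "end": 49}]
print(to_bio(sample)[0]["entity"])  # B-microbiome
```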
---
## 🧩 Integration with the microbELP Python Package
If you prefer a high-level interface with automatic aggregation, postprocessing, and text-location mapping, you can use the `microbELP` package directly.
Installation:
```bash
git clone https://github.com/omicsNLP/microbELP.git
pip install ./microbELP
```
Because of its dependencies, installing the package in an isolated environment (for example a fresh virtual environment) is recommended.
Example usage:
```python
from microbELP import microbiome_DL_ner
input_text = "The first microbiome I learned about is called Helicobacter pylori."
print(microbiome_DL_ner(input_text))
```
Output:
```python
[{'Entity': 'Helicobacter pylori', 'locations': {'offset': 47, 'length': 19}}]
```
You can also process a list of texts for batch inference:
```python
input_list = [
"The first microbiome I learned about is called Helicobacter pylori.",
"Then I learned about Eubacterium rectale."
]
print(microbiome_DL_ner(input_list))
```
Output:
```python
[
[{'Entity': 'Helicobacter pylori', 'locations': {'offset': 47, 'length': 19}}],
[{'Entity': 'Eubacterium rectale', 'locations': {'offset': 21, 'length': 19}}]
]
```
Each element in the output corresponds to one input text, containing recognised microbiome entities and their text locations.
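Because each `locations` entry stores a character `offset` and `length` into the input string, the matched substrings can be sliced straight out of the original text. A small sketch (the `extract_spans` helper is illustrative, not part of the package):

```python
def extract_spans(text, entities):
    """Slice the recognised entity strings out of the original text."""
    return [text[e["locations"]["offset"]: e["locations"]["offset"] + e["locations"]["length"]]
            for e in entities]

# Using the single-text output shown above.
text = "The first microbiome I learned about is called Helicobacter pylori."
entities = [{"Entity": "Helicobacter pylori", "locations": {"offset": 47, "length": 19}}]
print(extract_spans(text, entities))  # ['Helicobacter pylori']
```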
The function takes one optional boolean parameter, `cpu` (default `False`), so inference runs on a GPU when one is available. To force CPU inference, call `microbiome_DL_ner(input_list, cpu=True)`.
---
## 📘 Model Details
The table below summarises the model.
| Property | Description |
| ----------------- | -------------------------------------- |
| **Task** | Named Entity Recognition (NER) |
| **Domain** | Microbiome / Biomedical Text Mining |
| **Entity Type** | `microbiome` |
| **Model Type** | Transformer-based token classification |
| **Framework**     | Hugging Face 🤗 Transformers           |
| **Optimised for** | CPU and GPU inference                  |
---
## 📚 Citation
If you find this repository useful, please consider giving a like ❤️ and a citation 📝:
```bibtex
@article {Patel2025.08.29.671515,
author = {Patel, Dhylan and Lain, Antoine D. and Vijayaraghavan, Avish and Mirzaei, Nazanin Faghih and Mweetwa, Monica N. and Wang, Meiqi and Beck, Tim and Posma, Joram M.},
title = {Microbial Named Entity Recognition and Normalisation for AI-assisted Literature Review and Meta-Analysis},
elocation-id = {2025.08.29.671515},
year = {2025},
doi = {10.1101/2025.08.29.671515},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2025/08/30/2025.08.29.671515},
eprint = {https://www.biorxiv.org/content/early/2025/08/30/2025.08.29.671515.full.pdf},
journal = {bioRxiv}
}
```
---
## 🔗 Resources
The table below lists resources associated with this model.
| Property | Description |
| ----------------- | -------------------------------------- |
| **GitHub Project**|<img src="https://img.shields.io/github/stars/omicsNLP/microbELP.svg?logo=github&label=Stars" style="vertical-align:middle;"/>|
| **Paper** |[![DOI:10.1101/2025.08.29.671515](http://img.shields.io/badge/DOI-10.1101/2025.08.29.671515-BE2536.svg)](https://doi.org/10.1101/2025.08.29.671515)|
| **Data** |[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17305411.svg)](https://doi.org/10.5281/zenodo.17305411)|
| **CoDiet** |[![CoDiet](https://img.shields.io/badge/used_by:_%F0%9F%8D%8E_CoDiet-5AA764)](https://www.codiet.eu)|
---
## βš™οΈ License
This model and code are released under the MIT License.