Antoinelfr committed b736a18 (verified) · 1 parent: 7e0659c

Update README.md

Files changed (1): README.md (+148 -1)
README.md changed (from line 5, after the `language:` front-matter entry):
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: token-classification
---
[![Paper](https://img.shields.io/badge/Paper-View%20on%20bioRxiv-orange?logo=biorxiv&logoColor=white)](https://www.biorxiv.org/content/10.1101/2025.08.29.671515v1)
[![GitHub](https://img.shields.io/badge/GitHub-omicsNLP%2FmicrobELP-blue?logo=github)](https://github.com/omicsNLP/microbELP)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/omicsNLP/microbELP/blob/main/LICENSE)

# 🦠 MicrobELP: Microbiome Entity Recognition

MicrobELP is a deep learning model for Microbiome Entity Recognition that identifies microbial entities (bacteria, archaea and fungi) in biomedical and scientific text.
It is part of the [microbELP](https://github.com/omicsNLP/microbELP) toolkit and has been optimised for GPU inference.

The model automates the extraction of microbe names from unstructured text, supporting microbiome-related text mining and literature curation.

---

## 🚀 Quick Start (Hugging Face)

You can load and run the model directly with the Hugging Face `transformers` pipeline:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("omicsNLP/microbELP_NER")
model = AutoModelForTokenClassification.from_pretrained("omicsNLP/microbELP_NER")

# Token-level NER pipeline: without aggregation it returns one prediction per (sub-word) token
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "The first microbiome I learned about is called Helicobacter pylori."
ner_results = nlp(example)

print(ner_results)
```

Output:

```
[
  {'entity': 'LABEL_0', 'score': 0.9954, 'index': 1, 'word': 'the', 'start': 0, 'end': 3},
  ...
  {'entity': 'LABEL_1', 'score': 0.9889, 'index': 11, 'word': 'he', 'start': 47, 'end': 49},
  {'entity': 'LABEL_2', 'score': 0.9710, 'index': 16, 'word': 'p', 'start': 60, 'end': 61},
  ...
]
```

where the labels map to BIO tags as follows:
- LABEL_0 → Outside (`O`)
- LABEL_1 → Beginning of a microbiome entity (`B-microbiome`)
- LABEL_2 → Inside a microbiome entity (`I-microbiome`)
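
Because the pipeline above returns one prediction per sub-word token, you may want to merge the tokens into whole entities yourself. The sketch below is a minimal, hypothetical helper (`merge_token_predictions` is not part of `transformers` or `microbELP`). It assumes the label mapping above and the `entity`, `start` and `end` keys of the pipeline output, and it applies one extra heuristic: a `B` token that directly touches the open span (no gap in character offsets) is treated as a sub-word piece of the same word. The token list at the end is invented for illustration and is not real model output.

```python
from typing import Dict, List, Optional

# Label mapping from the list above: LABEL_0 -> O, LABEL_1 -> B, LABEL_2 -> I
ID2TAG = {"LABEL_0": "O", "LABEL_1": "B", "LABEL_2": "I"}


def merge_token_predictions(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level predictions into entity spans (hypothetical helper).

    `tokens` must carry 'entity', 'start' and 'end' keys, like the output of
    pipeline("ner") without aggregation.
    """
    spans: List[Dict] = []
    current: Optional[List[int]] = None  # [start, end] of the open entity

    def close() -> None:
        # Emit the open entity (if any) using character offsets into `text`
        nonlocal current
        if current is not None:
            start, end = current
            spans.append({"Entity": text[start:end],
                          "locations": {"offset": start, "length": end - start}})
            current = None

    for tok in sorted(tokens, key=lambda t: t["start"]):
        tag = ID2TAG.get(tok["entity"], "O")
        if tag == "O":
            close()
        elif tag == "B" and (current is None or tok["start"] != current[1]):
            # A B token that does not touch the open span starts a new entity
            close()
            current = [tok["start"], tok["end"]]
        elif current is None:
            # Lenient: an I token without an open span starts one
            current = [tok["start"], tok["end"]]
        else:
            # I token, or a B token glued to the open span: extend the span
            current[1] = tok["end"]
    close()
    return spans


# Made-up token predictions shaped like the pipeline output for `example`
example = "The first microbiome I learned about is called Helicobacter pylori."
tokens = [
    {"entity": "LABEL_0", "start": 0, "end": 3},
    {"entity": "LABEL_1", "start": 47, "end": 49},
    {"entity": "LABEL_2", "start": 49, "end": 59},
    {"entity": "LABEL_2", "start": 60, "end": 61},
    {"entity": "LABEL_2", "start": 61, "end": 66},
    {"entity": "LABEL_0", "start": 66, "end": 67},
]
print(merge_token_predictions(example, tokens))
# [{'Entity': 'Helicobacter pylori', 'locations': {'offset': 47, 'length': 19}}]
```

With the Quick Start variables, `merge_token_predictions(example, ner_results)` would return spans in the same shape as the `microbELP` package output. The pipeline's built-in `aggregation_strategy` option is an alternative, but with the generic `LABEL_*` names it would likely treat `LABEL_1` and `LABEL_2` as separate groups rather than one entity.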

---

## 🧩 Integration with the microbELP Python Package

If you prefer a higher-level interface that aggregates token predictions into entities, post-processes them and maps each entity to its location in the text, you can use the `microbELP` package directly.

Installation:
```bash
git clone https://github.com/omicsNLP/microbELP.git
pip install ./microbELP
```

We recommend installing the package into an isolated environment (for example, a virtual environment) to avoid conflicts with other installed dependencies.

Example usage (GPU model):

```python
from microbELP import microbiome_DL_ner

input_text = "The first microbiome I learned about is called Helicobacter pylori."
# Returns each recognised entity with its character offset and length
print(microbiome_DL_ner(input_text))
```

Output:

```python
[{'Entity': 'Helicobacter pylori', 'locations': {'offset': 47, 'length': 19}}]
```

You can also pass a list of texts for batch inference:

```python
input_list = [
    "The first microbiome I learned about is called Helicobacter pylori.",
    "Then I learned about Eubacterium rectale."
]
print(microbiome_DL_ner(input_list))
```

Output:

```python
[
    [{'Entity': 'Helicobacter pylori', 'locations': {'offset': 47, 'length': 19}}],
    [{'Entity': 'Eubacterium rectale', 'locations': {'offset': 21, 'length': 19}}]
]
```

Each element of the output corresponds to one input text and lists the microbiome entities recognised in it, together with their character offset and length within that text.
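
Because each location is a character offset plus a length, an entity can be sliced back out of its source text with `text[offset:offset + length]`. The short sketch below checks this against the batch example above; `entity_spans` is a hypothetical helper for illustration, not part of the `microbELP` API.

```python
from typing import Dict, List


def entity_spans(text: str, entities: List[Dict]) -> List[str]:
    """Slice each recognised entity out of its source text (hypothetical helper)."""
    spans = []
    for ent in entities:
        start = ent["locations"]["offset"]
        end = start + ent["locations"]["length"]
        spans.append(text[start:end])
    return spans


# Inputs and outputs copied from the batch example above
input_list = [
    "The first microbiome I learned about is called Helicobacter pylori.",
    "Then I learned about Eubacterium rectale.",
]
batch_output = [
    [{"Entity": "Helicobacter pylori", "locations": {"offset": 47, "length": 19}}],
    [{"Entity": "Eubacterium rectale", "locations": {"offset": 21, "length": 19}}],
]

for text, entities in zip(input_list, batch_output):
    for ent, span in zip(entities, entity_spans(text, entities)):
        # The sliced span should equal the reported entity string
        assert span == ent["Entity"]
        print(f"{span!r} at offset {ent['locations']['offset']}")
# 'Helicobacter pylori' at offset 47
# 'Eubacterium rectale' at offset 21
```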

---

## 📘 Model Details

| Property | Description |
| ----------------- | -------------------------------------- |
| **Task** | Named Entity Recognition (NER) |
| **Domain** | Microbiome / Biomedical Text Mining |
| **Entity Type** | `microbiome` |
| **Labels** | `LABEL_0` (O), `LABEL_1` (B-microbiome), `LABEL_2` (I-microbiome) |
| **Model Type** | Transformer-based token classification |
| **Base Model** | [`dmis-lab/biobert-base-cased-v1.1`](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) |
| **Framework** | Hugging Face 🤗 Transformers |
| **Optimised for** | GPU inference |
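
Since the model is optimised for GPU inference, you may want the Quick Start pipeline to run on a GPU when one is available. A minimal sketch, assuming a PyTorch backend: `pick_device` is a hypothetical helper (not part of `transformers` or `microbELP`) that returns a device index in the convention used by the `device` argument of `transformers.pipeline`, where `0` is the first CUDA GPU and `-1` is the CPU.

```python
def pick_device() -> int:
    """Return a device index for transformers.pipeline: 0 = first GPU, -1 = CPU."""
    try:
        import torch  # PyTorch backend used by transformers
    except ImportError:
        return -1  # PyTorch not installed: fall back to CPU
    return 0 if torch.cuda.is_available() else -1


print(pick_device())
```

You could then pass it when building the pipeline, e.g. `pipeline("ner", model=model, tokenizer=tokenizer, device=pick_device())`. Falling back to `-1` keeps the example working on CPU-only machines, typically at lower throughput.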

---

## 📚 Citation

If you find this repository useful, please consider giving it a star ⭐ and citing our paper 📝:

```bibtex
@article{Patel2025.08.29.671515,
  author = {Patel, Dhylan and Lain, Antoine D. and Vijayaraghavan, Avish and Mirzaei, Nazanin Faghih and Mweetwa, Monica N. and Wang, Meiqi and Beck, Tim and Posma, Joram M.},
  title = {Microbial Named Entity Recognition and Normalisation for AI-assisted Literature Review and Meta-Analysis},
  elocation-id = {2025.08.29.671515},
  year = {2025},
  doi = {10.1101/2025.08.29.671515},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/08/30/2025.08.29.671515},
  eprint = {https://www.biorxiv.org/content/early/2025/08/30/2025.08.29.671515.full.pdf},
  journal = {bioRxiv}
}
```

---

## 🔗 Resources

| Resource | Link |
| ----------------- | -------------------------------------- |
| **GitHub Project** | <a href="https://github.com/omicsNLP/microbELP"><img src="https://img.shields.io/github/stars/omicsNLP/microbELP.svg?logo=github&label=Stars" style="vertical-align:middle;"/></a> |
| **Paper** | [![DOI:10.1101/2025.08.29.671515](https://img.shields.io/badge/DOI-10.1101/2025.08.29.671515-BE2536.svg)](https://doi.org/10.1101/2025.08.29.671515) |
| **Data** | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17305411.svg)](https://doi.org/10.5281/zenodo.17305411) |
| **CoDiet** | [![CoDiet](https://img.shields.io/badge/used_by:_%F0%9F%8D%8E_CoDiet-5AA764)](https://www.codiet.eu) |

---

## ⚙️ License

This model and code are released under the [MIT License](https://github.com/omicsNLP/microbELP/blob/main/LICENSE).