How to use Timofey/PubMedBERT_Drugs_Metabolites_Context_Classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="Timofey/PubMedBERT_Drugs_Metabolites_Context_Classifier")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Timofey/PubMedBERT_Drugs_Metabolites_Context_Classifier")
model = AutoModelForSequenceClassification.from_pretrained("Timofey/PubMedBERT_Drugs_Metabolites_Context_Classifier")

This model is a fine-tuned version of BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext (Hugging Face card). It was developed for the web-based ANDDigest system to classify short names of drugs and metabolites in texts on the basis of their context (a name is considered short if its length is 4 symbols or fewer). The analyzed name should be replaced in the text with the <andsystem-candidate> tag.
Input:
Any biomedical text in which the name of the object being classified is replaced with the <andsystem-candidate> tag, for example, this PubMed abstract:
Intermittent obstruction of jejunostomy tube due to Ascaris lumbricoides infection. A 45-year-old Costa Rican woman was seen for a jejunostomy tube malfunction. There was no evidence of tube malposition or intestinal obstruction. During endoscopy, a long worm was retrieved from the distal duodenum; it was later confirmed to be Ascaris lumbricoides. After treatment with <andsystem-candidate>, no further episodes of tube occlusion were observed. This case reminds us of the importance of considering helminthic infections and their atypical manifestations in patients from endemic regions.
In this example, mebendazole was replaced with <andsystem-candidate>. Please keep in mind that the maximum length of an input sequence for BERT is limited to 512 tokens.
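The tag substitution described above can be sketched as follows. This is a minimal illustration, not the preprocessing used by ANDDigest: the helper name `mask_candidate` is hypothetical, and replacing every case-sensitive, whole-word occurrence of the name is an assumption (the card does not say whether one or all occurrences should be replaced).

```python
import re

TAG = "<andsystem-candidate>"  # tag used in the example abstract above


def mask_candidate(text: str, name: str, tag: str = TAG) -> str:
    """Replace whole-word occurrences of `name` in `text` with `tag`.

    Hypothetical helper: matching is case-sensitive and all occurrences
    are replaced, which is an assumption rather than documented behaviour.
    """
    # Lookarounds keep "NO" from matching inside words such as "NOT".
    pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
    # A function replacement avoids any special handling of the tag text.
    return pattern.sub(lambda _match: tag, text)


sentence = (
    "After treatment with mebendazole, no further episodes of tube "
    "occlusion were observed."
)
masked = mask_candidate(sentence, "mebendazole")
print(masked)
```

Case-sensitive matching is chosen here because short names such as "NO" would otherwise collide with ordinary words; whether that suits a given corpus is a judgment call.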
Output:
LABEL_0 is the probability of a FALSE recognition, i.e. the context of <andsystem-candidate> does not correspond to the context specific to drugs or metabolites.
LABEL_1 is the probability of a TRUE recognition, i.e. the context of <andsystem-candidate> corresponds to the context specific to drugs or metabolites.
The optimal LABEL_1 threshold for short names of drugs or metabolites was calculated using a gold standard (add link). It is >= 0.999992847442627.
The Matthews correlation coefficient of the model for long names (>= 15 symbols) is 0.983.
The ROC AUC value of the model, calculated for short names (<= 4 symbols), is 0.907.
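As an illustration of how the LABEL_1 threshold quoted above might be applied to pipeline output, here is a minimal sketch. The function names `is_drug_or_metabolite` and `classify` are hypothetical, and the sketch assumes that the `text-classification` pipeline, called on a list of texts with `top_k=None`, returns one list of `{"label", "score"}` dicts per input; check this against your installed version of Transformers.

```python
# LABEL_1 threshold quoted in this card for short names (<= 4 symbols).
THRESHOLD = 0.999992847442627

MODEL_ID = "Timofey/PubMedBERT_Drugs_Metabolites_Context_Classifier"


def is_drug_or_metabolite(scores, threshold=THRESHOLD):
    """Return True if the LABEL_1 score meets the threshold.

    `scores` is a list of {"label": str, "score": float} dicts for one
    input, as assumed above for pipeline output with top_k=None.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return by_label["LABEL_1"] >= threshold


def classify(masked_text):
    """Run the model on one tagged text and apply the threshold.

    Not called in this sketch because it downloads the model.
    """
    from transformers import pipeline  # imported lazily on purpose

    pipe = pipeline("text-classification", model=MODEL_ID)
    # truncation/max_length keep the input within BERT's 512-token limit.
    scores = pipe([masked_text], top_k=None, truncation=True, max_length=512)[0]
    return is_drug_or_metabolite(scores)


# Synthetic scores, made up for illustration (not real model output).
example = [
    {"label": "LABEL_1", "score": 0.999999},
    {"label": "LABEL_0", "score": 0.000001},
]
print(is_drug_or_metabolite(example))
```

Because the threshold is so close to 1, a LABEL_1 score that looks high, such as 0.99999, still falls below it and is treated as a false recognition.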
Citing
If you find the developed models useful in your research, please cite the following articles:
Ivanisenko, T.V., Saik, O.V., Demenkov, P.S. et al. ANDDigest: a new web-based module of ANDSystem for the search of knowledge in the scientific literature. BMC Bioinformatics 21 (Suppl 11), 228 (2020). https://doi.org/10.1186/s12859-020-03557-8
Ivanisenko, T.V.; Demenkov, P.S.; Kolchanov, N.A.; Ivanisenko, V.A. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. Int. J. Mol. Sci. 2022, 23, 14934. https://doi.org/10.3390/ijms232314934