How to use Wellcome/WellcomeBertMesh with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Wellcome/WellcomeBertMesh", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Wellcome/WellcomeBertMesh", trust_remote_code=True)
model = AutoModel.from_pretrained("Wellcome/WellcomeBertMesh", trust_remote_code=True)
```
Tagging texts which are longer than 512 tokens.
I am using your tool to tag call texts from the National Institutes of Health (NIH) with MeSH terms, e.g., Part II, Section I of this call: https://grants.nih.gov/grants/guide/rfa-files/RFA-RM-09-020.html. My problem is that the call texts are often longer than the 512 tokens the model permits. Have you handled this issue yourselves, or do you have any idea how to handle it?
Simple truncation is not really an option. The paper the model is based on has some ideas on how to concatenate sections of papers to allow for more than 512 tokens, but I cannot implement them, as that would require redeveloping your model: https://pubmed.ncbi.nlm.nih.gov/32976559/
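One workaround I could imagine, which is not the concatenation approach from the paper and has not been validated for this model, is to split the token sequence into overlapping windows, tag each window separately, and keep the highest score per MeSH term. Below is a minimal sketch of that idea. The names `split_into_windows`, `tag_long_text`, and `score_window` are hypothetical; `score_window` stands in for whatever code runs the model on one window and returns a probability per label. The window size of 510 assumes two positions are reserved for special tokens, and the stride and threshold values are arbitrary.

```python
from typing import Callable, Dict, List, Sequence


def split_into_windows(token_ids: Sequence[int],
                       window: int = 510,
                       stride: int = 255) -> List[List[int]]:
    """Split token ids into overlapping windows of at most `window` tokens.

    Consecutive windows start `stride` tokens apart, so every token
    appears in at least one window as long as stride <= window.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    ids = list(token_ids)
    if len(ids) <= window:
        return [ids]
    windows = []
    start = 0
    while True:
        windows.append(ids[start:start + window])
        if start + window >= len(ids):
            break  # this window already reaches the end of the text
        start += stride
    return windows


def tag_long_text(token_ids: Sequence[int],
                  score_window: Callable[[List[int]], Dict[str, float]],
                  threshold: float = 0.5,
                  window: int = 510,
                  stride: int = 255) -> Dict[str, float]:
    """Tag a long text by max-pooling per-label scores over all windows.

    `score_window` is a hypothetical callable (an assumption, not part of
    the model's API) that maps one window of token ids to a dict of
    label -> probability.
    """
    best: Dict[str, float] = {}
    for chunk in split_into_windows(token_ids, window, stride):
        for label, prob in score_window(chunk).items():
            if prob > best.get(label, 0.0):
                best[label] = prob
    # Keep only labels whose best window score reaches the threshold.
    return {label: prob for label, prob in best.items() if prob >= threshold}
```

Taking the maximum means a term is assigned if any single window supports it, which favours recall; averaging the scores over windows would be the more conservative alternative. Whether either aggregation gives tags of acceptable quality for call texts is something I would have to check against manually tagged examples.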