Tagging texts which are longer than 512 tokens.
#4 by EmilA - opened
I am using your tool to tag call texts from the National Institutes of Health (NIH) with MeSH terms, e.g., Part II, Section I of this call: https://grants.nih.gov/grants/guide/rfa-files/RFA-RM-09-020.html. The problem is that the call texts are often longer than the 512 tokens the model permits. Have you handled this issue yourself, or do you have any ideas on how to handle it?
Simple truncation is not really an option. The paper the model is based on (https://pubmed.ncbi.nlm.nih.gov/32976559/) describes ways of concatenating sections of papers to go beyond 512 tokens, but I cannot implement them, as that would require redeveloping your model.
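To make the question concrete, here is a minimal sketch of one workaround that would not require retraining: split the text into overlapping windows of at most 512 tokens, tag each window separately, and keep the highest score per MeSH term. This is a generic sliding-window approach, not the concatenation method from the paper. The names `split_into_windows`, `tag_long_text`, and `tag_window` are hypothetical; `tag_window` stands in for whatever call runs the model on a single window and returns a mapping from label to score. If the tokenizer adds special tokens, `max_len` would probably need to be slightly below 512 to leave room for them.

```python
from typing import Callable, Dict, List


def split_into_windows(
    tokens: List[str], max_len: int = 512, stride: int = 256
) -> List[List[str]]:
    """Split a token list into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the text.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


def tag_long_text(
    tokens: List[str],
    tag_window: Callable[[List[str]], Dict[str, float]],  # hypothetical model call
    max_len: int = 512,
    stride: int = 256,
) -> Dict[str, float]:
    """Tag every window and keep the highest score seen for each label."""
    merged: Dict[str, float] = {}
    for window in split_into_windows(tokens, max_len, stride):
        for label, score in tag_window(window).items():
            merged[label] = max(score, merged.get(label, 0.0))
    return merged
```

Taking the maximum means a term is kept if any single window supports it strongly. Averaging over windows would be an alternative, but it would penalise terms that are only relevant to one section of a long call text.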