Lihuchen/AcroBERT

 tags:
 - acronym disambiguation
 - acronym linking
+---
+AcroBERT performs end-to-end acronym linking. Given a sentence, our framework first recognizes acronyms using [MadDog](https://github.com/amirveyseh/MadDog) and then disambiguates them using AcroBERT:
+```python
+from inference.acrobert import acronym_linker
+# Input sentence containing acronyms; the maximum length is 400 sub-tokens.
+sentence = "This new genome assembly and the annotation are tagged as a RefSeq genome by NCBI."
+# mode is one of 'acrobert' or 'pop'.
+# 'acrobert' is more accurate; 'pop' is faster but less accurate.
+results = acronym_linker(sentence, mode='acrobert')
+print(results)
+# Expected output: [('NCBI', 'National Center for Biotechnology Information')]
+```
+GitHub: [https://github.com/tigerchen52/GLADIS](https://github.com/tigerchen52/GLADIS)
+Model: [https://zenodo.org/record/7568937#.Y9vtrXaZMuU](https://zenodo.org/record/7568937#.Y9vtrXaZMuU)
+In addition to AcroBERT, we constructed GLADIS, a new benchmark for accelerating research on acronym disambiguation. It contains the following data:
+| Resource | Source | Description |
+|------|------------|------|
+| [Acronym Dictionary](https://zenodo.org/record/7568937#.Y9JiQXaZNPY) | [Pile](https://github.com/EleutherAI/the-pile) (MIT license), [Wikidata](https://www.wikidata.org/wiki/Help:Aliases), [UMLS](https://www.nlm.nih.gov/research/umls/index.html) |1.6 million acronyms and 6.4 million long forms|
+| [Three Datasets](https://zenodo.org/record/7568937#.Y9JiQXaZNPY) | [WikilinksNED Unseen](https://github.com/yasumasaonoe/ET4EL), [SciAD](https://github.com/amirveyseh/AAAI-21-SDU-shared-task-2-AD) (CC BY-NC-SA 4.0), [MedMentions](https://github.com/chanzuckerberg/MedMentions) (CC0 1.0) | Three AD datasets that cover the general, scientific, and biomedical domains |
+| [A Pre-training Corpus](https://zenodo.org/record/7568937#.Y9JiQXaZNPY) | [Pile](https://github.com/EleutherAI/the-pile) (MIT license) | 180 million sentences with acronyms|
+## usage
+1. Clone the repository: `git clone https://github.com/tigerchen52/GLADIS.git`
+2. Download the [acronym dictionary](https://zenodo.org/record/7568937#.Y9JiQXaZNPY) and [AcroBERT](https://zenodo.org/record/7568937#.Y9JiQXaZNPY), and put them into this path: `inpu
+3. Use the function `inference.acrobert.acronym_linker()` to do end-to-end acronym linking.
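Once the steps above are done, linking several sentences in one go might look like the sketch below. This is only an illustration: `link_sentences` and `VALID_MODES` are hypothetical names that are not part of the GLADIS repository. The linker is passed in as an argument so the helper does not assume anything about `acronym_linker` beyond the call shown earlier (a sentence plus a `mode` keyword, returning a list of `(acronym, long form)` tuples).

```python
from typing import Callable, Iterable, List, Tuple

# The two modes named in this README; assumed to be the only valid values.
VALID_MODES = ("acrobert", "pop")

# A linker takes a sentence and a mode and returns (acronym, long form) pairs.
Linker = Callable[..., List[Tuple[str, str]]]


def link_sentences(
    sentences: Iterable[str],
    linker: Linker,
    mode: str = "acrobert",
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Hypothetical helper: run `linker` on each sentence and pair it with its results."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [(sentence, linker(sentence, mode=mode)) for sentence in sentences]


# Usage with the real linker (requires the cloned repository and downloaded files):
# from inference.acrobert import acronym_linker
# for sentence, links in link_sentences(["... tagged as a RefSeq genome by NCBI."], acronym_linker):
#     print(sentence, links)
```

Passing the linker explicitly also makes it easy to try the helper with a stand-in function before the model files are downloaded.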
+## citation
+```
+Lihu Chen, Gaël Varoquaux, & Fabian Suchanek. (2023, May 2).
+GLADIS: A General and Large Acronym Disambiguation Benchmark.
+The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
+```