| | --- |
| | license: mit |
| | datasets: |
| | - bigbio/chemdner |
| | - ncbi_disease |
| | - jnlpba |
| | - bigbio/n2c2_2018_track2 |
| | - bigbio/bc5cdr |
| | widget: |
| | - text: Drug<SEP>He was given aspirin and paracetamol. |
| | language: |
| | - en |
| | metrics: |
| | - precision |
| | - recall |
| | - f1 |
| | pipeline_tag: token-classification |
| | tags: |
| | - token-classification |
| | - biology |
| | - medical |
| | - zero-shot |
| | - few-shot |
| | library_name: transformers |
| | --- |
| | # Zero- and few-shot NER for biomedical texts |
| |
|
| | ## Model description |
| | The model takes two strings as input. String1 is the NER label and must be a word or phrase naming the entity class. String2 is a short text in which String1 is searched for semantically. |
| | The model outputs a list of zeros and ones, one per token of String2 (tokens as produced by the transformer tokenizer), where a one marks a token that belongs to a named entity of the requested class. |
| |
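To make the output format concrete, here is a minimal sketch of how the zeros and ones could be mapped back to words of String2. It assumes BERT-style WordPiece tokens (continuation pieces prefixed with `##`) and segment ids of 0 for the label and 1 for the text. The helper name `entity_tokens` and the token list are hand-made for illustration, not actual tokenizer or model output.

```python
from typing import List


def entity_tokens(tokens: List[str], segment_ids: List[int], predictions: List[int]) -> List[str]:
    """Collect the String2 words whose tokens were marked with 1.

    tokens       -- tokens of the whole input pair (label + text)
    segment_ids  -- 0 for the label segment, 1 for the text segment
    predictions  -- one 0/1 value per token
    """
    special = {"[CLS]", "[SEP]", "[PAD]"}
    words: List[str] = []
    prev_selected = False
    for token, segment, pred in zip(tokens, segment_ids, predictions):
        selected = segment == 1 and pred == 1 and token not in special
        if selected and prev_selected and token.startswith("##"):
            # Continuation piece: glue it onto the previous word
            words[-1] += token[2:]
        elif selected:
            words.append(token[2:] if token.startswith("##") else token)
        prev_selected = selected
    return words


# Hand-made example (illustrative tokenization, not real model output)
tokens = ["[CLS]", "drug", "[SEP]", "he", "was", "given", "as", "##pi", "##rin",
          "and", "para", "##ce", "##tam", "##ol", ".", "[SEP]"]
segment_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
predictions = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
print(entity_tokens(tokens, segment_ids, predictions))  # ['aspirin', 'paracetamol']
```

Positions belonging to the label segment and to special tokens are skipped, since only the tokens of String2 carry entity information.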
|
| | ## Example of usage |
| | ```python |
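| | import torch |
| | from transformers import AutoTokenizer |
| | from transformers import BertForTokenClassification |
| | |
| | modelname = 'ProdicusII/ZeroShotBioNER'  # model path on the Hugging Face Hub |
| | tokenizer = AutoTokenizer.from_pretrained(modelname)  # load the tokenizer that ships with the model |
| | string1 = 'Drug'  # NER label (first segment) |
| | string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.'  # text to search (second segment) |
| | # Encode the label and the text together as a sentence pair |
| | encodings = tokenizer(string1, string2, is_split_into_words=False, |
| |                       padding=True, truncation=True, add_special_tokens=True, return_offsets_mapping=False, |
| |                       max_length=512, return_tensors='pt') |
| | |
| | model = BertForTokenClassification.from_pretrained(modelname, num_labels=2) |
| | with torch.no_grad():  # inference only, no gradients needed |
| |     outputs = model(**encodings) |
| | prediction_logits = outputs.logits  # shape: (1, sequence_length, 2) |
| | predictions = prediction_logits.argmax(dim=-1)  # one class index (0 or 1) per input token |
| | print(prediction_logits) |
| | print(predictions) |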
| | ``` |
| |
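The per-token predictions can be turned into character spans of String2 when token offsets are available. The sketch below assumes a fast tokenizer called with `return_offsets_mapping=True`, so that `encodings['offset_mapping'][0]` and `encodings.sequence_ids(0)` supply the two helper inputs. The function name `entity_spans` is hypothetical, and the offsets in the example are written by hand rather than produced by the model's tokenizer.

```python
from typing import List, Optional, Tuple


def entity_spans(text: str,
                 offsets: List[Tuple[int, int]],
                 sequence_ids: List[Optional[int]],
                 predictions: List[int]) -> List[Tuple[int, int, str]]:
    """Merge consecutive positive tokens of the second segment into character spans."""
    spans: List[List[int]] = []
    prev_selected = False
    for (start, end), seq_id, pred in zip(offsets, sequence_ids, predictions):
        # Keep only real tokens of String2 (sequence id 1) predicted as entity
        selected = seq_id == 1 and pred == 1 and end > start
        if selected and prev_selected:
            spans[-1][1] = end  # extend the span that is currently open
        elif selected:
            spans.append([start, end])
        prev_selected = selected
    return [(s, e, text[s:e]) for s, e in spans]


# Hand-made example: offsets are character positions inside `text`
text = "He was given aspirin and paracetamol."
offsets = [(0, 0), (0, 4), (0, 0), (0, 2), (3, 6), (7, 12), (13, 15), (15, 20),
           (21, 24), (25, 36), (36, 37), (0, 0)]
sequence_ids = [None, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, None]
predictions = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
print(entity_spans(text, offsets, sequence_ids, predictions))
# [(13, 20, 'aspirin'), (25, 36, 'paracetamol')]
```

Merging adjacent positive tokens also keeps multi-word entities together, at the cost of fusing two distinct entities that happen to be direct neighbours.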
|
| | ## Available classes |
| |
|
| | The following datasets and entities were used for training, so these entity names can be used as the label in the first segment (as the first string). Note that multi-word strings have been merged. |
| |
|
| |
|
| | * NCBI |
| |   * Specific Disease |
| |   * Composite Mention |
| |   * Modifier |
| |   * Disease Class |
| | * BIORED |
| |   * Sequence Variant |
| |   * Gene Or Gene Product |
| |   * Disease Or Phenotypic Feature |
| |   * Chemical Entity |
| |   * Cell Line |
| |   * Organism Taxon |
| | * CDR |
| |   * Disease |
| |   * Chemical |
| | * CHEMDNER |
| |   * Chemical |
| |   * Chemical Family |
| | * JNLPBA |
| |   * Protein |
| |   * DNA |
| |   * Cell Type |
| |   * Cell Line |
| |   * RNA |
| | * n2c2 |
| |   * Drug |
| |   * Frequency |
| |   * Strength |
| |   * Dosage |
| |   * Form |
| |   * Reason |
| |   * Route |
| |   * ADE |
| |   * Duration |
| |
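Because the label is passed as the first string, tagging one text with several of the classes above amounts to one query per label. A minimal sketch of that loop follows. `tag_with_labels` and `toy_predict` are hypothetical names, and `toy_predict` is a keyword lookup standing in for a real model call so that the example runs on its own.

```python
from typing import Callable, Dict, List


def tag_with_labels(text: str,
                    labels: List[str],
                    predict: Callable[[str, str], List[str]]) -> Dict[str, List[str]]:
    """Run one (label, text) query per label and collect the entities found."""
    return {label: predict(label, text) for label in labels}


def toy_predict(label: str, text: str) -> List[str]:
    """Stand-in for the model: a fixed lexicon lookup, for illustration only."""
    lexicon = {"Drug": ["aspirin", "paracetamol"], "Route": ["orally"]}
    return [word for word in lexicon.get(label, []) if word in text]


result = tag_with_labels("He was given aspirin and paracetamol orally.",
                         ["Drug", "Route", "Dosage"], toy_predict)
print(result)
# {'Drug': ['aspirin', 'paracetamol'], 'Route': ['orally'], 'Dosage': []}
```

In practice `predict` would wrap the tokenizer and model calls from the usage example, which means one forward pass per label.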
|
| | In addition, the model can be used in a zero-shot regime with other classes, and it can be fine-tuned on a few examples of other classes. |
| |
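For few-shot fine-tuning, each annotated example has to be converted into one 0/1 label per token. The sketch below shows one possible alignment, assuming character-level entity annotations and the token offsets of a fast tokenizer. Masking the label segment and the special tokens with -100 (the value that PyTorch's cross-entropy loss ignores by default) is an assumption of this sketch, not necessarily what the authors' training code does. The function name `align_labels` is hypothetical.

```python
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # skipped by PyTorch's CrossEntropyLoss by default


def align_labels(offsets: List[Tuple[int, int]],
                 sequence_ids: List[Optional[int]],
                 entity_spans: List[Tuple[int, int]]) -> List[int]:
    """Build one training label per token from character-level entity spans."""
    labels: List[int] = []
    for (start, end), seq_id in zip(offsets, sequence_ids):
        if seq_id != 1 or start == end:
            # Label segment, special tokens, padding: excluded from the loss (assumption)
            labels.append(IGNORE_INDEX)
        elif any(start < span_end and end > span_start for span_start, span_end in entity_spans):
            labels.append(1)  # token overlaps an annotated entity
        else:
            labels.append(0)
    return labels


# Hand-made offsets for "He was given aspirin and paracetamol." with label "Drug"
offsets = [(0, 0), (0, 4), (0, 0), (0, 2), (3, 6), (7, 12), (13, 15), (15, 20),
           (21, 24), (25, 36), (36, 37), (0, 0)]
sequence_ids = [None, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, None]
drug_spans = [(13, 20), (25, 36)]  # character spans of "aspirin" and "paracetamol"
print(align_labels(offsets, sequence_ids, drug_spans))
# [-100, -100, -100, 0, 0, 0, 1, 1, 0, 1, 0, -100]
```

The resulting list can be passed as `labels` alongside the encoded inputs when training a token-classification model.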
|
| |
|
| |
|
| | ## Code availability |
| |
|
| | Code used for training and testing the model is available at https://github.com/br-ai-ns-institute/Zero-ShotNER |
| |
|
| | ## Citation |