---
datasets:
- pubmed
language:
- en
tags:
- BERT
---
# Model Card for MDDDDR/dmis_lab_biobert_v1.1_NER

base_model : [dmis-lab/biobert-v1.1](https://huggingface.co/dmis-lab/biobert-v1.1)

hidden_size : 768

max_position_embeddings : 512

num_attention_heads : 12

num_hidden_layers : 12

vocab_size : 28996
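
Because `max_position_embeddings` is 512, one forward pass covers at most 512 positions, and a BERT tokenizer normally spends two of them on `[CLS]` and `[SEP]`. The sketch below shows one possible way to split a longer token list into windows that fit. `chunk_tokens` and its parameters are hypothetical names introduced for illustration; they are not part of this model or of `transformers`.

```python
def chunk_tokens(tokens, max_positions=512, num_special=2):
    """Split a token list into windows that fit the model's position limit.

    Hypothetical helper: num_special reserves room for [CLS] and [SEP].
    """
    window = max_positions - num_special  # 510 content tokens per window
    if window <= 0:
        raise ValueError("max_positions must exceed num_special")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


# Placeholder input: 1,200 dummy tokens
chunks = chunk_tokens(["tok"] * 1200)
print([len(c) for c in chunks])  # [510, 510, 180]
```

Splitting at fixed offsets can cut an entity in half at a window boundary, so overlapping windows may be preferable in practice. For inputs that only slightly exceed the limit, passing `truncation=True, max_length=512` to the tokenizer is a simpler alternative.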

# Basic usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import numpy as np

# map predicted label ids to tags
id2tag = {0:'O', 1:'B_MT', 2:'I_MT'}

# load model & tokenizer
MODEL_NAME = 'MDDDDR/dmis_lab_biobert_v1.1_NER'

model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# prepare input
text = 'mental disorder can also contribute to the development of diabetes through various mechanism including increased stress, poor self care behavior, and adverse effect on glucose metabolism.'
tokenized = tokenizer(text, return_tensors='pt')

# forward pass
output = model(**tokenized)

# result: argmax over the label dimension, then drop [CLS] and [SEP]
preds = np.argmax(output.logits.cpu().detach().numpy(), axis=2)[0][1:-1]

# check preds: one tag per tokenizer token
for txt, pred in zip(tokenizer.tokenize(text), preds):
    print("{}\t{}".format(id2tag[int(pred)], txt))
    # output:
    # B_MT	mental
    # B_MT	disorder
    # O	can
    # O	also
    # O	contribute
    # O	to
    # O	the
    # B_MT	development
    # O	of
    # B_MT	diabetes
    # O	through
    # O	various
    # B_MT	mechanism
    # O	including
    # O	increased
    # B_MT	stress
    # O	,
    # O	poor
    # B_MT	self
    # B_MT	care
    # B_MT	behavior
    # O	,
    # O	and
    # B_MT	adverse
    # I_MT	effect
    # O	on
    # B_MT	glucose
    # B_MT	metabolism
    # O	.
```
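
The loop above prints one tag per token. To turn that into entity strings, a possible post-processing step is sketched below: it merges WordPiece continuation pieces (those starting with `##`) back into words and then groups `B_`/`I_` tags into spans. `group_entities` is a hypothetical helper, and the `glu` / `##cose` split in the sample input is made up for illustration rather than taken from this tokenizer.

```python
def group_entities(tokens, tags):
    """Merge WordPiece pieces into words, then group B_/I_ tags into spans."""
    words, word_tags = [], []
    for tok, tag in zip(tokens, tags):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # continuation piece: keep the first piece's tag
        else:
            words.append(tok)
            word_tags.append(tag)

    spans, current = [], []
    for word, tag in zip(words, word_tags):
        if tag.startswith("B_"):
            if current:
                spans.append(" ".join(current))
            current = [word]  # start a new span
        elif tag.startswith("I_") and current:
            current.append(word)  # extend the open span
        else:  # "O", or an I_ tag with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Illustrative input; the sub-token split is an assumption
tokens = ["adverse", "effect", "on", "glu", "##cose", "metabolism"]
tags = ["B_MT", "I_MT", "O", "B_MT", "I_MT", "B_MT"]
print(group_entities(tokens, tags))
# ['adverse effect', 'glucose', 'metabolism']
```

This sketch labels each word with the tag of its first piece and drops an `I_` tag that has no preceding `B_` tag; other conventions are equally reasonable.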

## Framework versions
- transformers : 4.39.1
- torch : 2.1.0+cu121
- datasets : 2.18.0
- tokenizers : 0.15.2
- numpy : 1.20.0