fiveflow/roberta-base-spacing

Token Classification

fiveflow committed on Jul 5, 2023

Commit 26e1084 · 1 Parent(s): 05340ec

Update README.md

Files changed (1)

README.md +25 -3

@@ -3,6 +3,28 @@ language:
 - ko
 library_name: transformers
 pipeline_tag: token-classification
-widget:
- - text: "탄소중립과ESG경영에대한사회적요구확대"
----

+---
+```
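+import torch
+from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+# Assumption: the checkpoint loads through the Auto* classes under this repo id.
+model_id = "fiveflow/roberta-base-spacing"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+roberta = AutoModelForTokenClassification.from_pretrained(model_id)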
+org_text = "탄소중립과ESG경영에대한사회적요구확대".replace(" ", "")  # strip any existing spaces
+label = ["UNK", "PAD", "O", "B", "I", "E", "S"]
+# tokenize character by character
+token_list = [tokenizer.cls_token_id]
+for char in org_text:
+    token_list.append(tokenizer.encode(char)[1])
+token_list.append(tokenizer.eos_token_id)
+tkd = torch.tensor(token_list).unsqueeze(0)
+with torch.no_grad():  # inference only, no gradients needed
+    output = roberta(tkd).logits
+_, pred_idx = torch.max(output, dim=2)
+tags = [label[idx] for idx in pred_idx.squeeze()][1:-1]
+pred_sent = ""
+for char_idx, spc_idx in enumerate(pred_idx.squeeze()[1:-1]):
+    # insert a space after each character tagged "E" (word end)
+    if label[spc_idx] == "E":
+        pred_sent += org_text[char_idx] + " "
+    else:
+        pred_sent += org_text[char_idx]
+print(pred_sent)
+```
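
The final decoding loop in the snippet above can be isolated as a pure function, which makes the "E"-tag rule easy to check without loading the model. This is a minimal sketch: `insert_spaces` is a hypothetical helper name, and the tag sequence in the usage lines is made up for illustration, not actual model output.

```python
def insert_spaces(text, tags):
    """Rebuild a spaced sentence from per-character tags.

    A space is inserted after every character tagged "E", mirroring
    the loop in the README snippet. `insert_spaces` is a hypothetical
    helper, not part of the model repository.
    """
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    pieces = []
    for char, tag in zip(text, tags):
        pieces.append(char)
        if tag == "E":
            pieces.append(" ")
    # Drop the trailing space left when the last character is tagged "E".
    return "".join(pieces).rstrip()


# Illustrative tags only (not real model output).
example = insert_spaces("abcde", ["B", "E", "B", "I", "E"])
print(example)  # ab cde
```

Unlike the README loop, this sketch strips the trailing space that appears when the last character is tagged "E"; drop the `rstrip()` call to match the original behavior exactly.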