PyTorch · Persian · bert

Nasrin-Taghizadeh committed · Commit f05b0a1 · verified · 1 Parent(s): 4116264

Update README.md

Files changed (1): README.md (+62 −3)

README.md CHANGED
---
license: lgpl-3.0
language:
- fa
base_model:
- HooshvareLab/bert-base-parsbert-uncased
---

# SINA-BERT: A Pre-trained Language Model for Analysis of Medical Texts in Persian

SINA-BERT is the first Persian medical language model, built on BERT (Devlin et al., 2018). It is pre-trained on a large-scale corpus of medical content, including formal and informal texts collected from a variety of online resources, in order to improve performance on healthcare-related tasks.

## Model Evaluation

SINA-BERT can be used for any Persian medical text-representation task. In our paper, we examined the following:

1) categorization of medical questions,
2) medical sentiment analysis,
3) medical question retrieval.

For each task, we developed annotated Persian datasets and learned a representation of the task data, with particular attention to long and complex medical questions. Using the same architecture across tasks, SINA-BERT outperforms the BERT-based models previously available for Persian.

For details on the datasets and results, please refer to the SINA-BERT paper: [arXiv:2104.07613v1](https://arxiv.org/pdf/2104.07613)
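
As an illustration of how the same architecture is fine-tuned for a downstream task such as question categorization, here is a minimal sketch using the `transformers` sequence-classification head. The number of labels and the example question are placeholders, not the exact setup from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: the number of question categories (5 here) is a placeholder,
# not the label set used in the SINA-BERT paper.
tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "hooshafzar/SINA-BERT", num_labels=5
)

# Encode an illustrative Persian medical question and predict its category.
question = "آیا مصرف این دارو در دوران بارداری بی‌خطر است؟"
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```

Note that the classification head added by `AutoModelForSequenceClassification` is randomly initialized, so it must first be fine-tuned on labeled data before its predictions are meaningful.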

- **Developed by:** HooshAfzar Salamat Team
- **Language(s) (NLP):** Persian
- **Finetuned from model:** [ParsBert](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased)

### Model Sources

- **Repository:** [GitHub](https://github.com/nasrin-taghizadeh/SinaBERT)
- **Paper:** [arXiv:2104.07613](https://arxiv.org/pdf/2104.07613)

## How to use

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the SINA-BERT configuration, tokenizer, and model weights from the Hugging Face Hub.
config = AutoConfig.from_pretrained("hooshafzar/SINA-BERT")
tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModel.from_pretrained("hooshafzar/SINA-BERT")
```
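
Loaded this way, SINA-BERT acts as a plain encoder and can be used for feature extraction. The snippet below is a minimal sketch: the example sentence and the choice of the [CLS] last hidden state as a sentence embedding are illustrative, not prescribed by the model card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModel.from_pretrained("hooshafzar/SINA-BERT")
model.eval()

# Illustrative Persian medical sentence ("Headache and fever are common symptoms of influenza.").
text = "سردرد و تب از علائم شایع آنفولانزا هستند."

# Tokenize and run a forward pass without gradient tracking.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One common choice of sentence embedding: the last hidden state of the [CLS] token.
sentence_embedding = outputs.last_hidden_state[:, 0, :]
print(sentence_embedding.shape)  # torch.Size([1, hidden_size])
```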

## Citation

```bibtex
@article{taghizadeh2021sina,
  title={SINA-BERT: a pre-trained language model for analysis of medical texts in Persian},
  author={Taghizadeh, Nasrin and Doostmohammadi, Ehsan and Seifossadat, Elham and Rabiee, Hamid R and Tahaei, Maedeh S},
  journal={arXiv preprint arXiv:2104.07613},
  year={2021}
}
```