atrytone committed
Commit fb0085b · 1 Parent(s): 25e6d76

Create README.md

Files changed (1): README.md (+67 −0)
README.md ADDED
---
language:
- en
pipeline_tag: text-classification
metrics:
- f1
- accuracy
- recall
- precision
library_name: transformers
---

This model is a fine-tuned version of [arazd/MIReAD](https://huggingface.co/arazd/MIReAD) on a dataset of neuroscience papers from 200 journals, collected from various sources.
It achieves the following results on the evaluation set:
- Loss: 2.7117
- Accuracy: 0.4011
- F1: 0.3962
- Precision: 0.4066
- Recall: 0.3999

## Model description

This model was trained on a journal classification task: given a paper's title and abstract, it learns to predict which of the 200 journals published it.

## Intended uses & limitations

The intended use of this model is to create abstract embeddings for semantic similarity search.

## Model usage

To load the model:

```py
from transformers import BertForSequenceClassification, AutoTokenizer

mpath = 'biodatlab/MIReAD-Neuro'
model = BertForSequenceClassification.from_pretrained(mpath)
tokenizer = AutoTokenizer.from_pretrained(mpath)
```
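
Because the model keeps its sequence-classification head, it can also predict a journal label directly. Below is a minimal sketch, assuming a placeholder input text and the `id2label` mapping in the model config (which may hold generic `LABEL_*` names rather than journal titles):

```py
import torch

# a hypothetical title/abstract pair, joined the same way as in training
text = "Some title" + tokenizer.sep_token + "Some abstract ..."
inputs = tokenizer(text, max_length=512, padding=True,
                   truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_journals)

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # predicted journal label
```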

To create embeddings:

```py
import torch

# sample abstract & title text
title = 'MIReAD: simple method for learning scientific representations'
abstr = 'Learning semantically meaningful representations from scientific documents can ...'
text = title + tokenizer.sep_token + abstr
tokens = tokenizer(text,
                   max_length=512,
                   padding=True,
                   truncation=True,
                   return_tensors="pt")
with torch.no_grad():
    out = model.bert(**tokens)
    feature = out.last_hidden_state[:, 0, :]  # [CLS] embedding of the input
```
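
For the semantic-similarity use case described above, two such embeddings can be compared with cosine similarity. A minimal sketch building on `feature` from the previous block; the second paper's text is a placeholder:

```py
import torch.nn.functional as F

# embed a second (placeholder) paper in the same way
other_text = "Another title" + tokenizer.sep_token + "Another abstract ..."
other_tokens = tokenizer(other_text, max_length=512, padding=True,
                         truncation=True, return_tensors="pt")
with torch.no_grad():
    other_feature = model.bert(**other_tokens).last_hidden_state[:, 0, :]

# cosine similarity between the two [CLS] embeddings (higher = more similar)
score = F.cosine_similarity(feature, other_feature).item()
```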

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching setup follows the list):
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 6
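
These values map naturally onto a Hugging Face `Trainer` setup. The sketch below is an assumption about how such a run could look, not the original training script; `train_ds` and `eval_ds` stand for tokenized datasets that are not part of this card:

```py
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="miread-neuro-finetune",  # hypothetical output path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized train split
    eval_dataset=eval_ds,    # hypothetical tokenized eval split
)
trainer.train()
```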