biodatlab/MIReAD-Neuro

Text Classification


atrytone committed on Jun 9, 2023: Create README.md (commit fb0085b, parent 25e6d76)


---
language:
- en
pipeline_tag: text-classification
metrics:
- f1
- accuracy
- recall
- precision
library_name: transformers
---
This model is a fine-tuned version of [arazd/MIReAD](https://huggingface.co/arazd/MIReAD) on a dataset of neuroscience papers from 200 journals, collected from various sources.
It achieves the following results on the evaluation set:
- Loss: 2.7117
- Accuracy: 0.4011
- F1: 0.3962
- Precision: 0.4066
- Recall: 0.3999
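
The card does not say how precision, recall and F1 were averaged over the 200 journal classes. As an illustration only, the sketch below assumes macro averaging (the unweighted mean of per-class scores, the convention of scikit-learn's `average="macro"`) and computes all four metrics in plain Python; the labels `A`, `B` and `C` are hypothetical stand-ins for journals.

```py
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    ASSUMPTION: macro averaging. Classes are the union of true and
    predicted labels; a class with no predictions (or no true examples)
    contributes 0 to the corresponding average.
    """
    assert y_true and len(y_true) == len(y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        r = true_pos[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical journal labels for four papers.
scores = macro_metrics(["A", "A", "B", "C"], ["A", "B", "B", "A"])
print(scores)  # accuracy 0.5, macro recall 0.5
```

Because macro F1 averages the per-class F1 scores, it generally differs from the harmonic mean of macro precision and macro recall, so the reported F1 need not follow exactly from the reported precision and recall.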

## Model description
This model was trained on a journal classification task: given a neuroscience paper, predict which of the 200 journals it was published in.
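
In a sequence-classification setup like this one, the classification head produces one logit per class (here, per journal), the predicted journal is the highest-scoring class, and `BertForSequenceClassification` computes a cross-entropy loss when given integer class labels. The plain-Python sketch below illustrates those three steps. `journal_names` is a hypothetical label list, and treating the reported evaluation loss as mean cross-entropy is an assumption the card does not confirm.

```py
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_journal(logits, journal_names):
    """Map the index of the highest logit to its (hypothetical) journal name."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return journal_names[best]


def cross_entropy(logits, true_index):
    """Negative log-probability of the true class for a single example."""
    return -math.log(softmax(logits)[true_index])


# Baseline: equal logits over 200 journals (uniform guessing) give a loss of ln(200).
uniform = [0.0] * 200
print(round(cross_entropy(uniform, 0), 4))  # 5.2983
```

Under that assumption, uniform guessing over 200 journals would score ln 200 ≈ 5.30, so the reported evaluation loss of 2.7117 is roughly half the loss of chance-level prediction.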

## Intended uses & limitations
The intended use of this model is to create embeddings of paper titles and abstracts for semantic similarity search.
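
The card does not specify how embeddings should be compared. Cosine similarity is a common choice for this kind of search, so the sketch below assumes it and ranks a small corpus by brute force. The paper IDs and 3-dimensional vectors are hypothetical stand-ins for real embeddings, which can be converted from a PyTorch tensor to a Python list with `.tolist()`.

```py
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return (paper_id, score) pairs for the k corpus vectors closest to query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones come from the model and are much longer.
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.7, 0.7, 0.0],
    "paper_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
for pid, score in top_k(query, corpus, k=2):
    print(pid, round(score, 3))
```

A linear scan is fine for small collections; for large ones, an approximate nearest-neighbour index is the usual replacement.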

## Model Usage
To load the model:
```py
from transformers import BertForSequenceClassification, AutoTokenizer

mpath = 'biodatlab/MIReAD-Neuro'
# the classification model wraps the BERT encoder used for embeddings
model = BertForSequenceClassification.from_pretrained(mpath)
tokenizer = AutoTokenizer.from_pretrained(mpath)
```
To create embeddings:
```py
import torch

# sample title & abstract text
title = 'MIReAD: simple method for learning scientific representations'
abstr = 'Learning semantically meaningful representations from scientific documents can ...'
# join title and abstract with the tokenizer's separator token
text = title + tokenizer.sep_token + abstr
tokens = tokenizer(text,
                   max_length=512,
                   padding=True,
                   truncation=True,
                   return_tensors="pt"
                  )
with torch.no_grad():
    # run only the BERT encoder, skipping the classification head
    out = model.bert(**tokens)
    # use the final hidden state of the first ([CLS]) token as the embedding
    feature = out.last_hidden_state[:, 0, :]
```
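
`padding=True` only has an effect when several texts are tokenized together. To embed a batch of papers, one option (a sketch, not part of the original card) is to build the same title-separator-abstract string for each paper and pass the resulting list to the tokenizer in one call. The helper `build_inputs` below is hypothetical, and `"[SEP]"` stands in for `tokenizer.sep_token`, which is the value to use in practice.

```py
def build_inputs(papers, sep_token):
    """Join each (title, abstract) pair as title + sep_token + abstract.

    Hypothetical helper; mirrors the single-paper format in the snippet above.
    """
    return [title + sep_token + abstract for title, abstract in papers]


papers = [
    ("MIReAD: simple method for learning scientific representations",
     "Learning semantically meaningful representations from scientific documents can ..."),
    ("A hypothetical second title", "A hypothetical second abstract."),
]
# In practice, pass tokenizer.sep_token instead of the literal "[SEP]".
texts = build_inputs(papers, sep_token="[SEP]")
print(texts[1])
```

Calling `tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")` and then `model.bert(**tokens)` as in the snippet above yields one embedding row per paper in `out.last_hidden_state[:, 0, :]`.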

## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 6
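
The card lists the batch size and epoch count but not the training-set size, the optimizer, or any gradient accumulation. Assuming one optimizer step per batch of 16 and no accumulation, the number of training steps follows from simple arithmetic, sketched below. The 10,000-example training set is a hypothetical figure, not the real dataset size.

```py
import math

# Hyperparameters listed in this model card.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 6


def total_optimizer_steps(num_train_examples, batch_size=TRAIN_BATCH_SIZE,
                          epochs=NUM_EPOCHS):
    """Optimizer steps over all epochs.

    ASSUMPTION: one step per batch, no gradient accumulation, and the final
    partial batch of each epoch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical training-set size; the card does not report it.
print(total_optimizer_steps(10_000))  # 3750
```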