aakorolyova/primary_outcome_extraction

<h1>Model description</h1>

This is a BioBERT model fine-tuned for token classification to extract primary outcomes from articles reporting clinical trials.

This is the second version of the model; the original model development was reported in:

Anna Koroleva, Sanjay Kamath, Patrick Paroubek. Extracting primary and reported outcomes from articles reporting randomized controlled trials using pre-trained deep language representations. Preprint: https://easychair.org/publications/preprint/qpml

The original work was conducted within the scope of the "Assisted authoring for avoiding inadequate claims in scientific reporting" PhD project of the Methods for Research on Research (MiRoR, http://miror-ejd.eu/) program.

Model creator: Anna Koroleva

<h1>Intended uses & limitations</h1>

The model is intended for extracting primary outcomes from texts reporting clinical trials.

The main limitation is that the model was trained on a fairly small sample of data (2,000 sentences) annotated by a single annotator. Annotating more data or involving more annotators was not possible within the PhD project.

Another possible issue with using the model is the complex nature of outcomes: a typical outcome description can include the outcome name, the measurement tool and timepoints, e.g. "Health-Related Quality of Life at 12 months, measured using the Assessment of Quality of Life instrument". Ideally, this should be broken into three separate entities ("Health-Related Quality of Life" as the outcome, "at 12 months" as the timepoint, "the Assessment of Quality of Life instrument" as the measurement tool), and the relations between them should be extracted to capture all the outcome-related information. However, our annotation labels this type of example as a single outcome entity.
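
To make the difference in granularity concrete, the sketch below writes the example description both ways as character spans: the single outcome entity used in the training annotation, and the three-entity decomposition with relations described above as the ideal. This is an illustration only; the entity type strings and the relation names (`measured_at`, `measured_with`) are hypothetical and are not labels used by this model.

```python
from typing import List, Tuple

Entity = Tuple[str, int, int]  # (entity type, start offset, end offset)

sentence = ("Health-Related Quality of Life at 12 months, "
            "measured using the Assessment of Quality of Life instrument")


def char_span(fragment: str) -> Tuple[int, int]:
    """Return the (start, end) character offsets of the first occurrence of fragment."""
    start = sentence.index(fragment)
    return start, start + len(fragment)


# Annotation scheme used for this model: the whole description is one outcome entity.
single_entity: List[Entity] = [("outcome", *char_span(sentence))]

# Hypothetical finer-grained scheme (not produced by this model): three entities ...
decomposed: List[Entity] = [
    ("outcome", *char_span("Health-Related Quality of Life")),
    ("timepoint", *char_span("at 12 months")),
    ("measurement tool", *char_span("the Assessment of Quality of Life instrument")),
]
# ... plus relations as (head entity index, relation name, tail entity index);
# the relation names are made up for illustration.
relations = [(0, "measured_at", 1), (0, "measured_with", 2)]

for entity_type, start, end in decomposed:
    print(f"{entity_type}: {sentence[start:end]!r}")
```

With the single-entity scheme, any timepoint or measurement tool stays inside the extracted outcome string and would need separate post-processing.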

<h1>How to use</h1>

The model should be used with the BioBERT tokeniser. Sample code for getting model predictions is given below:
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# The model is used with the original BioBERT tokeniser
tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-v1.1')
model = AutoModelForTokenClassification.from_pretrained('aakorolyova/primary_outcome_extraction')

text = 'Primary endpoints were overall survival in patients with oesophageal squamous cell carcinoma and PD-L1 combined positive score (CPS) of 10 or more, and overall survival and progression-free survival in patients with oesophageal squamous cell carcinoma, PD-L1 CPS of 10 or more, and in all randomised patients.'
# BERT-based models accept at most 512 tokens, so longer inputs must be truncated
encoded_input = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors='pt')

with torch.no_grad():  # inference only, no gradients needed
    logits = model(**encoded_input).logits

# One predicted label id per token, shape (batch_size, sequence_length)
predictions = np.argmax(logits.numpy(), axis=2)
print(predictions)
```
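
The predictions above are label ids, one per sub-word token, including the special `[CLS]` and `[SEP]` tokens. One way to turn them into outcome strings is to merge consecutive tokens labelled as part of an outcome into character spans using the tokeniser's offset mapping. The helper below is a hypothetical sketch, not part of the DeSpin-2.0 code: it assumes that label id `0` means "not part of an outcome" (check `model.config.id2label` for the actual mapping), and it merges adjacent outcome tokens without distinguishing begin/inside tags.

```python
from typing import List, Optional, Sequence, Tuple


def predictions_to_spans(
    label_ids: Sequence[int],
    offsets: Sequence[Tuple[int, int]],
    outside_id: int = 0,  # ASSUMPTION: id of the "not an outcome" label
) -> List[Tuple[int, int]]:
    """Merge consecutive outcome-labelled tokens into (start, end) character spans.

    Hypothetical helper: any label other than `outside_id` counts as "inside an
    outcome", so begin/inside tags (if the model uses them) are not distinguished.
    """
    spans: List[Tuple[int, int]] = []
    span_start: Optional[int] = None
    span_end = 0
    for label, (tok_start, tok_end) in zip(label_ids, offsets):
        # Special tokens ([CLS], [SEP], padding) have empty (0, 0) offsets
        if tok_start == tok_end or label == outside_id:
            if span_start is not None:  # close the span that was open
                spans.append((span_start, span_end))
                span_start = None
            continue
        if span_start is None:  # first token of a new span
            span_start = tok_start
        span_end = tok_end  # extend the span to cover this token, including "##" pieces
    if span_start is not None:  # span running until the last token
        spans.append((span_start, span_end))
    return spans


# Toy demo with hand-written offsets and made-up labels (not real model output)
demo_text = "The primary endpoint was overall survival."
demo_offsets = [(0, 0), (0, 3), (4, 11), (12, 20), (21, 24), (25, 32), (33, 41), (41, 42), (0, 0)]
demo_labels = [0, 0, 0, 0, 0, 1, 1, 0, 0]
for start, end in predictions_to_spans(demo_labels, demo_offsets):
    print(demo_text[start:end])  # prints "overall survival"
```

To apply it to real model output, call the tokeniser with `return_offsets_mapping=True` (supported by fast tokenisers), remove the `offset_mapping` entry from the encoding before passing it to the model, and pass the first row of the argmax predictions together with that offset list.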

Some more useful functions can be found in our GitHub repository: https://github.com/aakorolyova/DeSpin-2.0

<h1>Training data</h1>

The training data can be found at https://github.com/aakorolyova/DeSpin-2.0/tree/main/data/Primary_Outcomes

<h1>Training procedure</h1>

The model was fine-tuned using the Hugging Face Trainer API. Training scripts can be found at https://github.com/aakorolyova/DeSpin-2.0

<h1>Evaluation</h1>

- Precision: 74.41%
- Recall: 88.7%
- F1: 80.93%
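
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Assuming all three figures were computed with the same matching criterion on the same evaluation set, a quick check confirms that the reported values are consistent with each other:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the evaluation above
print(f"{f1_score(74.41, 88.7):.2f}")  # prints 80.93
```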