MarioBarbeque committed
Commit efddbd5 · verified · 1 Parent(s): 90814e9

update readme

Files changed (1)
  1. README.md +3 -5
README.md CHANGED
@@ -110,6 +110,7 @@ The model was trained locally on a single-node with one 16GB Nvidia T4 using
 ## Evaluation / Metrics
 
 We evaluate our masked language model's performance using the `perplexity` metric, which has a few mathematical definitions. We define the perplexity as the exponential of the cross-entropy.
+To remove randomness in our metrics, we premask our evaluation dataset with a single masking function. This ensures we are evaluating with respect to the same set of labels each epoch.
 See the Wikipedia links for perplexity and cross-entropy below for a more detailed discussion and various other definitions.
 
 Cross-entropy: [https://en.wikipedia.org/wiki/Cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy)
@@ -120,11 +121,8 @@ Perplexity: [https://en.wikipedia.org/wiki/Perplexity](https://en.wikipedia.org/wiki/Perplexity)
 
 #### Testing Data
 
-The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing
-
-We configured a train/test split using the standard 80/20 rule of thumb on the shuffled UC Irvine data set. The dataset [model card](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews)
-contains in its base form a `DataDict` with splits for train, validation, and test. The dataset used for testing can be found there in the test split.
-
+The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing, which included the chunking of concatenated, tokenized inputs
+into chunks of 256 tokens, increased these respective splits by approximately 5k records each. We apply a single masking function to the evaluation dataset before testing, as mentioned above.
 
 ### Results
 
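The premasking added in this commit is commonly implemented by running the MLM data collator over the evaluation split once with `Dataset.map`, so the labels are frozen for every subsequent epoch. A sketch under that assumption; the `tokenized_eval` dataset and the DistilBERT checkpoint are hypothetical placeholders, not names taken from this repo:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # assumed checkpoint
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def premask(batch):
    # Re-shape the batched columns into per-example dicts, mask them once,
    # and hand the now-fixed input_ids/labels back to the dataset.
    features = [dict(zip(batch, values)) for values in zip(*batch.values())]
    masked = collator(features)
    return {k: v.numpy() for k, v in masked.items()}

# Masking happens exactly once here, so every epoch scores the same labels.
eval_dataset = tokenized_eval.map(  # tokenized_eval: hypothetical tokenized test split
    premask, batched=True, remove_columns=tokenized_eval.column_names
)
```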
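The 256-token chunking mentioned under Testing Data is consistent with the usual concatenate-then-slice preprocessing for masked language modeling; the sketch below assumes that pattern (including dropping the short tail chunk, which is a guess at this repo's choice) and standard tokenizer columns like `input_ids`:

```python
CHUNK_SIZE = 256  # the chunk length stated in the README

def group_texts(examples):
    # Concatenate each tokenized column (input_ids, attention_mask, ...)
    # across the batch, then slice the result into fixed 256-token chunks.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = (len(concatenated["input_ids"]) // CHUNK_SIZE) * CHUNK_SIZE
    return {
        k: [seq[i : i + CHUNK_SIZE] for i in range(0, total_length, CHUNK_SIZE)]
        for k, seq in concatenated.items()
    }
```

Applied with `Dataset.map(..., batched=True)`, splitting long concatenations this way is what would grow each 25k-review split by the roughly 5k records the README reports.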