update readme
README.md CHANGED
@@ -110,6 +110,7 @@ The model was trained locally on a single-node with one 16GB Nvidia T4 using
 ## Evaluation / Metrics
 
 We evaluate our masked language model's performance using the `perplexity` metric, which has a few mathematical definitions. We define the perplexity as the exponential of the cross-entropy.
+To remove randomness in our metrics, we premask our evaluation dataset with a single masking function. This ensures we are evaluating with respect to the same set of labels each epoch.
 See the Wikipedia links for perplexity and cross-entropy below for a more detailed discussion and various other definitions.
 
 Cross-entropy: [https://en.wikipedia.org/wiki/Cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy)
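The premasking and perplexity computation added in the hunk above could look roughly as follows. This is a minimal sketch assuming a Hugging Face `transformers`/`datasets` workflow; the checkpoint name and the `eval_dataset`/`trainer` variables are illustrative assumptions, not this repo's actual code. The evaluation split is masked exactly once, so its labels stay fixed across epochs, and perplexity is taken as the exponential of the mean cross-entropy loss.

```python
import math

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Checkpoint is an assumption for illustration; substitute the model being evaluated.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def premask_batch(batch):
    """Apply the random masking collator once, freezing the evaluation labels."""
    rows = [dict(zip(batch, values)) for values in zip(*batch.values())]  # columns -> rows
    masked = collator(rows)  # adds `labels`, with -100 at unmasked positions
    return {k: v.numpy() for k, v in masked.items()}

# `eval_dataset` is assumed to be a tokenized `datasets.Dataset`:
# eval_dataset = eval_dataset.map(
#     premask_batch, batched=True, remove_columns=eval_dataset.column_names
# )

# Perplexity as the exponential of the cross-entropy (here, the Trainer's eval loss):
# eval_loss = trainer.evaluate(eval_dataset=eval_dataset)["eval_loss"]
# perplexity = math.exp(eval_loss)
```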
@@ -120,11 +121,8 @@ Perplexity: [https://en.wikipedia.org/wiki/Perplexity](https://en.wikipedia.org/wiki/Perplexity)
 
 #### Testing Data
 
-The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing
-
-We configured a train/test split using the standard 80/20 rule of thumb on the shuffled UC Irvine data set. The dataset [model card](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews)
-contains in its base form a `DataDict` with splits for train, validation, and test. The dataset used for testing can be found there in the test split.
-
+The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing, which included chunking the concatenated, tokenized inputs
+into chunks of 256 tokens, increased these respective splits by approximately 5k records each. We apply a single masking function to the evaluation dataset before testing, as mentioned above.
 
 ### Results
 
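The chunking described in the added lines above could be implemented along these lines; a minimal sketch assuming the usual batched `datasets.map` pattern, with `tokenized_imdb` as a hypothetical name for the tokenized `DatasetDict`. Because concatenated reviews are resliced into fixed 256-token windows, the number of records in each split grows relative to the raw 25k reviews.

```python
chunk_size = 256  # chunk length described above

def group_texts(examples):
    """Concatenate tokenized reviews, then slice them into fixed-size chunks."""
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail so every chunk is exactly `chunk_size` tokens long.
    total_length = (total_length // chunk_size) * chunk_size
    return {
        k: [seq[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, seq in concatenated.items()
    }

# `tokenized_imdb` is assumed to hold the tokenized train/test splits:
# lm_datasets = tokenized_imdb.map(group_texts, batched=True)
```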