update readme
README.md CHANGED
@@ -110,6 +110,7 @@ The model was trained locally on a single-node with one 16GB Nvidia T4 using
 ## Evaluation / Metrics
 
 We evaluate our masked language model's performance using the `perplexity` metric, which has a few mathematical definitions. We define the perplexity as the exponential of the cross-entropy.
+To remove randomness in our metrics, we premask our evaluation dataset with a single masking function. This ensures we are evaluating with respect to the same set of labels each epoch.
 See the Wikipedia links for perplexity and cross-entropy below for a more detailed discussion and various other definitions.
 
 Cross-entropy: [https://en.wikipedia.org/wiki/Cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy)
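The premasking and perplexity computation added in the hunk above could look roughly as follows. This is a minimal sketch assuming a Hugging Face `transformers`/`datasets` workflow; the checkpoint name and the `eval_dataset`/`trainer` variables are illustrative assumptions, not this repo's actual code. The evaluation split is masked exactly once, so its labels stay fixed across epochs, and perplexity is taken as the exponential of the mean cross-entropy loss.

```python
import math

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Checkpoint is an assumption for illustration; substitute the model being evaluated.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def premask_batch(batch):
    """Apply the random masking collator once, freezing the evaluation labels."""
    rows = [dict(zip(batch, values)) for values in zip(*batch.values())]  # columns -> rows
    masked = collator(rows)  # adds `labels`, with -100 at unmasked positions
    return {k: v.numpy() for k, v in masked.items()}

# `eval_dataset` is assumed to be a tokenized `datasets.Dataset`:
# eval_dataset = eval_dataset.map(
#     premask_batch, batched=True, remove_columns=eval_dataset.column_names
# )

# Perplexity as the exponential of the cross-entropy (here, the Trainer's eval loss):
# eval_loss = trainer.evaluate(eval_dataset=eval_dataset)["eval_loss"]
# perplexity = math.exp(eval_loss)
```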
@@ -120,11 +121,8 @@ Perplexity: [https://en.wikipedia.org/wiki/Perplexity](https://en.wikipedia.org/wiki/Perplexity)
 
 #### Testing Data
 
-The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing
-
-We configured a train/test split using the standard 80/20 rule of thumb on the shuffled UC Irvine data set. The dataset [model card](https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews)
-contains in its base form a `DataDict` with splits for train, validation, and test. The dataset used for testing can be found there in the test split.
-
+The IMDB dataset from Stanford NLP comes pre-split into training and testing data of 25k reviews each. Our preprocessing, which included chunking the concatenated, tokenized inputs
+into chunks of 256 tokens, increased these respective splits by approximately 5k records each. We apply a single masking function to the evaluation dataset before testing, as mentioned above.
 
 ### Results
 
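The chunking described in the added lines above could be implemented along these lines; a minimal sketch assuming the usual batched `datasets.map` pattern, with `tokenized_imdb` as a hypothetical name for the tokenized `DatasetDict`. Because concatenated reviews are resliced into fixed 256-token windows, the number of records in each split grows relative to the raw 25k reviews.

```python
chunk_size = 256  # chunk length described above

def group_texts(examples):
    """Concatenate tokenized reviews, then slice them into fixed-size chunks."""
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail so every chunk is exactly `chunk_size` tokens long.
    total_length = (total_length // chunk_size) * chunk_size
    return {
        k: [seq[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, seq in concatenated.items()
    }

# `tokenized_imdb` is assumed to hold the tokenized train/test splits:
# lm_datasets = tokenized_imdb.map(group_texts, batched=True)
```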