Update README.md
## Model Description

This model was designed and pretrained from scratch, without using the Hugging Face library.

---

## Model Parameters

- **Block Size**: `256` (maximum sequence length)
- **Vocab Size**: `50257` (50,000 BPE merges, 256 byte-level tokens, and 1 special token)

Non-decayed parameters generally involve biases and layer normalization parameters.

The calculated total number of parameters includes both decayed and non-decayed tensors, summing to over 95 million parameters.
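
The decay/no-decay split and the parameter counts can be reproduced with a short helper. This is a minimal sketch, not the actual training code; the dimension-based grouping rule and the `0.1` decay value are assumptions typical of GPT-2-style setups:

```python
import torch.nn as nn

def split_decay_params(model: nn.Module, weight_decay: float = 0.1):
    """Group parameters for the optimizer: 2D+ tensors (linear and
    embedding weights) receive weight decay; 1D tensors (biases,
    LayerNorm gains) do not."""
    decay = [p for p in model.parameters() if p.requires_grad and p.dim() >= 2]
    no_decay = [p for p in model.parameters() if p.requires_grad and p.dim() < 2]
    print(f"decayed: {sum(p.numel() for p in decay):,} params")
    print(f"non-decayed: {sum(p.numel() for p in no_decay):,} params")
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

Passing the returned groups straight to `torch.optim.AdamW` then applies decay only where intended.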

---

## Dataset Description

### Overview
The dataset is hosted and maintained on Hugging Face's dataset repository.

- **Total Tokens Used for Training**: 3 billion tokens
- **Training Duration**: The model was trained for 3 epochs, enough passes to give it sufficient exposure to the data (a rough step-count estimate follows below).
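
Taking the 3-billion-token figure as the total consumed, the token budget translates directly into an optimizer step count. A back-of-the-envelope sketch; the batch-size and gradient-accumulation values are assumptions, not the actual run settings:

```python
# Rough step-count arithmetic for the 3B-token budget.
total_tokens = 3_000_000_000   # from the dataset description
block_size   = 256             # from the model parameters
micro_batch  = 64              # assumed
grad_accum   = 8               # assumed
tokens_per_step = micro_batch * grad_accum * block_size   # 131,072
print(f"{total_tokens // tokens_per_step:,} optimizer steps")  # ~22,888
```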

---

## Model Evaluation on HellaSwag Dataset

### Performance Overview

The evaluation of our model, "orator," on the HellaSwag dataset demonstrates significant progress in context-based prediction. Below, we detail performance through loss and accuracy graphs, followed by the key metrics.
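
HellaSwag is a multiple-choice completion benchmark, and GPT-style models are typically scored on it by likelihood rather than by generation: each candidate ending is appended to the context, and the ending with the lowest average per-token loss wins. The sketch below illustrates that standard protocol under the assumption that `model` maps a token tensor to `(batch, time, vocab)` logits; it is not the project's own evaluation harness:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_ending(model, ctx_tokens, candidate_endings):
    """Return the index of the most likely ending for a HellaSwag item,
    scored by average cross-entropy over the ending tokens only."""
    scores = []
    for ending in candidate_endings:
        tokens = torch.tensor(ctx_tokens + ending).unsqueeze(0)  # (1, T)
        logits = model(tokens[:, :-1])                           # (1, T-1, vocab)
        targets = tokens[:, 1:]                                  # next-token targets
        losses = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="none",
        )
        scores.append(losses[-len(ending):].mean().item())       # ending region only
    return min(range(len(scores)), key=scores.__getitem__)
```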

### Graph Analysis

#### Loss Graph

![Training and Validation Loss](URL_TO_LOSS_GRAPH)

- **Blue Line (Train Loss)**: The model's loss on the training set over training steps. It declines sharply at first, indicating rapid early learning, then fluctuates before gradually stabilizing.
- **Orange Line (Validation Loss)**: The loss on the validation set. It is smoother than the training curve, indicating that the model remains stable and effective on unseen data.
- **Red Dashed Line**: The validation loss of the baseline, OpenAI's GPT-2 (124M), for comparison. Our model reaches a lower validation loss, indicating improved performance.

#### Accuracy Graph (HellaSwag Eval)

![HellaSwag Evaluation Accuracy](URL_TO_ACCURACY_GRAPH)

- **Blue Line**: The accuracy of the "orator" model on the HellaSwag evaluation set. It rises steadily, reflecting the model's improving ability to complete unseen scenarios correctly.
- **Red Dashed Line**: The accuracy of the baseline OpenAI GPT-2 (124M) model. Our model consistently surpasses this benchmark after the initial training phase.

### Key Metrics

- **Minimum Training Loss**: `2.883471`
- **Minimum Validation Loss**: `3.1989`
- **Maximum HellaSwag Evaluation Accuracy**: `0.3054`
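
Assuming the losses above are mean cross-entropy in nats (the usual convention), they convert to perplexities via `exp(loss)`, which gives a quick intuition check:

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming losses in nats.
print(f"train perplexity ≈ {math.exp(2.883471):.1f}")  # ≈ 17.9
print(f"val   perplexity ≈ {math.exp(3.1989):.1f}")    # ≈ 24.5
```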

---

### Tokenization

For tokenization, this model uses the GPT-2 byte-pair encoding from `tiktoken`:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
```
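
As a quick usage check, the encoding round-trips text losslessly:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
ids = tokenizer.encode("Hello, world!")
print(ids)                    # [15496, 11, 995, 0]
print(tokenizer.decode(ids))  # Hello, world!
```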

---

## How to Use the Model

### Load and Generate Text