Update README.md

README.md +53 -0


@@ -52,6 +52,30 @@ StopAskingQuestionsMini uses a scaled down version of the [Qwen3](https://arxiv.
 | Tie Word Embeddings | True |
 | Vocab Size | 1024 |
 ## Benchmarks
 We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a question generation task:
@@ -66,6 +90,35 @@ We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a qu
 Each model generated two to three hundred continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one using a decimal grading system (0.0 to 1.0).
 Our model generated the second-highest number of coherent questions with fewer parameters than most character-level RNNs.
 ## Use Cases
 Unfortunately, there is no practical use case as we stated earlier, but here are some interesting ideas:

 | Tie Word Embeddings | True |
 | Vocab Size | 1024 |
+## Training
+StopAskingQuestionsMini was trained on 23 million tokens of questions for two epochs with a batch size of 16.
+### Training Results
+| Epoch | Train Loss | Eval Loss | Train PPL | Eval PPL |
+|-------|------------|-----------|-----------|----------|
+| 0.07  | 4.0797     | 3.0011    | 59.05     | 20.11    |
+| 0.22  | 2.6331     | 2.5703    | 13.92     | 13.07    |
+| 0.37  | 2.4906     | 2.4586    | 12.07     | 11.68    |
+| 0.52  | 2.4213     | 2.3989    | 11.26     | 11.01    |
+| 0.66  | 2.3700     | 2.3552    | 10.70     | 10.54    |
+| 0.81  | 2.3375     | 2.3242    | 10.35     | 10.22    |
+| 0.96  | 2.3094     | 2.2949    | 10.07     | 9.92     |
+| 1.11  | 2.2720     | 2.2746    | 9.70      | 9.72     |
+| 1.26  | 2.2527     | 2.2533    | 9.51      | 9.52     |
+| 1.40  | 2.2345     | 2.2367    | 9.34      | 9.36     |
+| 1.55  | 2.2239     | 2.2212    | 9.24      | 9.22     |
+| 1.70  | 2.2043     | 2.2044    | 9.06      | 9.06     |
+| 1.85  | 2.1885     | 2.1930    | 8.92      | 8.96     |
+| 1.99  | 2.1843     | 2.1854    | 8.88      | 8.90     |
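The perplexity columns appear to be the exponential of the corresponding loss. The sketch below checks that relationship for a few eval rows copied from the table. It assumes the loss is a mean per-token cross-entropy in nats, which the README does not state explicitly; the `perplexity` helper is a hypothetical name used only for illustration.

```python
import math

# (epoch, eval_loss, reported_eval_ppl) rows copied from the table above.
rows = [
    (0.07, 3.0011, 20.11),
    (0.96, 2.2949, 9.92),
    (1.99, 2.1854, 8.90),
]


def perplexity(loss: float) -> float:
    """Convert a cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


for epoch, loss, reported in rows:
    computed = perplexity(loss)
    # Small gaps are expected because the table rounds the loss to 4 decimals.
    print(f"epoch {epoch:.2f}: exp({loss}) = {computed:.2f} (reported {reported})")
```

Under that assumption, the computed values agree with the reported ones to within rounding.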
 ## Benchmarks
 We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a question generation task:
 Each model generated two to three hundred continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one using a decimal grading system (0.0 to 1.0).
 Our model generated the second-highest number of coherent questions with fewer parameters than most character-level RNNs.
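One way to turn per-continuation judge scores into a count of coherent questions is to apply a cutoff. The sketch below is only an illustration: the scores are made-up placeholders, and the 0.5 threshold and the `count_coherent` helper are assumptions, since the README does not say how a score maps to "coherent".

```python
# Hypothetical judge scores in [0.0, 1.0], one per generated continuation.
# These values are invented for illustration; they are not benchmark data.
scores = [0.9, 0.2, 0.75, 0.5, 0.1]

# Assumed cutoff; the README does not state one.
COHERENCE_THRESHOLD = 0.5


def count_coherent(scores, threshold=COHERENCE_THRESHOLD):
    """Count the continuations whose score is at or above the threshold."""
    return sum(1 for score in scores if score >= threshold)


coherent = count_coherent(scores)
print(f"{coherent} of {len(scores)} continuations score >= {COHERENCE_THRESHOLD}")
```

Because each model produced a different number of continuations (two to three hundred), a rate such as `coherent / len(scores)` may be easier to compare across models than a raw count.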
+## Generations
+Prompt: **`Question:`**
+Generation 1:
+```text
+what legal reforms faced rafer leadership during ww1?
+```
+Generation 2:
+```text
+How many emissions should a frather?
+```
+Generation 3:
+```text
+What do foreigners do?
+```
+Generation 4:
+```text
+What is the best appropriate way to learn Japanese?
+```
+Generation 5:
+```text
+How much is the MDU and JavaScript to the new UK?
+```
 ## Use Cases
 Unfortunately, there is no practical use case as we stated earlier, but here are some interesting ideas: