Harley-ml committed on
Commit b177f39 · verified · 1 Parent(s): 6f71091

Update README.md

  1. README.md +53 -0
README.md CHANGED
@@ -52,6 +52,30 @@ StopAskingQuestionsMini uses a scaled down version of the [Qwen3](https://arxiv.
@@ -66,6 +90,35 @@ We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a qu
  | Tie Word Embeddings | True |
  | Vocab Size | 1024 |
 
+ ## Training
+
+ StopAskingQuestionsMini was trained on 23 million tokens of questions for two epochs with a batch size of 16.
+
+ ### Training Results
+
+
+ | Epoch | Train Loss | Eval Loss | Train PPL | Eval PPL |
+ |-------|------------|-----------|-----------|----------|
+ | 0.07 | 4.0797 | 3.0011 | 59.05 | 20.11 |
+ | 0.22 | 2.6331 | 2.5703 | 13.92 | 13.07 |
+ | 0.37 | 2.4906 | 2.4586 | 12.07 | 11.68 |
+ | 0.52 | 2.4213 | 2.3989 | 11.26 | 11.01 |
+ | 0.66 | 2.3700 | 2.3552 | 10.70 | 10.54 |
+ | 0.81 | 2.3375 | 2.3242 | 10.35 | 10.22 |
+ | 0.96 | 2.3094 | 2.2949 | 10.07 | 9.92 |
+ | 1.11 | 2.2720 | 2.2746 | 9.70 | 9.72 |
+ | 1.26 | 2.2527 | 2.2533 | 9.51 | 9.52 |
+ | 1.40 | 2.2345 | 2.2367 | 9.34 | 9.36 |
+ | 1.55 | 2.2239 | 2.2212 | 9.24 | 9.22 |
+ | 1.70 | 2.2043 | 2.2044 | 9.06 | 9.06 |
+ | 1.85 | 2.1885 | 2.1930 | 8.92 | 8.96 |
+ | 1.99 | 2.1843 | 2.1854 | 8.88 | 8.90 |
+
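The perplexity columns in the Training Results table appear to be the exponential of the corresponding loss. The sketch below shows that relationship, assuming the logged losses are mean per-token cross-entropy in nats (the README does not say so explicitly; `perplexity` is a helper name introduced here for illustration). Because the losses are rounded to four decimals, recomputed values can differ from the table in the last digit.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(cross_entropy_loss)


# Eval losses copied from the first and last rows of the Training Results table.
for epoch, eval_loss in [(0.07, 3.0011), (1.99, 2.1854)]:
    ppl = perplexity(eval_loss)
    print(f"epoch {epoch:.2f}: eval loss {eval_loss:.4f} -> PPL {ppl:.2f}")
```

A loss of 0 would give a perplexity of 1, so the drop in eval PPL over two epochs reflects the model becoming less uncertain about the next token.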
  ## Benchmarks
 
  We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a question generation task:
 
  Each model generated two to three hundred continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one on a decimal scale from 0.0 to 1.0.
  Our model generated the second-highest number of coherent questions, with fewer parameters than most character-level RNNs.
 
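Counting "coherent" questions from decimal judge scores implies some cutoff. The README does not state one, so the sketch below is only a guess at how the count might be derived: `count_coherent` and its default `threshold` of 0.5 are hypothetical names and values introduced for illustration, not the benchmark's actual procedure.

```python
from typing import Iterable


def count_coherent(scores: Iterable[float], threshold: float = 0.5) -> int:
    """Count generations whose judge score meets a cutoff.

    `threshold` is a hypothetical value; the README does not state the cutoff used.
    """
    count = 0
    for score in scores:
        # The judge grades on a 0.0 to 1.0 scale, so anything else is a parsing error.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside the 0.0 to 1.0 range")
        if score >= threshold:
            count += 1
    return count
```

Since each model produced a different number of continuations (two to three hundred), dividing the count by the number of generations would give a rate that is easier to compare across models.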
+ ## Generations
+
+ Prompt: **`Question:`**
+
+ Generation 1:
+ ```text
+ what legal reforms faced rafer leadership during ww1?
+ ```
+
+ Generation 2:
+ ```text
+ How many emissions should a frather?
+ ```
+
+ Generation 3:
+ ```text
+ What do foreigners do?
+ ```
+
+ Generation 4:
+ ```text
+ What is the best appropriate way to learn Japanese?
+ ```
+
+ Generation 5:
+ ```text
+ How much is the MDU and JavaScript to the new UK?
+ ```
+
  ## Use Cases
 
  Unfortunately, there is no practical use case, as we stated earlier, but here are some interesting ideas: