Update README.md
README.md
|
@@ -52,6 +52,30 @@ StopAskingQuestionsMini uses a scaled down version of the [Qwen3](https://arxiv.
|
|
| 52 |
| Tie Word Embeddings | True |
|
| 53 |
| Vocab Size | 1024 |
|
| 54 |
|
| 55 |
## Benchmarks
|
| 56 |
|
| 57 |
We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a question generation task:
|
|
@@ -66,6 +90,35 @@ We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a qu
|
|
| 66 |
Each model generated between 200 and 300 continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one on a decimal scale from 0.0 to 1.0.
|
| 67 |
Our model generated the second-highest number of coherent questions, with fewer parameters than most character-level RNNs.
|
| 68 |
|
| 69 |
## Use Cases
|
| 70 |
|
| 71 |
Unfortunately, as stated earlier, there is no practical use case, but here are some interesting ideas:
|
|
|
|
| 52 |
| Tie Word Embeddings | True |
|
| 53 |
| Vocab Size | 1024 |
|
| 54 |
|
| 55 |
+
## Training
|
| 56 |
+
|
| 57 |
+
StopAskingQuestionsMini was trained on 23 million tokens of questions for two epochs with a batch size of 16.
|
| 58 |
+
|
| 59 |
+
### Training Results
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
| Epoch | Train Loss | Eval Loss | Train PPL | Eval PPL |
|-------|------------|-----------|-----------|----------|
| 0.07 | 4.0797 | 3.0011 | 59.05 | 20.11 |
| 0.22 | 2.6331 | 2.5703 | 13.92 | 13.07 |
| 0.37 | 2.4906 | 2.4586 | 12.07 | 11.68 |
| 0.52 | 2.4213 | 2.3989 | 11.26 | 11.01 |
| 0.66 | 2.3700 | 2.3552 | 10.70 | 10.54 |
| 0.81 | 2.3375 | 2.3242 | 10.35 | 10.22 |
| 0.96 | 2.3094 | 2.2949 | 10.07 | 9.92 |
| 1.11 | 2.2720 | 2.2746 | 9.70 | 9.72 |
| 1.26 | 2.2527 | 2.2533 | 9.51 | 9.52 |
| 1.40 | 2.2345 | 2.2367 | 9.34 | 9.36 |
| 1.55 | 2.2239 | 2.2212 | 9.24 | 9.22 |
| 1.70 | 2.2043 | 2.2044 | 9.06 | 9.06 |
| 1.85 | 2.1885 | 2.1930 | 8.92 | 8.96 |
| 1.99 | 2.1843 | 2.1854 | 8.88 | 8.90 |
|
| 78 |
+
|
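The perplexity columns in the training results look like the exponential of the matching loss columns. The README does not state this, so the sketch below assumes the loss is a mean per-token cross-entropy in nats and checks that assumption against three rows copied from the table. The first row of the table differs a little more (exp(4.0797) is about 59.13, while the table lists 59.05), so treat the relationship as approximate.

```python
import math

# (train_loss, eval_loss, train_ppl, eval_ppl) copied from three rows of the
# training results table.
ROWS = [
    (2.6331, 2.5703, 13.92, 13.07),
    (2.2720, 2.2746, 9.70, 9.72),
    (2.1843, 2.1854, 8.88, 8.90),
]


def perplexity(loss: float) -> float:
    """Perplexity, ASSUMING `loss` is a mean per-token cross-entropy in nats."""
    return math.exp(loss)


for train_loss, eval_loss, train_ppl, eval_ppl in ROWS:
    # Print the recomputed value next to the value reported in the table.
    print(
        f"train: {perplexity(train_loss):.2f} (table {train_ppl:.2f})  "
        f"eval: {perplexity(eval_loss):.2f} (table {eval_ppl:.2f})"
    )
```

If the loss were logged in bits instead of nats, the base would be 2 rather than e; the close agreement with the table suggests nats.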
| 79 |
## Benchmarks
|
| 80 |
|
| 81 |
We benchmarked our model against GPT-2, SmolLM-135M, and Qwen3-0.6B-Base on a question generation task:
|
|
|
|
| 90 |
Each model generated between 200 and 300 continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one on a decimal scale from 0.0 to 1.0.
|
| 91 |
Our model generated the second-highest number of coherent questions, with fewer parameters than most character-level RNNs.
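As a rough illustration of how per-generation judge scores could be turned into a "number of coherent questions", here is a minimal sketch. The README does not say which score counts as coherent, so `COHERENCE_THRESHOLD`, the `summarize` helper, and the sample scores are all hypothetical and are not the benchmark's actual method or data.

```python
from statistics import mean

# Hypothetical cutoff: the README does not say what score counts as "coherent".
COHERENCE_THRESHOLD = 0.5


def summarize(scores: list[float], threshold: float = COHERENCE_THRESHOLD) -> dict:
    """Summarize judge scores (each in [0.0, 1.0]) for one model."""
    if not scores:
        raise ValueError("no scores to summarize")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0.0, 1.0]")
    # Count generations whose score meets or exceeds the cutoff.
    coherent = sum(s >= threshold for s in scores)
    return {
        "n": len(scores),
        "mean_score": mean(scores),
        "coherent": coherent,
        "coherent_rate": coherent / len(scores),
    }


# Made-up scores for illustration only, not results from the benchmark.
example = summarize([0.9, 0.2, 0.7, 0.5, 0.0])
print(example)
```

Because the models produced different numbers of continuations, comparing `coherent_rate` rather than the raw count may be the fairer comparison.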
|
| 92 |
|
| 93 |
+
## Generations
|
| 94 |
+
|
| 95 |
+
Prompt: **`Question:`**
|
| 96 |
+
|
| 97 |
+
Generation 1:
```text
what legal reforms faced rafer leadership during ww1?
```
|
| 101 |
+
|
| 102 |
+
Generation 2:
```text
How many emissions should a frather?
```
|
| 106 |
+
|
| 107 |
+
Generation 3:
```text
What do foreigners do?
```
|
| 111 |
+
|
| 112 |
+
Generation 4:
```text
What is the best appropriate way to learn Japanese?
```
|
| 116 |
+
|
| 117 |
+
Generation 5:
```text
How much is the MDU and JavaScript to the new UK?
```
|
| 121 |
+
|
| 122 |
## Use Cases
|
| 123 |
|
| 124 |
Unfortunately, as stated earlier, there is no practical use case, but here are some interesting ideas:
|