---
license: mit
language:
- en
pipeline_tag: text-generation
---
### Model versions
| **Model** | **Parameters** | **RAM used (inference)** |
| :------------------------ | :----------: | :--------------------: |
| stok-0.1 | 3,798 | 6MB |
| stok-0.2 | 4M | 542MB |
| stok-0.3 | 962K | 136MB |
| stok-0.3-large | 28.86M | 4GB |
| stok-0.3-125m | 125.06M | 17.5GB |
| stok-0.3.1 | 982K | 138MB |
| stok-0.4-mini | 485K | 135MB |
| stok-0.4 | 3.2M | 887MB |
| stok-0.4-large | 17.33M | 4.7GB |
| stok-0.4.1 | 3.31M | 919MB |
## Description
stok is a family of models designed to perform well at small parameter counts and to maintain speed as model size grows.
stok-sub-1 will contain all versions of the stok model released prior to stok-1.
The goal of the stok models is to be runnable incredibly fast on CPUs (including incredibly old ones), regardless of model size.
Currently, stok can only contextualize single prompts and will not understand them beyond a single word. So far, each new version (0.1, 0.2, 0.3, and 0.4)
has brought a new capability to the model: 0.2 gave the model the ability to end its thought, 0.3 allowed the model to (usually) keep token prediction within
the context of the prompt, and 0.4 gives the model the ability to remove data it might not need and retry with an altered prompt. While the model definitely
still needs more help (like the ability to better contextualize prompts), it's only at version 0.4, and there's a lot of work to go.
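The 0.4-era "remove data and retry" behavior described above can be sketched roughly as follows. This is only an illustration: `generate`, `looks_ok`, and `run_with_retries` are hypothetical stand-ins, not stok's real internals.

```python
# Hypothetical sketch of the 0.4-style behavior: if an attempt fails
# some acceptance check, drop data the model "might not need" and
# retry with the altered prompt. All names here are stand-ins.

def generate(prompt):
    # Stand-in for a real model call; just echoes the last word.
    return prompt.split()[-1]

def looks_ok(response, prompt):
    # Stand-in acceptance check: the response must stay within
    # the context of the prompt.
    return response in prompt

def run_with_retries(prompt, max_retries=3):
    words = prompt.split()
    response = ""
    for _ in range(max_retries):
        response = generate(" ".join(words))
        if looks_ok(response, prompt):
            break
        # Remove a word the model might not need, then retry.
        if len(words) > 1:
            words = words[:-1]
    return response
```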
## How to run
First, when using Python (more inference engines coming soon), you will need to download the `run_stok.py` file. Using it will look something like this:
```python
from run_stok import load_model, run_model

# you can replace stok-0.3.json with whichever stok model you want
load_model("stok-0.3.json")
response = run_model("Hello!", max_tokens=100, repetition_penalty=2)

# the response is streamed back in chunks
for chunk in response:
    print(chunk, end="")
```
This showcases all currently functioning parameters, although `max_tokens` and `repetition_penalty` are both technically optional.<br><br>
If you'd rather use stokfile (a tool for quickly testing out the model), here's how:
```
python3 stokfile.py -m stok-0.3.json
```
If you want to see the speed of the output, just add `-speed` to the end, like so:
```
python3 stokfile.py -m stok-0.3.json -speed
```
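The `-speed` flag reports output speed; its exact implementation isn't shown here, but a tokens-per-second figure like the ones in the benchmark below can be measured roughly like this (`dummy_stream` is a stand-in for `run_model`'s streamed output):

```python
import time

def dummy_stream(n_tokens):
    # Stand-in generator mimicking run_model's chunked output.
    for _ in range(n_tokens):
        yield "tok"

def tokens_per_second(stream):
    # Count chunks while timing how long the stream takes to drain.
    count = 0
    start = time.perf_counter()
    for _ in stream:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against division by zero on extremely fast streams.
    return count / elapsed if elapsed > 0 else float("inf")

speed = tokens_per_second(dummy_stream(1000))
```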
## Benchmark (SLMB)
| **Model** | **Score** | **Med. Speed** |
| :-----------------------: | :-------: | :------------: |
| stok-0.1 | 1/15 | 361,703 t/s |
| stok-0.2 | 4/15 | 3,887 t/s |
| stok-0.3 | 5/15 | 254,902 t/s |
| stok-0.3-large | 8/15 | 149,526 t/s |
| stok-0.3-125m | 8/15 | 122,625 t/s |
| stok-0.3.1 | 8/15 | 34,521 t/s |
| stok-0.4-mini | 10/15 | 32,515 t/s |
| stok-0.4 | 11/15 | 34,308 t/s |
| stok-0.4-large | 11/15 | 31,775 t/s |
| stok-0.4.1 | 11/15 | 32,263 t/s |
| TinyLlama-v0 (F32) | 0/15 | 1,695 t/s |
| Gemma-3-270m-it (F16) | 12/15 | 46 t/s |
| H2O Danube3 500M chat (F32) | 8/15 | 21 t/s |
| Qwen3 0.6B (Q8_0) | 15/15 | 38 t/s |
| Llama 3.2 1B Instruct (F16) | 14/15 | 14 t/s |

The CPU used for each test was an AMD Ryzen 7 2700X.<br>
RAM: 64GB DDR4<br>
### The SLMB (Small Language Model Benchmark) v1
#### Quick description
This is a very simple test, created to measure the capabilities of much smaller LLMs. (The answers are included, though they aren't actually needed.)
#### The Benchmark
Category 1: elementary math - x/4<br>
what is 2+2 (4)<br>
what is 12+5 (17)<br>
what is 4/2 (2)<br>
what is 3\*3 (9)<br>
<br><br>
Category 2: math with large numbers - x/4<br>
what is 500+200 (700)<br>
what is 10000+1000 (11000)<br>
what is 100\*100 (10000)<br>
what is 12\*5000 (60000)<br>
<br><br>
Category 3: input variation - x/5<br>
what is 1+1 (2)<br>
what is 1 + 1 (2)<br>
what is 1+ 1 (2)<br>
what is a dog (any answer that matches at least a very basic description of a dog)<br>
What is a dog? (any answer that matches at least a very basic description of a dog)<br>
<br><br>
Category 4: basic logic - x/2 (2 points for correct, 0 for wrong)<br>
I have three friends (Jeremy, Tyler, and Gabe). Friend #1 is Jeremy, Friend #3 is Tyler; who is Friend #2?<br>
(Gabe)<br>
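The four categories above sum to the 15 points shown in the score column (4 + 4 + 5 + 2). As an illustration of how responses to the arithmetic categories could be graded automatically, here is a minimal sketch; the `score` helper and `toy_model` are hypothetical, not part of SLMB itself.

```python
# Illustrative SLMB-style grader for the two arithmetic categories.
# SLMB itself ships no reference code; this is only a sketch.

QUESTIONS = [
    ("what is 2+2", "4"),
    ("what is 12+5", "17"),
    ("what is 4/2", "2"),
    ("what is 3*3", "9"),
    ("what is 500+200", "700"),
    ("what is 10000+1000", "11000"),
    ("what is 100*100", "10000"),
    ("what is 12*5000", "60000"),
]

def score(model, questions=QUESTIONS):
    # One point per question whose expected answer appears in the reply.
    return sum(1 for q, ans in questions if ans in model(q))

def toy_model(prompt):
    # A toy "model" that actually computes the arithmetic.
    # eval is safe here because inputs are the fixed questions above.
    expr = prompt[len("what is "):]
    return str(int(eval(expr)))

points = score(toy_model)  # toy_model gets all 8 arithmetic points
```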
## Conclusion
While stok is definitely (in my opinion) pretty impressive, especially given its performance at such small sizes, it still has a lot of room to grow. (The
benchmark may also include more tests in the future.)