---
license: mit
language:
- en
pipeline_tag: text-generation
---
### Model versions
| **Model** | **Parameters** | **RAM used (inference)** |
| :------------------------ | :----------: | :--------------------: |
| stok-0.1 | 3,798 | 6MB |
| stok-0.2 | 4M | 542MB |
| stok-0.3 | 962K | 136MB |
| stok-0.3-large | 28.86M | 4GB |
| stok-0.3-125m | 125.06M | 17.5GB |
| stok-0.3.1 | 982K | 138MB |
| stok-0.4-mini | 485K | 135MB |
| stok-0.4 | 3.2M | 887MB |
| stok-0.4-large | 17.33M | 4.7GB |
| stok-0.4.1 | 3.31M | 919MB |
## Description
stok is a family of models designed to perform well at small parameter counts and to maintain speed as model size grows.
stok-sub-1 will contain all versions of the stok model released prior to stok-1.
The goal of the stok models is to be runnable incredibly fast on CPUs (including incredibly old ones), regardless of model size.
Currently, stok can only contextualize single prompts and will not understand them beyond a single word. So far, each new version (0.1, 0.2, 0.3, and 0.4)
has brought a new capability to the model: 0.2 gave the model the ability to end its thought, 0.3 allowed the model to (usually) keep token prediction within
the context of the prompt, and 0.4 gives the model the ability to remove data it might not need and retry with an altered prompt. While the model definitely
still needs more help (like the ability to better contextualize prompts), it's only at version 0.4, and there's a lot of work to go.
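The 0.4-era "remove data and retry" behavior described above can be sketched roughly as follows. This is only an illustration: `generate`, `looks_ok`, and `run_with_retries` are hypothetical stand-ins, not stok's real internals.

```python
# Hypothetical sketch of the 0.4-style behavior: if an attempt fails
# some acceptance check, drop data the model "might not need" and
# retry with the altered prompt. All names here are stand-ins.

def generate(prompt):
    # Stand-in for a real model call; just echoes the last word.
    return prompt.split()[-1]

def looks_ok(response, prompt):
    # Stand-in acceptance check: the response must stay within
    # the context of the prompt.
    return response in prompt

def run_with_retries(prompt, max_retries=3):
    words = prompt.split()
    response = ""
    for _ in range(max_retries):
        response = generate(" ".join(words))
        if looks_ok(response, prompt):
            break
        # Remove a word the model might not need, then retry.
        if len(words) > 1:
            words = words[:-1]
    return response
```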
## How to run
First, when using Python (more inference engines coming soon), you will need to download the `run_stok.py` file. Using it will look something like this:
```python
from run_stok import load_model, run_model

# you can replace stok-0.3.json with whichever stok model you want
load_model("stok-0.3.json")
response = run_model("Hello!", max_tokens=100, repetition_penalty=2)

# the response is streamed back in chunks
for chunk in response:
    print(chunk, end="")
```
This showcases all currently functioning parameters, although `max_tokens` and `repetition_penalty` are both technically optional.<br><br>
If you'd rather use stokfile (a tool for quickly testing out the model), here's how:
```
python3 stokfile.py -m stok-0.3.json
```
If you want to see the speed of the output, just add `-speed` to the end, like so:
```
python3 stokfile.py -m stok-0.3.json -speed
```
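The `-speed` flag reports output speed; its exact implementation isn't shown here, but a tokens-per-second figure like the ones in the benchmark below can be measured roughly like this (`dummy_stream` is a stand-in for `run_model`'s streamed output):

```python
import time

def dummy_stream(n_tokens):
    # Stand-in generator mimicking run_model's chunked output.
    for _ in range(n_tokens):
        yield "tok"

def tokens_per_second(stream):
    # Count chunks while timing how long the stream takes to drain.
    count = 0
    start = time.perf_counter()
    for _ in stream:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against division by zero on extremely fast streams.
    return count / elapsed if elapsed > 0 else float("inf")

speed = tokens_per_second(dummy_stream(1000))
```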
## Benchmark (SLMB)
| **Model** | **Score** | **Med. Speed** |
| :-----------------------: | :-------: | :------------: |
| stok-0.1 | 1/15 | 361,703 t/s |
| stok-0.2 | 4/15 | 3,887 t/s |
| stok-0.3 | 5/15 | 254,902 t/s |
| stok-0.3-large | 8/15 | 149,526 t/s |
| stok-0.3-125m | 8/15 | 122,625 t/s |
| stok-0.3.1 | 8/15 | 34,521 t/s |
| stok-0.4-mini | 10/15 | 32,515 t/s |
| stok-0.4 | 11/15 | 34,308 t/s |
| stok-0.4-large | 11/15 | 31,775 t/s |
| stok-0.4.1 | 11/15 | 32,263 t/s |
| TinyLlama-v0 (F32) | 0/15 | 1,695 t/s |
| Gemma-3-270m-it (F16) | 12/15 | 46 t/s |
| H2O Danube3 500M chat (F32) | 8/15 | 21 t/s |
| Qwen3 0.6B (Q8_0) | 15/15 | 38 t/s |
| Llama 3.2 1B Instruct (F16) | 14/15 | 14 t/s |

The CPU used for each test was an AMD Ryzen 7 2700X.<br>
RAM: 64GB DDR4<br>
### The SLMB (Small Language Model Benchmark) v1
#### Quick description
This is a very simple test, created to measure the capabilities of much smaller LLMs. (The answers are included, though they aren't actually needed.)
#### The Benchmark
Category 1: elementary math - x/4<br>
what is 2+2 (4)<br>
what is 12+5 (17)<br>
what is 4/2 (2)<br>
what is 3\*3 (9)<br>
<br><br>
Category 2: math with large numbers - x/4<br>
what is 500+200 (700)<br>
what is 10000+1000 (11000)<br>
what is 100\*100 (10000)<br>
what is 12\*5000 (60000)<br>
<br><br>
Category 3: input variation - x/5<br>
what is 1+1 (2)<br>
what is 1 + 1 (2)<br>
what is 1+ 1 (2)<br>
what is a dog (any answer that matches at least a very basic description of a dog)<br>
What is a dog? (any answer that matches at least a very basic description of a dog)<br>
<br><br>
Category 4: basic logic - x/2 (2 points for correct, 0 for wrong)<br>
I have three friends (Jeremy, Tyler, and Gabe). Friend #1 is Jeremy, Friend #3 is Tyler; who is Friend #2?<br>
(Gabe)<br>
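The four categories above sum to the 15 points shown in the score column (4 + 4 + 5 + 2). As an illustration of how responses to the arithmetic categories could be graded automatically, here is a minimal sketch; the `score` helper and `toy_model` are hypothetical, not part of SLMB itself.

```python
# Illustrative SLMB-style grader for the two arithmetic categories.
# SLMB itself ships no reference code; this is only a sketch.

QUESTIONS = [
    ("what is 2+2", "4"),
    ("what is 12+5", "17"),
    ("what is 4/2", "2"),
    ("what is 3*3", "9"),
    ("what is 500+200", "700"),
    ("what is 10000+1000", "11000"),
    ("what is 100*100", "10000"),
    ("what is 12*5000", "60000"),
]

def score(model, questions=QUESTIONS):
    # One point per question whose expected answer appears in the reply.
    return sum(1 for q, ans in questions if ans in model(q))

def toy_model(prompt):
    # A toy "model" that actually computes the arithmetic.
    # eval is safe here because inputs are the fixed questions above.
    expr = prompt[len("what is "):]
    return str(int(eval(expr)))

points = score(toy_model)  # toy_model gets all 8 arithmetic points
```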
## Conclusion
While stok is definitely (in my opinion) pretty impressive, especially given its performance at such small sizes, it still has a lot of room to grow. (The
benchmark may also include more tests in the future.)