Update README.md
README.md
@@ -13,7 +13,7 @@ I was curious: what if a token looked at other tokens, but without QKV? Instead,
Then I tried it on MNIST (using the LookThem architecture with a single layer), and within just a few epochs I got astonishing results. Because of that, I went deeper and tried CIFAR-10 with a similar architecture, and the results were good too. So I went deeper still, to Tiny-ImageNet. The result there is... I don't know: the results in the notebook are not evaluation results (the AI changed the code). It's probably not 100% accuracy, but at least it can learn, with a small footprint on disk (just ~5 MB). Those are the results I have to share with you all.
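As a rough sanity check on the ~5 MB figure, the helper below (hypothetical, not part of this repo) converts a file size into an approximate parameter count. It assumes 1 MB = 1,000,000 bytes, a fixed number of bytes per weight (4 for float32, 2 for float16), and no serialization overhead, so it gives a rough ceiling rather than the model's actual parameter count.

```python
def approx_param_count(size_mb: float, bytes_per_param: int = 4) -> int:
    """Estimate how many weights fit in a file of `size_mb` megabytes.

    Assumptions (not taken from the repo): 1 MB = 1,000,000 bytes,
    a fixed number of bytes per weight (4 for float32, 2 for float16),
    and no extra bytes for serialization metadata.
    """
    return int(size_mb * 1_000_000 // bytes_per_param)


if __name__ == "__main__":
    print(approx_param_count(5))     # float32 weights: 1250000
    print(approx_param_count(5, 2))  # float16 weights: 2500000
```

Under those assumptions, ~5 MB of float32 weights holds at most about 1.25 million parameters; real checkpoint files carry some overhead, so the true count is somewhat lower.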

There's plenty of room for experimentation, such as deeper architectures or other activation functions. Even without extensive tuning of the training hyperparameters, it reaches SOTA (for its size)... correct me if I'm wrong about SOTA. So anyone with bigger resources is welcome to experiment with this architecture. I trained it on Google Colab's T4 GPU, and the code was generated by Gemini 3 Flash (except for the original code).
# Code