Commit b3aaca2 (verified) · committed by ASomeoneWhoInterestedWithAI · 1 Parent(s): 4c5f6f6

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -13,7 +13,7 @@ I was courious, what if a token look at other tokens, but without QKV? Instead,
 
  Then I try it for MNIST (with LookThem arch with a layer), in just few epoch, I get astonishing results. Because of that, I get deeper and try for CIFAR-10, with similar architecture. And the results is good too. Because of that, I get deeper to Tiny-ImageNet. The result is.. I don't know, the notebook's results is not for evaluation result (the AI changed the code). Maybe not 100% accuracy, but at least is can learn, with even less memory in disk (just ~5MB). That's the results for you all.
 
- There's many space to experimenting like deeper architecture, another activation function, etc. But without big train parameter tunes, it's reach SOTA (from scratch category).. correct me if I wrong about SOTA. So, for everyone who have bigger resources, you all can experimenting with this architecture. I train it on Google Colab's T4, and code generated by Gemini 3 Flash (except for original code).
+ There's many space to experimenting like deeper architecture, another activation function, etc. But without big train parameter tunes, it's reach SOTA (for it's size).. correct me if I wrong about SOTA. So, for everyone who have bigger resources, you all can experimenting with this architecture. I train it on Google Colab's T4, and code generated by Gemini 3 Flash (except for original code).
 
  # Code
 