ASomeoneWhoInterestedWithAI committed on
Commit 2604e09 · verified · 1 Parent(s): 769e80b

Realize that the eval doesn't eval the dataset

Files changed (1): README.md +4 -4
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
  # Explanation
  I was curious: what if a token looks at other tokens, but without QKV? Instead, it applies a transformation to a pair of tokens (the current token and another token), then divides them. Say the current token is token A and the other is token B. The division looks like "transformA(A) / transformB(B)", where each transform is a linear layer, with tanh for normalization (so it doesn't explode). There is also the reverse: "transformA(B) / transformB(A)". The result of "transformA(A) / transformB(B)" is multiplied by A, the reverse is multiplied by B, then the two are added and divided by 2. That's the new value for that interaction, accumulated in a temporary variable. The loop repeats for every other token (in the code it's vectorized), and finally the accumulator is averaged. That's the new A.
 
- Then I tried it on MNIST (a LookThem arch with one layer) and got astonishing results in just a few epochs. Because of that, I went deeper and tried CIFAR-10 with a similar architecture, and the results were good too. So I went deeper again, to Tiny-ImageNet. The result is.. around 50%. That's not 100% accuracy, but at least it can compete with older Tiny-ImageNet architectures, with even less memory on disk (just ~5MB). Those are the results for you all.
+ Then I tried it on MNIST (a LookThem arch with one layer) and got astonishing results in just a few epochs. Because of that, I went deeper and tried CIFAR-10 with a similar architecture, and the results were good too. So I went deeper again, to Tiny-ImageNet. The result is.. I don't know; the notebook's numbers are not real evaluation results (the AI changed the code). Maybe not 100% accuracy, but at least it can learn, with even less memory on disk (just ~5MB). Those are the results for you all.
 
  There's plenty of room to experiment: deeper architectures, other activation functions, etc. But even without big hyperparameter tuning it reaches SOTA (in the from-scratch category).. correct me if I'm wrong about SOTA. So, for everyone with bigger resources, you can experiment with this architecture. I trained it on Google Colab's T4, and the code was generated by Gemini 3 Flash (except for the original code).
 
@@ -146,12 +146,12 @@ class LookThemLayer(nn.Module):
  - Test accuracy: 79.79%
  ## Tiny-ImageNet
  - 15 Epoch training, train accuracy: 32.07%
- - Test accuracy (15 epochs): 42.01%
+ - Test accuracy (15 epochs): ? (the results are lying)
  - 30 Epoch training, train accuracy: 37.20%
- - Test accuracy: 50.53%
+ - Test accuracy: ? (the results are lying)
  - File size: 5.72MB
 
  More details in the notebook
 
  # Reaction
- I can't believe this simple architecture achieves ResNet-34 performance, and that this LookThem architecture was born from.. "just try that".. So, to you all: thanks for reading this Spontaneous Paper (a raw paper for those who are too lazy to polish it).
+ I can't believe this simple architecture achieves this performance, and that this LookThem architecture was born from.. "just try that".. So, to you all: thanks for reading this Spontaneous Paper (a raw paper for those who are too lazy to polish it).
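
The interaction loop described in the Explanation paragraph can be sketched in plain Python. This is a minimal scalar sketch under my own assumptions, not the author's code: the two linear transforms use arbitrary illustrative weights, tanh is applied to each ratio, a small epsilon guards the division, and a token skips itself in the loop; the real `LookThemLayer` operates on token vectors and is vectorized over the whole sequence.

```python
import math

# Hypothetical stand-ins for the two linear layers ("transformA", "transformB").
# The weights here are arbitrary illustration values, not from the notebook.
def transform_a(x):
    return 0.9 * x + 0.1

def transform_b(x):
    return 1.1 * x - 0.2

EPS = 1e-6  # guard against division by zero (my assumption; the post doesn't say)

def look_them(tokens):
    """For each token A, mix it with every other token B, then average."""
    new_tokens = []
    for i, a in enumerate(tokens):
        acc = 0.0
        count = 0
        for j, b in enumerate(tokens):
            if i == j:
                continue  # a token looks at *other* tokens (assumption)
            # ratio in one direction, squashed with tanh so it can't explode
            w_ab = math.tanh(transform_a(a) / (transform_b(b) + EPS))
            # and the reverse direction
            w_ba = math.tanh(transform_a(b) / (transform_b(a) + EPS))
            # weighted A plus weighted B, divided by 2: one interaction's value
            acc += (w_ab * a + w_ba * b) / 2.0
            count += 1
        # average the accumulator over all interactions: that's the new A
        new_tokens.append(acc / count)
    return new_tokens

out = look_them([0.5, -0.3, 1.2])
```

Because both weights pass through tanh they stay in (-1, 1), so each output is bounded by the magnitudes of the input tokens; that matches the post's "make it don't explode" motivation.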