ASomeoneWhoInterestedWithAI committed on
Commit bd374cf · verified · 1 Parent(s): 231a4d4

Update README.md

Files changed (1):
  1. README.md +23 -19
README.md CHANGED
@@ -8,6 +8,28 @@ tags:

# LookThem: Ratio-based Attention

+ # Results
+ ## MNIST
+ - 11 epochs of training, train accuracy: 99.02%
+ - Test accuracy: 98.66%
+ ## CIFAR-10
+ - 10 epochs of training, train accuracy: 67.89%
+ - Test accuracy (10 epochs): 73.42%
+ - 40 epochs of training, train accuracy: 76.63%
+ - Test accuracy: 79.79%
+ ## Tiny-ImageNet
+ - 15 epochs of training, train accuracy: 32.07%
+ - Test accuracy (15 epochs): ? (the results are unreliable)
+ - 30 epochs of training, train accuracy: 37.20%
+ - Test accuracy: ? (the results are unreliable)
+ - File size: 5.72 MB
+ ## Tiny-ImageNet 2 (V5)
+ - 20 epochs of training, train accuracy: 36.98%
+ - Test accuracy: 34.2%
+ - 40 epochs of training, train accuracy: 46.58%
+ - Test accuracy: 35.46%
+ - Code in the second and third notebooks
+

# Explanation
I was curious: what if a token could look at other tokens, but without QKV? Instead, a transformation is applied to two tokens (the current token and another token), and the results are divided. Say the current token is token A and the other token is token B. The ratio is "transformA(A) / transformB(B)", where each transform is a linear layer, with tanh applied so the value doesn't explode. The reverse ratio, "transformA(B) / transformB(A)", is also computed. Then the result of "transformA(A) / transformB(B)" is multiplied by A, and the reverse ratio by B; the two products are added and divided by 2. That is the value for this pair's interaction, which is accumulated into a temporary variable. The same is done for every other token's interaction (in the code this loop is vectorized), and the accumulated values are averaged. That average is the new A.

@@ -135,25 +157,7 @@ class LookThemLayer(nn.Module):

## Colab notebook in this repo

- # Results
- ## MNIST
- - 11 epochs of training, train accuracy: 99.02%
- - Test accuracy: 98.66%
- ## CIFAR-10
- - 10 epochs of training, train accuracy: 67.89%
- - Test accuracy (10 epochs): 73.42%
- - 40 epochs of training, train accuracy: 76.63%
- - Test accuracy: 79.79%
- ## Tiny-ImageNet
- - 15 epochs of training, train accuracy: 32.07%
- - Test accuracy (15 epochs): ? (the results are unreliable)
- - 30 epochs of training, train accuracy: 37.20%
- - Test accuracy: ? (the results are unreliable)
- - File size: 5.72 MB
- ## Tiny-ImageNet 2 (V5)
- - 20 epochs of training, train accuracy: 36.98%
- - Test accuracy: 34.2%
- - Code in the second notebook
+

More detail in the notebook
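
The ratio-based interaction in the Explanation can be sketched as follows. This is a minimal NumPy illustration, not the repo's `LookThemLayer` (which is a PyTorch `nn.Module`): the function and weight names, the `eps` guard on the denominator, and the choice to apply `tanh` to the ratio (rather than to the transform outputs) are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def look_them(tokens, Wa, Wb, eps=1e-6):
    """Ratio-based token mixing, per the Explanation above.

    tokens: (T, D) array of token vectors.
    Wa, Wb: (D, D) weights standing in for the linear transformA /
            transformB layers (biases omitted for brevity).
    """
    ta = tokens @ Wa  # transformA applied to every token, shape (T, D)
    tb = tokens @ Wb  # transformB applied to every token, shape (T, D)

    # Pairwise ratio for (A=i, B=j): tanh(transformA(A) / transformB(B)).
    # tanh bounds the ratio (one reading of "tanh for normalizing");
    # eps is a simple guard against division by ~0.
    ratio = np.tanh(ta[:, None, :] / (tb[None, :, :] + eps))      # (T, T, D)
    # Reverse ratio: tanh(transformA(B) / transformB(A)).
    ratio_rev = np.tanh(ta[None, :, :] / (tb[:, None, :] + eps))  # (T, T, D)

    # Per pair: (ratio * A + reverse_ratio * B) / 2, then average over
    # all partners j (the "loop" in the prose, vectorized here; self-
    # interaction j == i is included for simplicity).
    contrib = (ratio * tokens[:, None, :] + ratio_rev * tokens[None, :, :]) / 2
    return contrib.mean(axis=1)                                   # (T, D)

tokens = rng.standard_normal((4, 8))
Wa = rng.standard_normal((8, 8))
Wb = rng.standard_normal((8, 8))
out = look_them(tokens, Wa, Wb)
print(out.shape)  # -> (4, 8)
```

The output has the same shape as the input, so the layer can be stacked or dropped in wherever an attention block would go.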