# LookThem: Ratio-based Attention

# Results

## MNIST

- 11 epochs of training, train accuracy: 99.02%
- Test accuracy: 98.66%

## CIFAR-10

- 10 epochs of training, train accuracy: 67.89%
- Test accuracy (10 epochs): 73.42%
- 40 epochs of training, train accuracy: 76.63%
- Test accuracy (40 epochs): 79.79%

## Tiny-ImageNet

- 15 epochs of training, train accuracy: 32.07%
- Test accuracy (15 epochs): ? (the results are unreliable)
- 30 epochs of training, train accuracy: 37.20%
- Test accuracy (30 epochs): ? (the results are unreliable)
- File size: 5.72 MB

## Tiny-ImageNet 2 (V5)

- 20 epochs of training, train accuracy: 36.98%
- Test accuracy (20 epochs): 34.2%
- 40 epochs of training, train accuracy: 46.58%
- Test accuracy (40 epochs): 35.46%
- Code in the second and third notebooks

# Explanation

I was curious: what if a token could look at other tokens, but without QKV? Instead, take a pair of tokens, the current token A and another token B, transform each one, and divide the results: `transformA(A) / transformB(B)`, where each transform is a linear layer followed by `tanh` (so the values don't explode). The reverse ratio, `transformA(B) / transformB(A)`, is computed as well. The first ratio is multiplied by A and the reverse ratio by B; the two products are added and divided by 2, giving the value for that pairwise interaction. That value is accumulated into a running sum, the loop repeats for every other token (in the code this is vectorized), and the sum is averaged over all tokens. The average is the new A.
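The loop above can be sketched directly. This is a minimal NumPy illustration of the idea, not the repo's actual `LookThemLayer`; the names `look_them`, `Wa`, `Wb`, and the small `eps` guard against division by zero are my own assumptions:

```python
import numpy as np

def look_them(tokens, Wa, Wb, eps=1e-6):
    """Ratio-based token mixing over an (n, d) token array.

    Each pair (A, B) interacts through tanh-squashed linear transforms
    divided by each other; both directions are averaged, then the
    contributions from all pairs are averaged into the new token.
    """
    ta = np.tanh(tokens @ Wa)  # transformA applied to every token, in (-1, 1)
    tb = np.tanh(tokens @ Wb)  # transformB applied to every token, in (-1, 1)
    n, d = tokens.shape
    out = np.zeros_like(tokens)
    for i in range(n):                        # current token A
        acc = np.zeros(d)
        for j in range(n):                    # another token B
            r_fwd = ta[i] / (tb[j] + eps)     # transformA(A) / transformB(B)
            r_rev = ta[j] / (tb[i] + eps)     # transformA(B) / transformB(A)
            # forward ratio weights A, reverse ratio weights B; halve the sum
            acc += (r_fwd * tokens[i] + r_rev * tokens[j]) / 2
        out[i] = acc / n                      # average over all interactions
    return out

# demo: 4 tokens with 8 features each
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
Wa = 0.1 * rng.normal(size=(8, 8))
Wb = 0.1 * rng.normal(size=(8, 8))
out = look_them(tokens, Wa, Wb)  # same shape as the input
```

The double loop is written out for clarity; the repo's layer computes the same pairwise ratios in vectorized form.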
## Colab notebook in this repo
More details in the notebooks.