Update README.md
README.md CHANGED
@@ -4,6 +4,20 @@ datasets:
- AbstractPhil/geometric-vocab
pipeline_tag: zero-shot-classification
---
# Enabling the Mix-N-Cut

I've built a mix-n-cut that I've been avoiding enabling. This one is formatted specifically for pentachoron, so we'll see how it fares. I'm trying to build one as SMALL AS POSSIBLE, so if this mix-n-cut can pull the task out of the bag, I may as well run it.

As it stands, the tiny ViTs cap at 41% on CIFAR-100 with no augmentations. I've been running all the training runs without a single special effect and only minimal normalization.

Let's see how the upcoming training runs fare.

`pixie_base_128d_patch4_128h`

Pixie base has 10 layers: 5 geometric attention and 5 traditional multi-head attention. Let's see how the mix-n-cut fares with the earlier models first; then we'll run the base.

The smaller ones seem to behave better using the geometric attention at 256 expert heads, which is odd to me, but whatever works. They don't get much bigger with more experts, so I'll just try a tiny one with a ton of heads first.
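A hedged note on why more heads barely grow the model: assuming the expert heads split the embedding dimension the way standard multi-head attention does (the geometric attention here may work differently), the Q/K/V/output projections stay `d_model x d_model` regardless of head count. `attn_param_count` is a hypothetical helper for the arithmetic:

```python
# Hedged arithmetic sketch: with head_dim = d_model // num_heads, the four
# projection matrices are d_model x d_model no matter how many heads there
# are, so the attention parameter count is independent of the head count.
def attn_param_count(d_model: int, num_heads: int) -> int:
    assert d_model % num_heads == 0, "heads must divide the model dim"
    # Q, K, V, and output projections: four d_model x d_model matrices + biases.
    return 4 * (d_model * d_model + d_model)

# Same parameter count at 4, 8, or 16 heads for a 128-d model.
print(attn_param_count(128, 4), attn_param_count(128, 8), attn_param_count(128, 16))
```

So "a ton of heads" mostly changes how the dimension is sliced, not how many weights get trained.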
# Pentachoron Geometric Feature Extraction