Update README.md

README.md CHANGED
@@ -9,8 +9,30 @@ tags:

license: mit
---

Last remembered highest accuracy: roughly 66%. That run also produced a bunch of other material that apparently never got pushed from the logger.

The README is busted; a bad README got uploaded. I'll run test sets on all the models and accumulate a proper model list with accuracies as soon as possible.

These currently beat the standard ViT-Beatrix in pure classification accuracy, while leaving both blocks nearly independent.

This enables efficient transfer learning without high-decay processes, but the system is still a bit janky.

Today I plan to shore up the repo's own tracking so this sort of fault doesn't happen again, where I run something and lose the tracking information.

Additionally, the training manifests for all models will likely be stored in an independent repo elsewhere, for automated connection and linkage with the Hugging Face systems.

# ViT-Beatrix Dual-Stream with Geometric Diversity

This system is a dual-block transformer model inspired by Flux's dual-block structure.

## Experimental Tests

One set of blocks is devoted to the geometry while the other set is devoted to the ingested images.
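
To make that layout concrete, here is a minimal sketch of what one dual-stream block might look like, assuming pre-norm residual blocks where the two streams touch only through cross-attention. All names (`DualStreamBlock`, the `geo_`/`img_` prefixes, `dim`, `num_heads`) are illustrative shorthand, not the repo's actual modules.

```python
# Illustrative dual-stream block: each stream has its own self-attention
# and MLP; cross-attention is the only bridge between the two streams.
import torch
import torch.nn as nn


class DualStreamBlock(nn.Module):
    def __init__(self, dim: int = 384, num_heads: int = 6):
        super().__init__()
        self.geo_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention: the only place the streams exchange information.
        self.geo_cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.geo_mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.img_mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(6))

    def forward(self, geo: torch.Tensor, img: torch.Tensor):
        # Self-attention inside each stream.
        g = self.norms[0](geo)
        geo = geo + self.geo_attn(g, g, g, need_weights=False)[0]
        i = self.norms[1](img)
        img = img + self.img_attn(i, i, i, need_weights=False)[0]
        # Each stream queries the other through cross-attention.
        g, i = self.norms[2](geo), self.norms[3](img)
        geo = geo + self.geo_cross(g, i, i, need_weights=False)[0]
        img = img + self.img_cross(i, g, g, need_weights=False)[0]
        # Independent MLPs keep the streams nearly decoupled.
        geo = geo + self.geo_mlp(self.norms[4](geo))
        img = img + self.img_mlp(self.norms[5](img))
        return geo, img
```

Because the streams only touch through the cross-attention bridge, either side can in principle be frozen, zeroed, or retrained without disturbing the other, which is the property the notes below lean on.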

The geometry's accuracy can be completely decoupled, and the image portion zeroed out and retrained if the system starts to decay.
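
A minimal sketch of that reset, assuming image-stream submodules can be picked out by a name prefix (the `img_` prefix here is a hypothetical convention, not necessarily what this repo uses):

```python
# Reinitialize the image stream in place while the geometry stream
# keeps its trained weights.
import torch.nn as nn


def reset_image_stream(model: nn.Module, prefix: str = "img_") -> None:
    for name, module in model.named_modules():
        if prefix not in name:
            continue  # leave geometry (and shared) modules untouched
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()    # nn.Linear, nn.LayerNorm, ...
        elif hasattr(module, "_reset_parameters"):
            module._reset_parameters()   # nn.MultiheadAttention
```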

This has shown robust capability across multiple lineage training runs. Leaving the geometry in a "frozen" state yields by far the worst outcomes, though in that test I froze everything, including the geometric cross-attention and its subsystems, while leaving the image end of the cross-attention scrambled and learning; more than likely it relearned incorrect math and got stuck at around 20%.
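
If I'm reading that failure right, the safer pattern would be to freeze the geometry stream's own blocks but keep both ends of the cross-attention trainable, so the interface between the streams can stay in sync. A sketch under the same hypothetical naming as above:

```python
# Freeze the geometry stream's internals, but leave cross-attention on
# both ends trainable. The "geo_" / "cross" name matching is hypothetical.
import torch.nn as nn


def freeze_geometry_keep_interface(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        in_geometry = "geo_" in name
        in_interface = "cross" in name  # cross-attention on either stream
        param.requires_grad = (not in_geometry) or in_interface
```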

## Current Experiment: beatrix-dualstream-base

**Model Path**: `weights/beatrix-dualstream-base/20251009_030219/`
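
For reference, a hypothetical loading sketch; the file name inside the timestamped folder and the checkpoint format are assumptions, since this README doesn't spell them out:

```python
# Assumes a plain PyTorch state dict saved as "model.pt" in the run
# folder; adjust to the actual file name and format.
import torch

ckpt = "weights/beatrix-dualstream-base/20251009_030219/model.pt"
state_dict = torch.load(ckpt, map_location="cpu")
# Peek at parameter names to see which stream each tensor belongs to.
print(sorted(state_dict)[:10])
```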

@@ -33,10 +55,8 @@ This model uses a class-aware geometric diversity loss that encourages:

## Performance

- **Best Accuracy**: ~66% (from memory; exact figure pending)
- **Current Epoch**: ~100, give or take; sorry about this, I'll get real data here as soon as possible.
- **Dataset**: CIFAR-100

---