Update README.md
README.md
CHANGED
|
@@ -20,18 +20,24 @@ base_model:
|
|
| 20 |
|
| 21 |
# Newest: NLI head preliminary
|
| 22 |
|
| 23 |
-
So the outcome is surprisingly good. Around 75% accuracy give or take for NLI with the prototype conv5d that I'm working with.
|
| 24 |
|
| 25 |
It's not a true conv5d, more an accumulator meant to encapsulate the necessary behavioral implications of every stretch that implicates the potential for a conv5d.
|
| 26 |
This structure when paired with structural geometry defeats the MLP in terms of overfitting.
|
| 27 |
|
| 28 |
MLP managed to overfit the model at around 71% or so, to nearly 95% training accuracy.
|
| 29 |
|
| 30 |
-
Conv5d managed to preserve the bank's geometry instead of completely collapsing it into noise, allowing around 75% with 80% training accuracy.
|
| 31 |
|
| 32 |
So the problem is still present. Without the bank the model reached around 64% or so; with the anchor bank the model reaches around 75% with geometric solidity, but it's not
|
| 33 |
enough to say the NLI head works yet.
|
| 34 |
|
| 35 |
HOWEVER, it's enough to say that it can with more training. This is enough to continue for me.
|
| 36 |
|
| 37 |
If I were to, say, unlock the model's weights and train ALL FIVE EXPERTS, this would be an arbitrary task. The system would learn it instantly.
|
|
|
|
| 20 |
|
| 21 |
# Newest: NLI head preliminary
|
| 22 |
|
| 23 |
+
So the outcome is surprisingly good. Around 75-76% accuracy give or take for NLI with the prototype conv5d that I'm working with.
|
| 24 |
|
| 25 |
It's not a true conv5d, more an accumulator meant to encapsulate the necessary behavioral implications of every stretch that implicates the potential for a conv5d.
|
| 26 |
This structure when paired with structural geometry defeats the MLP in terms of overfitting.
|
| 27 |
|
| 28 |
MLP managed to overfit the model at around 71% or so, to nearly 95% training accuracy.
|
| 29 |
|
| 30 |
+
Conv5d managed to preserve the bank's geometry instead of completely collapsing it into noise, allowing around 75-76% with 80% training accuracy.
|
| 31 |
|
| 32 |
So the problem is still present. Without the bank the model reached around 64% or so; with the anchor bank the model reaches around 75% with geometric solidity, but it's not
|
| 33 |
enough to say the NLI head works yet.
|
| 34 |
|
| 35 |
+
As you can see either way, the model does eventually begin to overfit in its current state. It's simply too small and predominantly distilled,
|
| 36 |
+
which means it will have problems no matter which task I attempt to teach this model.
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+

|
| 40 |
+
|
| 41 |
HOWEVER, it's enough to say that it can with more training. This is enough to continue for me.
|
| 42 |
|
| 43 |
If I were to, say, unlock the model's weights and train ALL FIVE EXPERTS, this would be an arbitrary task. The system would learn it instantly.
|
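The overfitting comparison in the text comes down to simple arithmetic on the quoted figures, so here is a rough sketch of it. The numbers are the approximate ones from the README ("around", "or so"); the dictionary keys, the function name, and the reading of the non-training figure as held-out accuracy are assumptions made for illustration, not part of the project's code.

```python
# Approximate accuracies (in percent) quoted in the README text.
# Key names are hypothetical labels, not identifiers from the repository.
reported = {
    "mlp": {"train": 95.0, "held_out": 71.0},
    # 75.5 is the midpoint of the quoted "75-76%" range (an assumption).
    "conv5d_accumulator": {"train": 80.0, "held_out": 75.5},
}


def generalization_gap(train_acc: float, held_out_acc: float) -> float:
    """Training accuracy minus held-out accuracy, in percentage points."""
    return train_acc - held_out_acc


gaps = {
    name: generalization_gap(r["train"], r["held_out"])
    for name, r in reported.items()
}

# Lift attributed to the anchor bank: ~64% without it vs ~75% with it.
bank_lift = 75.0 - 64.0

for name, gap in gaps.items():
    print(f"{name}: gap of {gap} points between train and held-out accuracy")
print(f"anchor bank lift: {bank_lift} points")
```

On these rough figures the MLP head carries a much wider train/held-out gap than the conv5d-style accumulator, which is the sense in which the text says the accumulator "defeats the MLP in terms of overfitting".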