Update README.md
README.md
CHANGED
@@ -43,4 +43,9 @@ If all our tokens are sent to just a few popular experts, that will make trainin
## "Wait...but you called this a frankenMoE?"
-
The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic...at least...until now.
+
The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. For now, frankenMoE remains psychotic...at least...until now.
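
As a rough illustration of what the router layer does, here is a minimal sketch in plain Python. It is not this repo's code: the `route` helper, the `frozen_router` name, the dimensions, and the random stand-in weights are all assumptions made for the example. In a conventionally trained MoE the router rows would be learned together with the experts; in the frankenMoE setup described above, they are set once and left as they are.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, router_weights, top_k=2):
    """Return [(expert_index, gate_weight), ...] for one token vector."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep the top_k experts by logit, then renormalise their scores with softmax.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


random.seed(0)
hidden_dim, num_experts = 4, 4

# Hypothetical frozen router: filled in once and never updated by gradient descent.
# Random values are a stand-in for however the merge actually initialises it.
frozen_router = [
    [random.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(num_experts)
]

token = [0.5, -1.0, 0.25, 2.0]
choices = route(token, frozen_router)
print(choices)
```

Each token is sent to the `top_k` experts with the highest router scores, and their outputs would be mixed using the gate weights. If the router rows were never fitted against the experts they select, nothing guarantees that the chosen experts are the right ones for a given token.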
+
+
This model is probably the highest-performing model on the site, but considering that even I, the person who created it, only have 12 gigs of VRAM...only the truly insane will even be capable of controlling the Earth Render.
+
+

+
## this response took about 2 and a half hours lol...