---
base_model:
- nomic-ai/nomic-bert-2048
---

# Release - bert-beatrix-2048 v1

An entirely saturated pretrained masking window, fixated on expanding the masking potential using subject and shunt allocation tokenization systems.

What we have here is our first subject-burned saturated RoPE prototype. Though I must admit it took longer than expected to train, it will provide a perfect excitation catalyst for the next step.

After more research, I learned I could probably saturate this context window in a fraction of the steps; however, this is a full pretrain burn, so I let it go the distance.

There is much to be learned here - especially with diffusion embedding shaping.

There may be faults in its core; I've had some issues with the vocab and tokenization structure having a valuation misalignment, but I decided nonetheless to let it complete with those faults intact. There's no telling what that may teach alongside. The actual data itself should reflect the correct tokenization; I just need to make sure it loads correctly. If not, I'll retrain her.

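The vocab misalignment above isn't specified further, so as a purely illustrative hygiene check (my own sketch, not the author's tooling; `check_vocab_alignment` is a hypothetical helper), a token-to-id vocab can be scanned for holes that would desynchronize token ids from embedding rows:

```python
def check_vocab_alignment(vocab):
    """Given a token -> id mapping, return the positions where the
    sorted ids deviate from the contiguous range 0..len(vocab)-1.
    An empty result means ids line up one-to-one with embedding rows."""
    ids = sorted(vocab.values())
    return [want for want, got in zip(range(len(ids)), ids) if want != got]

# A well-formed vocab reports no misalignment:
assert check_vocab_alignment({"[MASK]": 0, "hello": 1, "world": 2}) == []
# A hole at id 1 shows up as drift from that position onward:
assert check_vocab_alignment({"[MASK]": 0, "hello": 2, "world": 3}) == [1, 2]
```

Whether this matches the actual misalignment seen during training is unknown; it's only one of several checks a load-time sanity pass could run.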
## 2,008,000 total steps at batch size 1024

This is the 26-category finetune of nomic-bert-2048's encoder.

* 130,000,000 - 4-30 masked token samples with 80% mask rate
  * Learned timestep associative noise and why it matters.
* 253,952,000 - 77 token samples with 20% mask rate
  * Learned context over the noise.
* 775,000,000 - 144-256 token samples with 30% mask rate
* 453,800,000 - 385-512 token samples with 30% mask rate
* 227,328,000 - 1024 token samples with 30% mask rate
* 234,112,000 - 2048 token samples with 30% mask rate

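As a rough sketch of what one bucket above does (my own illustration, not the author's training code; `MASK_ID` is a made-up token id), independently masking each position of a sample at a given rate looks like:

```python
import random

MASK_ID = 103  # hypothetical [MASK] id; the real vocab may differ

def apply_mask(token_ids, mask_rate, seed=None):
    """Independently mask each position with probability mask_rate.
    Returns (masked ids, labels); labels are -100 at unmasked spots
    so a cross-entropy loss ignores them."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_rate:
            masked.append(MASK_ID)
            labels.append(tok)    # model must recover the original id
        else:
            masked.append(tok)
            labels.append(-100)   # position excluded from the loss
    return masked, labels

# e.g. a 16-token sample from the 4-30 bucket at the 80% rate
masked, labels = apply_mask(list(range(1000, 1016)), mask_rate=0.8, seed=0)
```

Each bucket then just varies the sample length and the `mask_rate` fed into a routine of this kind.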
The final accuracy is about 95%, give or take, but that is also measured against the actual pretraining data. The point being: it needs to associate information with information more cleanly, rather than fully enforcing the various logistical and subjective elements instantiated from external elements.

Total samples: 1,822,080,000

The model has learned to categorize certain masked patterns with their categories and special tokens.

It turns WHAT YOU WANT into something A BIT more reliable - in a vector-rich and cohesive form for SOMETHING ELSE to process.

This will likely result in many variant and incorrect pathways if you try to use it as an LLM, or if you try to use it to TRAIN an LLM. However, I'm going to do it anyway, because I can.

---

HOWEVER, the deterministic nature of subjectivity... will have a very crucial role when being shaped by... much more intelligent harmonic influence.

The REAL experiment can now start.

```
<subject>