base_model:
- AbstractPhil/geolip-bertenstein
---

# Newest: Prepping 12M Conceptual Captions BERT extractions, aka 36M full extractions

The dataset will be stored as .pt chunks: they load directly to VRAM almost instantly in Colab, and the system iterates over them faster than conventional dataloaders.
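A minimal sketch of what such a chunk workflow could look like, assuming PyTorch; the chunk contents, keys, and file name here are illustrative, not this repo's actual layout:

```python
import torch

# Hypothetical chunk layout: each .pt file holds a dict of pre-extracted
# embeddings plus labels (names are illustrative, not the repo's schema).
chunk = {
    "embeddings": torch.randn(4, 768),  # 4 example embedding vectors
    "labels": torch.arange(4),
}
torch.save(chunk, "chunk_000.pt")

# Loading the whole chunk straight onto the training device in one call
# skips the per-sample collation a DataLoader would otherwise perform.
device = "cuda" if torch.cuda.is_available() else "cpu"
loaded = torch.load("chunk_000.pt", map_location=device)
print(loaded["embeddings"].shape)
```

On a CUDA runtime, `map_location="cuda"` materializes the tensors on the GPU as part of deserialization, which is where the near-instant load comes from.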

I'll be running the full 12M set on all three caption types, no exceptions: short LLaVA, long LLaVA, and the original captions.

After the 36M-sample, five-expert dataset training completes, the core model will be ready.

It's legitimately wild watching the system sit at 100% validation accuracy, but reaching that requires additional complexity, so accuracy isn't the measure to analyze.
The recall problem is solved, but the internal structure's geometric system still needs to align with the larger spectrum of rigidity that the smooth-manifold deviations require for full cohesion, which means more data.

36 million samples over roughly 10 epochs should be a fair assessment. Hopefully that isn't too much data.

Saturating the internals of the anchor and the subsystem will allow for more complex processes and easy alignment with pieces of the data. After that, it will be quite fast to sample the most accurate captions and begin forming ViT associations, which will enable full next-token prediction capacity thanks to the internal similarity mechanisms and the solidity of the anchor bank, formed in steel.

# 2 additional epochs, 1M samples run