# Newest: Prepping 12m conceptual-captions bert extractions aka 36m extractions * 5 models
So that's around 180,000,000 total samples, which is fundamentally different than a single task of repeated 200k or 500k like I've been doing.
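The total follows directly from the figures in the heading (36m extractions times 5 models); a quick sanity check:

```python
# Sanity check of the sample count stated above:
# 36m extractions across 5 models, per the heading's figures.
extractions = 36_000_000
models = 5
total = extractions * models
print(f"{total:,}")  # 180,000,000
```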
https://huggingface.co/datasets/AbstractPhil/conceptual-captions-12m-webdataset-berts
You can track the progress there.
The dataset is going to be in `.pt` chunks because they load directly to VRAM nearly instantly in Colab, and the system operates on them quicker than DataLoaders.
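A minimal sketch of that `.pt`-chunk pattern, assuming each chunk is a single pre-batched tensor saved with `torch.save`; the filename and shape here are illustrative, not the actual dataset's layout:

```python
import torch

# Pick the GPU when available (the Colab case described above),
# otherwise fall back to CPU so the sketch still runs.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Writer side: pack many extractions into one pre-batched chunk file.
# 4096 x 768 stands in for a block of BERT embeddings (shape is illustrative).
chunk = torch.randn(4096, 768)
torch.save(chunk, "chunk_000.pt")

# Reader side: one torch.load with map_location moves the whole chunk
# onto the device in a single call -- no per-sample DataLoader collation.
batch = torch.load("chunk_000.pt", map_location=device)
print(batch.shape)  # torch.Size([4096, 768])
```

The trade-off versus a DataLoader is that a chunk is one device transfer and zero collation work, at the cost of fixed chunk boundaries and whole-chunk memory residency.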