AbstractPhil committed
Commit 151709d · verified · 1 Parent(s): ad97899

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -27,7 +27,13 @@ base_model:
 
 
  # Newest: Prepping 12m conceptual-captions bert extractions aka 36m extractions * 5 models
- So around, 180,000,000 different toal samples, which is fundamentally different than a single task of repeated of 200k or 500k like I've been doing.
+ So around, 180,000,000 different toal samples, which is fundamentally different than a single task of repeated 200k or 500k like I've been doing.
+
+ https://huggingface.co/datasets/AbstractPhil/conceptual-captions-12m-webdataset-berts
+
+ You can track the process there.
+
+
 
  The dataset is going to be in pt chunks because they load directly to vram nearly instantly in colab, and the system operates on them quicker than dataloaders.
 
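
The sample count quoted in the changed README line can be checked with a few lines of arithmetic. This is only a sketch built from the figures in the text (12m captions, 36m extractions, 5 models). The per-caption factor of 3 is inferred from 36m / 12m and is not stated in the README, and the variable names are illustrative.

```python
# Figures as quoted in the README text.
captions = 12_000_000      # "12m conceptual-captions"
extractions = 36_000_000   # "36m extractions"
models = 5                 # "* 5 models"

# Assumption: the factor per caption is inferred from 36m / 12m, not stated.
per_caption = extractions // captions

# Total samples across all models, matching "around 180,000,000".
total_samples = extractions * models

# Scale relative to the earlier 200k and 500k runs mentioned in the text.
ratio_vs_500k = total_samples // 500_000
ratio_vs_200k = total_samples // 200_000

print(per_caption, total_samples, ratio_vs_500k, ratio_vs_200k)
```

Under these figures the new set is several hundred times larger than the earlier 200k or 500k runs, which is the contrast the changed line draws.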