CogNet-1B / logs /more_data.log
Existing tokens: 199,971,619
1/3 - WikiText-103 (fixed API)...
Generating test split: 100%|██████████| 4358/4358 [00:00<00:00, 346110.29 examples/s]
Generating train split: 100%|██████████| 1801350/1801350 [00:02<00:00, 721288.20 examples/s]
Generating validation split: 100%|██████████| 3760/3760 [00:00<00:00, 480444.27 examples/s]
WikiText-103: 50,000 texts -> 22,635,189 tokens
WikiText-103: 100,000 texts -> 45,823,471 tokens
WikiText-103: 150,000 texts -> 69,308,297 tokens
WikiText-103: 200,000 texts -> 92,105,968 tokens
WikiText-103: 250,000 texts -> 115,047,336 tokens
WikiText-103: 300,000 texts -> 138,601,321 tokens
WikiText-103: 350,000 texts -> 161,788,432 tokens
WikiText-103: 400,000 texts -> 184,963,474 tokens
WikiText-103: 450,000 texts -> 207,852,194 tokens
WikiText-103: 500,000 texts -> 231,214,640 tokens
WikiText-103: 550,000 texts -> 254,905,316 tokens
WikiText-103: 600,000 texts -> 278,402,058 tokens
WikiText-103: 650,000 texts -> 301,290,874 tokens
WikiText-103: 700,000 texts -> 324,414,818 tokens
WikiText-103: 750,000 texts -> 347,964,383 tokens
WikiText-103: 800,000 texts -> 370,967,929 tokens
WikiText-103: 850,000 texts -> 394,148,337 tokens
WikiText-103: 900,000 texts -> 417,556,771 tokens
WikiText-103: 950,000 texts -> 441,075,122 tokens
WikiText-103: 1,000,000 texts -> 464,091,223 tokens
WikiText-103: 1,050,000 texts -> 487,133,309 tokens
WikiText-103: 1,100,000 texts -> 510,482,618 tokens
WikiText-103: 1,150,000 texts -> 533,526,405 tokens
OK WikiText-103: 540,519,546 tokens
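The progress lines above appear to report a running token total within each source every 50,000 texts, followed by an "OK" line with the final count for that source. Below is a minimal sketch of a loop that would produce lines in this format. The function name `count_tokens` and the `tokenize` callable are hypothetical stand-ins; the tokenizer actually used for this run is not shown in the log.

```python
from typing import Callable, Iterable, List


def count_tokens(
    name: str,
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],  # hypothetical: any text -> token-id function
    log_every: int = 50_000,
) -> List[int]:
    """Tokenize texts into one flat id list, printing cumulative progress."""
    ids: List[int] = []
    for i, text in enumerate(texts, start=1):
        ids.extend(tokenize(text))
        # Running total for this source only, e.g. "C4-EN: 50,000 texts -> ... tokens"
        if i % log_every == 0:
            print(f"{name}: {i:,} texts -> {len(ids):,} tokens")
    # Final count includes texts after the last logging checkpoint
    print(f"OK {name}: {len(ids):,} tokens")
    return ids


# Toy usage with a whitespace "tokenizer" that emits one id per word
ids = count_tokens("demo", ["a b", "c", "d e f"], lambda s: [len(w) for w in s.split()], log_every=2)
```

Because the counter is cumulative, the "OK" total for a source is always at least the last logged checkpoint, as with 533,526,405 vs. 540,519,546 for WikiText-103.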
2/3 - C4 English...
C4-EN: 50,000 texts -> 106,885,554 tokens
OK C4-EN: 215,577,179 tokens
3/3 - FineWeb-Edu...
FineWeb-Edu: 50,000 texts -> 242,392,039 tokens
⚠ Received signal 15, will save checkpoint after current step...
OK FineWeb-Edu: 480,132,888 tokens
TOTAL TOKENS: 1,436,201,232
Train: 1,364,391,170 tokens (1364.4M)
Val: 71,810,062 tokens (71.8M)
MORE DATA COMPLETE!
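The summary counts are internally consistent: the existing 199,971,619 tokens plus the three per-source totals give 1,436,201,232, and the train/val counts sum back to that total. The split matches a 95/5 ratio, but that ratio is inferred from the numbers and is not stated in the log. A minimal sketch reproducing the summary, assuming train = int(0.95 * total) and val = total - train:

```python
from typing import Tuple

EXISTING = 199_971_619
SOURCES = {
    "WikiText-103": 540_519_546,
    "C4-EN": 215_577_179,
    "FineWeb-Edu": 480_132_888,
}
TRAIN_FRACTION = 0.95  # assumption: inferred from the reported counts


def split_counts(total: int, train_fraction: float = TRAIN_FRACTION) -> Tuple[int, int]:
    """Truncate the train share to an int; the remainder goes to validation."""
    train = int(total * train_fraction)
    return train, total - train


total = EXISTING + sum(SOURCES.values())
train, val = split_counts(total)
print(f"TOTAL TOKENS: {total:,}")
print(f"Train: {train:,} tokens ({train / 1e6:.1f}M)")
print(f"Val: {val:,} tokens ({val / 1e6:.1f}M)")
```

Rounding val = round(0.05 * total) instead would give the same counts here, so the log alone does not pin down which rule was used.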