akhooli committed on
Commit 6cd2615 · verified · 1 Parent(s): 94e67e0

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -5,4 +5,5 @@ Arabic ModernBERT model partially trained (13% of one epoch)
  on a filtered [subset](https://huggingface.co/datasets/akhooli/afw2_f98_tok) of
  FineWeb2 (text length: 250-25000 characters, 98% or more Arabic words) pretokenized.
  The actual filtered dataset (text column only) is [here](https://huggingface.co/datasets/akhooli/afw2_f98).
- The dataset is a little over 30M records.
+ The dataset is a little over 30M records.
+ The model folder contains a checkpoint (64 batch size on single GPU, 60,000 iterations)
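
The README lines in this diff give two sets of figures: the dataset filter (250-25000 characters, 98% or more Arabic words) and the checkpoint (batch size 64, 60,000 iterations, described as 13% of one epoch). The sketch below is illustrative only and is not the author's pipeline. It assumes a "word" is a whitespace-separated token that counts as Arabic if it contains a character from the basic Arabic Unicode block, that each iteration consumes exactly one batch of 64 records with no gradient accumulation, and that the dataset has exactly 30,000,000 records. All function names are hypothetical.

```python
import re

# Assumption: a token counts as Arabic if it contains at least one character
# from the basic Arabic Unicode block (U+0600 to U+06FF).
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


def arabic_word_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that contain an Arabic character."""
    words = text.split()
    if not words:
        return 0.0
    arabic = sum(1 for word in words if ARABIC_CHAR.search(word))
    return arabic / len(words)


def passes_filter(text: str, min_chars: int = 250, max_chars: int = 25_000,
                  min_ratio: float = 0.98) -> bool:
    """Hypothetical version of the length and Arabic-ratio filter in the README."""
    return min_chars <= len(text) <= max_chars and arabic_word_ratio(text) >= min_ratio


def epoch_fraction(batch_size: int, iterations: int, num_records: int) -> float:
    """Share of the dataset seen, assuming one batch of records per iteration."""
    return batch_size * iterations / num_records


# Figures from the README: batch size 64, 60,000 iterations, about 30M records.
samples_seen = 64 * 60_000
fraction = epoch_fraction(64, 60_000, 30_000_000)
print(f"{samples_seen:,} samples, {fraction:.1%} of one epoch")
```

Under these assumptions the checkpoint has seen 3,840,000 records, or 12.8% of 30M. Because the dataset is "a little over" 30M records, the true share is slightly lower, which is consistent with the stated 13% figure being rounded.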