# SocBERT model
Pretrained model on 20GB English tweets and 72GB Reddit comments using a masked language modeling (MLM) objective.
The tweets are from Archive and were collected from the Twitter Streaming API.
The Reddit comments were randomly sampled from all subreddits from 2015-2019.
SocBERT-base was pretrained on 819M sequence blocks for 100K steps.
SocBERT-final was pretrained on 929M (819M + 110M) sequence blocks for 112K (100K + 12K) steps.
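For readers unfamiliar with the MLM objective, a minimal sketch of BERT-style token masking is shown below. This is a generic illustration, not SocBERT's actual preprocessing code; the 15% masking rate and the 80/10/10 replacement split are standard BERT defaults assumed here, not values stated in this README.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """BERT-style MLM masking (illustrative, not SocBERT's own code).

    Selects roughly `mlm_prob` of positions; of those, 80% become the
    [MASK] token, 10% a random token, and 10% are left unchanged.
    Returns (masked_inputs, labels), where labels hold the original
    token at selected positions and -100 (ignore) elsewhere.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id          # replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # random token
            # else: keep the original token unchanged
    return inputs, labels
```

During pretraining, the model is trained to recover the original tokens at the selected positions from the corrupted input.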
We benchmarked SocBERT on 40 text classification tasks with social media data.

The experiment results can be found in our paper:
```
@inproceedings{socbert:2023,