# SocBERT model
Pretrained model on 20GB English tweets and 72GB Reddit comments using a masked language modeling (MLM) objective.
The tweets are from Archive and were collected from the Twitter Streaming API.
The Reddit comments were randomly sampled from all subreddits from 2015-2019.
SocBERT-base was pretrained on 819M sequence blocks for 100K steps.
SocBERT-final was pretrained on 929M (819M + 110M) sequence blocks for 112K (100K + 12K) steps.
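For readers unfamiliar with the MLM objective, a minimal sketch of BERT-style token masking is shown below. This is a generic illustration, not SocBERT's actual preprocessing code; the 15% masking rate and the 80/10/10 replacement split are standard BERT defaults assumed here, not values stated in this README.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """BERT-style MLM masking (illustrative, not SocBERT's own code).

    Selects roughly `mlm_prob` of positions; of those, 80% become the
    [MASK] token, 10% a random token, and 10% are left unchanged.
    Returns (masked_inputs, labels), where labels hold the original
    token at selected positions and -100 (ignore) elsewhere.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id          # replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # random token
            # else: keep the original token unchanged
    return inputs, labels
```

During pretraining, the model is trained to recover the original tokens at the selected positions from the corrupted input.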
We benchmarked SocBERT on 40 text classification tasks with social media data.

The experiment results can be found in our paper:
```
@inproceedings{socbert:2023,