How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="sarkerlab/SocBERT-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("sarkerlab/SocBERT-base")
model = AutoModelForMaskedLM.from_pretrained("sarkerlab/SocBERT-base")
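
The snippets above only load the model. A minimal sketch of how the fill-mask pipeline might then be queried is shown below. The helper names `mask_word` and `top_predictions` are hypothetical and are not part of the model card or of Transformers. The mask token is read from the tokenizer instead of being hard-coded, since the card does not state which token SocBERT uses.

```python
def mask_word(text: str, target: str, mask_token: str) -> str:
    """Replace the first occurrence of `target` in `text` with `mask_token`.

    Hypothetical helper for illustration only.
    """
    if target not in text:
        raise ValueError(f"{target!r} not found in text")
    return text.replace(target, mask_token, 1)


def top_predictions(text: str, target: str, top_k: int = 5):
    """Mask `target` in `text` and return (token, score) pairs for the top guesses.

    Hypothetical helper; downloads the model from the Hub on first use.
    """
    # Imported lazily so mask_word can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="sarkerlab/SocBERT-base")
    # Assumption: the tokenizer defines a mask token; read it rather than guess it.
    masked = mask_word(text, target, pipe.tokenizer.mask_token)
    # With a single mask, the pipeline returns a list of dicts that include
    # "token_str" (the predicted token) and "score" (its probability).
    return [(r["token_str"], r["score"]) for r in pipe(masked, top_k=top_k)]
```

Calling `top_predictions("feeling sick today, staying home", "sick")` would return the model's five most likely replacements for the masked word together with their scores.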

SocBERT model

SocBERT is a model pretrained on 20GB of English tweets and 72GB of Reddit comments using a masked language modeling (MLM) objective. The tweets are from Archive and were collected from the Twitter Streaming API. The Reddit comments were randomly sampled from all subreddits from 2015 to 2019. SocBERT-base was pretrained on 819M sequence blocks for 100K steps. SocBERT-final was pretrained on 929M (819M+110M) sequence blocks for 112K (100K+12K) steps. We benchmarked SocBERT on 40 text classification tasks with social media data.

The experiment results can be found in our paper:

@inproceedings{socbert:2023,
title     = {{SocBERT: A Pretrained Model for Social Media Text}},
author    = {Yuting Guo and Abeed Sarker},
booktitle = {Proceedings of the Fourth Workshop on Insights from Negative Results in NLP},
year      = {2023}
}