Model upload from WebSci'25 paper

Files changed (7)

README.md +84 -0
config.json +25 -0
model.safetensors +3 -0
special_tokens_map.json +7 -0
tokenizer.json +0 -0
tokenizer_config.json +59 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,84 @@

+---
+language: en
+license: apache-2.0
+library_name: transformers
+tags:
+- text-classification
+- personal-narrative
+- political-discourse
+- computational-social-science
+- websci25
+datasets:
+- custom-reddit-dataset
+base_model: falkne/storytelling-LM-europarl-mixed-en
+---
+# Personal Narrative Classifier (WebSci'25)
+This is the official repository for the text classification model presented in the paper: **"Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions"**, which received a Best Paper Honorable Mention at the 17th ACM Web Science Conference (WebSci'25).
+The model is a BERT-based classifier, fine-tuned from `falkne/storytelling-LM-europarl-mixed-en`, that identifies personal narratives in online comments.
+## Model Description
+This model classifies a given text as either a "Personal Narrative" or "Not a Personal Narrative". It was developed to support a large-scale computational analysis of how personal stories affect engagement in online political discussions on Reddit.
+- **Label 0**: Not a Personal Narrative
+- **Label 1**: Personal Narrative
+## Intended Uses & Limitations
+### Intended Use
+This model is intended for researchers in computational social science, political science, communication, and HCI to study online discourse. It can be used to:
+- Quantify the use of personal narratives in various online communities.
+- Analyze the reception and impact of story-based arguments.
+- Replicate and extend the findings of the original paper.
+### Limitations
+As noted in the paper, this model has several limitations:
+- The training and evaluation data come from political subreddits on Reddit from 2020-2021, so the model's performance may vary on other platforms or time periods.
+- The definition of "political activity" was based on subreddit engagement, which may not capture all forms of political interest.
+- The model does not analyze the content or veracity of the narratives. Personal narratives can also be used to spread misinformation, which is an avenue for future research.
+## How to Use
+You can run inference with the `pipeline` API from the `transformers` library.
+```python
+from transformers import pipeline
+repo_id = "tejasvichebrolu/personal-narrative-classifier"
+classifier = pipeline("text-classification", model=repo_id)
+# Example texts
+narrative_text = "I’m in Alabama and oh my god it was so humid yesterday. I was so unproductive from how bad it was."
+non_narrative_text = "The most straightforward solution is to encourage others to engage with politics online."
+# Get predictions
+results = classifier([narrative_text, non_narrative_text])
+for text, result in zip([narrative_text, non_narrative_text], results):
+    print(f"Text: {text}")
+    # config.json defines no id2label mapping, so the pipeline reports LABEL_0 / LABEL_1 (LABEL_1 = Personal Narrative)
+    print(f"  -> Prediction: {result['label']}, Score: {result['score']:.4f}\n")
+```
+## Training and Evaluation
+The model was fine-tuned on a dataset of 2,000 manually labeled Reddit comments. It achieved a macro-averaged F1 score of **0.82** in 5-fold cross-validation. For more details on the training procedure and performance, please refer to the paper.
+## Citation
+If you use this model or its findings in your research, please cite our paper:
+```bibtex
+@inproceedings{chebrolu2025narratives,
+  title={{Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions}},
+  author={Chebrolu, Tejasvi and Kumaraguru, Ponnurangam and Rajadesingan, Ashwin},
+  booktitle={Proceedings of the 17th ACM Web Science Conference 2025 (WebSci '25)},
+  year={2025},
+  organization={ACM},
+  doi={10.1145/3717867.3717899}
+}
+```
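
The committed `config.json` carries no `id2label` mapping, so the pipeline call in the README will most likely report the generic names `LABEL_0` and `LABEL_1`. The following is a minimal sketch of translating those names back to the classes listed under Model Description. The helper `readable`, the dictionary `LABEL_NAMES`, and the placeholder scores are illustrative assumptions, not part of the repository.

```python
# Hypothetical helper: translate the pipeline's generic labels into the
# class names given in the README's Model Description.
LABEL_NAMES = {
    "LABEL_0": "Not a Personal Narrative",
    "LABEL_1": "Personal Narrative",
}


def readable(result):
    """Return a copy of one pipeline result dict with a human-readable label."""
    label = result["label"]
    # Unknown labels pass through unchanged rather than raising.
    return {**result, "label": LABEL_NAMES.get(label, label)}


# Stand-in for the output of classifier([...]); the scores are made-up
# placeholders so the sketch runs without downloading the model.
example_results = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.88},
]

readable_results = [readable(r) for r in example_results]
for r in readable_results:
    print(f"{r['label']}: {r['score']:.2f}")
```

With the real model, `example_results` would be replaced by the list returned from `classifier(...)`.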

config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "_name_or_path": "falkne/storytelling-LM-europarl-mixed-en",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.40.0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
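
The config above defines no `id2label` or `label2id` entries, which is why inference falls back to `LABEL_0` / `LABEL_1`. One possible way to attach the class names is sketched below. It assumes the names from the README's Model Description and works on a trimmed-down copy of the config, not the committed file itself.

```python
import json

# Trimmed-down copy of the committed config.json, enough for illustration.
config = json.loads(
    '{"model_type": "bert", "architectures": ["BertForSequenceClassification"]}'
)

# Assumed label names, taken from the README's Model Description.
id2label = {0: "Not a Personal Narrative", 1: "Personal Narrative"}

# JSON object keys are always strings, so id2label is stored with string keys.
config["id2label"] = {str(i): name for i, name in id2label.items()}
config["label2id"] = {name: i for i, name in id2label.items()}

updated = json.dumps(config, indent=2)
print(updated)
```

Saving such a mapping into `config.json` would let the `text-classification` pipeline report the class names directly, at the cost of changing the labels that downstream code sees.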

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a5bf71006f45964ae0db325dc0d52dcc028f0f12d4ec246c40657f395aaf45ab
+size 437958648
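
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A small sketch of reading those fields follows; the function name `parse_lfs_pointer` and the returned dictionary keys are assumptions made for illustration.

```python
# Pointer text copied from the hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a5bf71006f45964ae0db325dc0d52dcc028f0f12d4ec246c40657f395aaf45ab
size 437958648
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size_bytes"] / 2**20
print(f"{pointer['hash_algo']} digest, {size_mib:.1f} MiB")
```

At 4 bytes per `float32` parameter, this size is consistent with roughly 109 million parameters plus a small safetensors header, which is in line with a BERT-base model.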

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "config": "./tokenizer_config.json",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "max_len": 512,
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff