Updated README file
#2
by Alonadoli - opened

README.md CHANGED

@@ -1,199 +1,242 @@
```diff
 ---
 library_name: transformers
-tags:
 ---
 
-# Model Card for
-
-<!-- Provide a quick summary of what the model is/does. -->
-
 
 ## Model Details
 
 ### Model Description
 
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
-- **Developed by:**
-- **
-- **
-- **
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 
-### Model Sources
 
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 
 ## Uses
 
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
 ### Direct Use
 
-
-[More Information Needed]
 
-### Downstream Use
 
-
 
 ### Out-of-Scope Use
 
-
 
 ## Bias, Risks, and Limitations
 
-
 
 ### Recommendations
 
-
 
 ## How to Get Started with the Model
 
-
 
 ## Training Details
 
 ### Training Data
 
-
 
 ### Training Procedure
 
-
-[More Information Needed]
-
 
 #### Training Hyperparameters
-
-- **
-
 
 ## Evaluation
 
-<!-- This section describes the evaluation protocols and provides the results. -->
-
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
-
-[More Information Needed]
 
 #### Factors
-
 
 #### Metrics
-
 
 ### Results
 
-
-[More Information Needed]
 
 ## Environmental Impact
 
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:**
-- **Hours used:**
-- **Cloud Provider:**
-- **Compute Region:**
-- **Carbon Emitted:**
 
-## Technical Specifications
 
 ### Model Architecture and Objective
-
 
 ### Compute Infrastructure
 
-[More Information Needed]
-
 #### Hardware
-
 
 #### Software
 
-
-## Citation [optional]
 
-
 
 **BibTeX:**
 
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
 
-
 
 ## Model Card Contact
 
-
```
---
library_name: transformers
tags:
- stance-detection
- political-science
- multilingual
- nli
- deberta
- group-appeals
language:
- en
- de
base_model: MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7
---

# Model Card for mDeBERTa Stance Detection (No Context)

A multilingual stance detection model fine-tuned to detect political stance towards specific groups in text, without contextual information.

## Model Details

### Model Description

This model is a fine-tuned mDeBERTa-v3-base that performs stance classification via Natural Language Inference (NLI), determining whether a political text expresses a positive, negative, or neutral stance towards a specific target group. The model processes focal sentences without additional context.

- **Developed by:** Will Horne, Alona O. Dolinsky and Lena Maria Huber
- **Model type:** Sequence Classification (NLI-based stance detection)
- **Language(s) (NLP):** English, German (multilingual)
- **Finetuned from model:** MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7

### Model Sources

- **Repository:** rwillh11/mdeberta_NLI_stance_NoContext
- **Base Model:** [MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7)

## Uses

### Direct Use

The model is designed for researchers analyzing political discourse and stance towards specific groups in political text; it was trained and validated on party manifestos. It takes a natural sentence and a target group as input and classifies the stance as positive, negative, or neutral.

### Downstream Use

This model can be integrated into larger political text analysis pipelines for:
- Analysis of political manifestos
- Detection of group appeals in political communication
- Comparative political research across countries and languages

### Out-of-Scope Use

This model should not be used for:
- General sentiment analysis (not group-specific)
- Real-time social media monitoring without human oversight
- Making decisions about individuals or groups
- Content moderation without additional validation

## Bias, Risks, and Limitations

### Technical Limitations
- Trained specifically on political manifesto text; performance may vary on other text types
- Focal sentences without context may lack nuance present in full paragraphs
- Limited to three stance categories (positive, negative, neutral)

### Bias Considerations
- Training data consists of political manifestos from specific countries and time periods
- May reflect biases present in the political discourse of the training data
- Group detection and stance classification may vary across political contexts

### Recommendations

Users should be aware that this model:
- Is designed for research purposes in political science
- Should be validated on specific domains before deployment
- May require human oversight for sensitive applications
- May perform differently across group types and political contexts

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "rwillh11/mdeberta_NLI_stance_NoContext"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "We will increase funding for schools to better support students."
target_group = "students"

# Create one hypothesis per stance
hypotheses = {
    "positive": f"The text is positive towards {target_group}.",
    "negative": f"The text is negative towards {target_group}.",
    "neutral": f"The text is neutral, or contains no stance, towards {target_group}."
}

# Score each hypothesis against the text
results = {}
for stance, hypothesis in hypotheses.items():
    inputs = tokenizer(text, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    entailment_prob = probs[0][0].item()  # probability of the "entailment" class
    results[stance] = entailment_prob

# Select the stance with the highest entailment probability
predicted_stance = max(results, key=results.get)
print(f"Predicted stance towards '{target_group}': {predicted_stance}")
```

## Training Details

### Training Data

The model was trained on political manifesto data:
- **Languages:** English and German
- **Text type:** Political manifesto sentences at the natural-sentence level
- **Labels:** Three-class stance classification (positive, negative, neutral)
- **Groups:** Various political target groups (citizens, specific demographics, etc.)
- **Original dataset:** 7,567 text-group pairs
- **Training size:** 12,104 expanded training examples (~6,054 original texts × 2 hypotheses each)
- **Test size:** 4,539 expanded test examples (~1,513 original texts × 3 hypotheses each)

### Training Procedure

#### Preprocessing
- Texts tokenized with the mDeBERTa tokenizer, max length 512
- NLI format: premise (political text) + hypothesis (stance towards group)
- Each text paired with both true and false hypotheses for binary classification
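As a rough illustration of this expansion, the pairing step can be sketched in plain Python. `make_nli_pairs` is a hypothetical helper, and the templates mirror the hypotheses from the getting-started example, not necessarily the authors' exact preprocessing code:

```python
# Hypothetical helper illustrating the NLI expansion; templates mirror the
# hypotheses in the getting-started example, not the authors' exact code.
def make_nli_pairs(text, group, gold_stance):
    """Expand one labeled (text, group) example into premise-hypothesis pairs."""
    templates = {
        "positive": f"The text is positive towards {group}.",
        "negative": f"The text is negative towards {group}.",
        "neutral": f"The text is neutral, or contains no stance, towards {group}.",
    }
    return [
        {
            "premise": text,
            "hypothesis": hyp,
            # the pair matching the gold stance is "entailment"; others are not
            "label": "entailment" if stance == gold_stance else "not_entailment",
        }
        for stance, hyp in templates.items()
    ]

pairs = make_nli_pairs("We will increase funding for schools.", "students", "positive")
for p in pairs:
    print(p["label"], "|", p["hypothesis"])
```

Each original text-group pair thus yields one binary entailment example per candidate hypothesis, which is what inflates the expanded training and test set sizes reported above.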

#### Training Hyperparameters
- **Training regime:** Mixed precision training
- **Optimizer:** AdamW with weight decay
- **Learning rate:** Optimized via Optuna (range: 1e-5 to 4e-5)
- **Weight decay:** Optimized via Optuna (range: 0.01 to 0.3)
- **Warmup ratio:** Optimized via Optuna (range: 0.0 to 0.1)
- **Epochs:** 10 per trial
- **Batch size:** 16 (train and eval)
- **Trials:** 20 total (10 + 10 batches)
- **Metric for selection:** F1 Macro
- **Seed:** 42 (deterministic training)
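The search space above can be written down explicitly. The sketch below just draws uniform samples from the stated ranges as a stand-in for Optuna's sampler (which is more sophisticated than plain uniform sampling); names are illustrative:

```python
import random

# Hyperparameter ranges stated above; plain uniform sampling stands in
# for Optuna's sampler (illustrative only).
SEARCH_SPACE = {
    "learning_rate": (1e-5, 4e-5),
    "weight_decay": (0.01, 0.3),
    "warmup_ratio": (0.0, 0.1),
}

def sample_trial(rng=random):
    """Draw one hyperparameter configuration from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}

random.seed(42)  # the card fixes all random seeds for determinism
print(sample_trial())
```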

#### Training Infrastructure
- **Hardware:** CUDA-enabled GPU
- **Framework:** Transformers, PyTorch
- **Hyperparameter optimization:** Optuna
- **Deterministic training:** All random seeds fixed

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
- 20% holdout from the original dataset
- Multilingual political manifesto sentences
- Balanced across stance classes and languages

#### Factors
The model was evaluated across:
- **Languages:** English and German text
- **Stance classes:** Positive, negative, neutral
- **Group types:** Various political target groups

#### Metrics
Primary metrics used for evaluation:
- **F1 Macro:** Primary optimization metric (treats all classes equally)
- **F1 Micro:** Overall classification accuracy
- **Balanced Accuracy:** Accounts for class imbalance
- **Precision/Recall (Macro & Micro):** Detailed performance measures
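Since F1 Macro drives model selection here, a small plain-Python example (toy labels invented for illustration, no scikit-learn) makes the macro/micro distinction concrete: macro averaging weights every class equally, while micro pooling counts all predictions together:

```python
# Toy illustration of macro vs. micro F1 (labels invented for this example).
def f1_scores(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    tp_total = fp_total = fn_total = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tp_total, fp_total, fn_total = tp_total + tp, fp_total + fp, fn_total + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / len(classes)                         # classes weighted equally
    micro = 2 * tp_total / (2 * tp_total + fp_total + fn_total)   # pooled counts
    return macro, micro

y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
y_pred = ["pos", "neu", "neg", "neu", "neu", "neu"]
macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

On these toy labels the rare "pos" class drags macro F1 (≈0.841) below micro F1 (≈0.833 here equals accuracy, 5/6), which is why macro F1 is the stricter choice when class balance matters.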

### Results

**Best Model Performance (Trial 19):**
- **F1 Macro:** ~0.80 (varies by epoch)
- **F1 Micro:** ~0.84
- **Accuracy:** ~0.84
- **Balanced Accuracy:** ~0.79

The model demonstrates strong performance across stance categories, with deterministic results confirmed through multiple prediction runs.

## Model Examination

The model uses Natural Language Inference to turn stance detection into a binary entailment task:
- For each text-group pair, three hypotheses are generated (positive, negative, and neutral stance)
- The hypothesis with the highest entailment probability is selected
- This approach leverages the pre-trained NLI capabilities of the base model for stance classification
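The selection step reduces to an argmax over per-hypothesis entailment scores; the probabilities below are invented for illustration:

```python
# Hypothesis selection as an argmax over entailment probabilities
# (scores invented for illustration).
def select_stance(entailment_probs):
    """Return the stance whose hypothesis the model most strongly entails."""
    return max(entailment_probs, key=entailment_probs.get)

scores = {"positive": 0.91, "negative": 0.04, "neutral": 0.22}
print(select_stance(scores))  # → positive
```

Note that the per-hypothesis scores need not sum to one, since each hypothesis is scored in a separate forward pass.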

## Environmental Impact

Training involved hyperparameter optimization with 20 trials, each training for 10 epochs.

- **Hardware Type:** CUDA-enabled GPU
- **Hours used:** Estimated 10-15 hours (including hyperparameter search)
- **Cloud Provider:** Google Colab
- **Compute Region:** Variable
- **Carbon Emitted:** Not precisely measured

## Technical Specifications

### Model Architecture and Objective
- **Base Architecture:** mDeBERTa-v3-base (278M parameters)
- **Task:** Natural Language Inference for stance detection
- **Input:** Text pair (political sentence + stance hypothesis)
- **Output:** Binary classification (entailment/non-entailment)
- **Objective:** Cross-entropy loss, with F1 Macro used for model selection

### Compute Infrastructure

#### Hardware
- GPU-accelerated training (CUDA)
- Mixed precision training support

#### Software
- Transformers library
- PyTorch framework
- Optuna for hyperparameter optimization
- scikit-learn for metrics

## Citation

If you use this model in your research, please cite:

**BibTeX:**
```bibtex
@misc{mdeberta_stance_nocontext,
  title={mDeBERTa Stance Detection Model for Political Group Appeals},
  author={Research Team},
  year={2024},
  url={https://huggingface.co/rwillh11/mdeberta_NLI_stance_NoContext}
}
```

## Model Card Authors

Research team studying group appeals in political discourse.

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the research team through appropriate academic channels.