Alonadoli committed on
Commit 6013d01 · verified · 1 Parent(s): d15268e

Update to the README.md file

Files changed (1): README.md +163 -118

README.md CHANGED
@@ -1,199 +1,244 @@
  ---
  library_name: transformers
- tags: []
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]

- ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]

  ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]

  ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

- Use the code below to get started with the model.
-
- [More Information Needed]
  ## Training Details

  ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]

  ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]

  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

  ### Testing Data, Factors & Metrics

  #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]

  #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]

  #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]

  ### Results

- [More Information Needed]
-
- #### Summary

- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]

  ### Model Architecture and Objective

- [More Information Needed]

  ### Compute Infrastructure

- [More Information Needed]

  #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]

- ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

  **BibTeX:**

- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]

- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]

- ## More Information [optional]
-
- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]

  ---
  library_name: transformers
+ tags:
+ - stance-detection
+ - political-science
+ - multilingual
+ - nli
+ - deberta
+ - group-appeals
+ language:
+ - en
+ - de
+ license: mit
+ base_model: MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7
  ---

+ # Model Card for mDeBERTa Stance Detection (No Context)

+ A multilingual stance detection model fine-tuned to detect political stance towards specific groups in text, without contextual information.

  ## Model Details

  ### Model Description

+ This model is a fine-tuned mDeBERTa-v3-base that frames stance classification as Natural Language Inference (NLI), determining whether a political text expresses a positive, negative, or neutral stance towards a specific target group. The model processes focal sentences without additional context.

+ - **Developed by:** Research team studying group appeals in political discourse
+ - **Model type:** Sequence classification (NLI-based stance detection)
+ - **Language(s) (NLP):** English, German (multilingual)
+ - **License:** MIT
+ - **Finetuned from model:** MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7

+ ### Model Sources

+ - **Repository:** rwillh11/mdeberta_NLI_stance_NoContext
+ - **Base Model:** [MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7)

  ## Uses

  ### Direct Use

+ The model is designed for researchers analyzing political discourse and stance towards specific groups in manifestos, speeches, and policy documents. It takes a text and a target group as input and classifies the stance as positive, negative, or neutral.

+ ### Downstream Use

+ This model can be integrated into larger political text analysis pipelines for:
+ - Analysis of political manifestos
+ - Detection of group appeals in political communication
+ - Comparative political research across countries and languages
+ - Policy stance analysis

  ### Out-of-Scope Use

+ This model should not be used for:
+ - General sentiment analysis (the model is group-specific)
+ - Real-time social media monitoring without human oversight
+ - Making decisions about individuals or groups
+ - Content moderation without additional validation

  ## Bias, Risks, and Limitations

+ ### Technical Limitations
+ - Trained specifically on political manifesto text; performance may vary on other text types
+ - Focal sentences without context may lack nuance present in full paragraphs
+ - Limited to three stance categories (positive, negative, neutral)

+ ### Bias Considerations
+ - Training data consists of political manifestos from specific countries and time periods
+ - May reflect biases present in the political discourse of the training data
+ - Group detection and stance classification may vary across different political contexts

  ### Recommendations

+ Users should be aware that this model:
+ - Is designed for research purposes in political science
+ - Should be validated on specific domains before deployment
+ - May require human oversight for sensitive applications
+ - May vary in performance across different types of groups and political contexts

  ## How to Get Started with the Model

+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "rwillh11/mdeberta_NLI_stance_NoContext"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example usage
+ text = "We will increase funding for schools to better support students."
+ target_group = "students"
+
+ # Create a hypothesis for each stance
+ hypotheses = {
+     "positive": f"The text is positive towards {target_group}.",
+     "negative": f"The text is negative towards {target_group}.",
+     "neutral": f"The text is neutral, or contains no stance, towards {target_group}."
+ }
+
+ # Score each hypothesis against the text
+ results = {}
+ for stance, hypothesis in hypotheses.items():
+     inputs = tokenizer(text, hypothesis, return_tensors="pt", truncation=True)
+     with torch.no_grad():
+         outputs = model(**inputs)
+     probs = torch.softmax(outputs.logits, dim=-1)
+     results[stance] = probs[0][0].item()  # probability of entailment (label index 0)
+
+ # Select the stance with the highest entailment probability
+ predicted_stance = max(results, key=results.get)
+ print(f"Predicted stance towards '{target_group}': {predicted_stance}")
+ ```
 
  ## Training Details

  ### Training Data

+ The model was trained on political manifesto data with the following characteristics:
+ - **Languages:** English and German
+ - **Text type:** Political manifesto sentences (focal sentences without context)
+ - **Labels:** Three-class stance classification (positive, negative, neutral)
+ - **Groups:** Various political target groups (citizens, specific demographics, etc.)
+ - **Training size:** ~12,104 expanded training examples (6,052 original texts × 2 hypotheses)
+ - **Test size:** ~4,539 expanded test examples
 
  ### Training Procedure

+ #### Preprocessing
+ - Texts tokenized with the mDeBERTa tokenizer (max length 512)
+ - NLI format: premise (political text) + hypothesis (stance towards group)
+ - Each text paired with both a true and a false hypothesis for binary classification
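The premise–hypothesis expansion described above can be sketched in a few lines. This is a minimal illustration only: the hypothesis templates mirror those in the usage example earlier in the card, but the exact wording and pairing scheme used during training are assumptions.

```python
import random

STANCES = ["positive", "negative", "neutral"]

def hypothesis(group: str, stance: str) -> str:
    """Build the stance hypothesis for a target group (templates assumed)."""
    if stance == "neutral":
        return f"The text is neutral, or contains no stance, towards {group}."
    return f"The text is {stance} towards {group}."

def expand_example(text: str, group: str, gold: str, rng: random.Random):
    """Pair a labeled text with one true and one false hypothesis."""
    wrong = rng.choice([s for s in STANCES if s != gold])
    return [
        (text, hypothesis(group, gold), 1),   # entailed hypothesis
        (text, hypothesis(group, wrong), 0),  # non-entailed hypothesis
    ]

rng = random.Random(42)  # fixed seed, matching the deterministic setup
pairs = expand_example(
    "We will increase funding for schools.", "students", "positive", rng
)
# Each labeled text yields two NLI pairs, consistent with the ~2x
# expansion reported under Training Data (6,052 texts -> ~12,104 examples).
```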
 
  #### Training Hyperparameters
+ - **Training regime:** Mixed precision training
+ - **Optimizer:** AdamW with weight decay
+ - **Learning rate:** Optimized via Optuna (range: 1e-5 to 4e-5)
+ - **Weight decay:** Optimized via Optuna (range: 0.01 to 0.3)
+ - **Warmup ratio:** Optimized via Optuna (range: 0.0 to 0.1)
+ - **Epochs:** 10 per trial
+ - **Batch size:** 16 (train and eval)
+ - **Trials:** 20 total (run in two batches of 10)
+ - **Selection metric:** F1 Macro
+ - **Seed:** 42 (deterministic training)
+
+ #### Training Infrastructure
+ - **Hardware:** CUDA-enabled GPU
+ - **Frameworks:** Transformers, PyTorch
+ - **Hyperparameter optimization:** Optuna
+ - **Deterministic training:** All random seeds fixed
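As a rough sketch of the search loop above, the following substitutes dependency-free random sampling for Optuna's sampler over the same ranges; `train_and_eval` is a hypothetical placeholder for the actual 10-epoch fine-tuning run, not the card's real objective.

```python
import random

def sample_trial(rng: random.Random) -> dict:
    # Same search ranges as listed above.
    return {
        "learning_rate": rng.uniform(1e-5, 4e-5),
        "weight_decay": rng.uniform(0.01, 0.3),
        "warmup_ratio": rng.uniform(0.0, 0.1),
    }

def train_and_eval(params: dict) -> float:
    # Placeholder: the real objective fine-tunes mDeBERTa for 10 epochs
    # at batch size 16 and returns the eval F1 Macro.
    return 0.80 - abs(params["learning_rate"] - 2e-5) * 1e3

def search(n_trials: int = 20, seed: int = 42):
    rng = random.Random(seed)  # fixed seed, as in the card
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = sample_trial(rng)
        score = train_and_eval(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

best_score, best_params = search()
```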
 
  ## Evaluation

  ### Testing Data, Factors & Metrics

  #### Testing Data
+ - 20% holdout from the original dataset
+ - Multilingual political manifesto sentences
+ - Balanced across stance classes and languages

  #### Factors
+ The model was evaluated across:
+ - **Languages:** English and German text
+ - **Stance classes:** Positive, negative, neutral
+ - **Group types:** Various political target groups

  #### Metrics
+ Primary metrics used for evaluation:
+ - **F1 Macro:** Primary optimization metric (treats all classes equally)
+ - **F1 Micro:** Overall classification accuracy
+ - **Balanced Accuracy:** Accounts for class imbalance
+ - **Precision/Recall (Macro & Micro):** Detailed performance measures
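These metrics can be computed with scikit-learn, which the card lists under Software; the gold and predicted labels below are purely illustrative, not from the actual evaluation.

```python
from sklearn.metrics import (
    balanced_accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Illustrative gold and predicted stance labels.
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "neutral", "positive", "positive", "neutral"]

f1_macro = f1_score(y_true, y_pred, average="macro")   # selection metric
f1_micro = f1_score(y_true, y_pred, average="micro")   # equals accuracy here
bal_acc = balanced_accuracy_score(y_true, y_pred)      # mean per-class recall
prec_macro = precision_score(y_true, y_pred, average="macro")
rec_macro = recall_score(y_true, y_pred, average="macro")

print(f"F1 macro: {f1_macro:.3f}, F1 micro: {f1_micro:.3f}, "
      f"balanced accuracy: {bal_acc:.3f}")
```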
 
  ### Results

+ **Best model performance (Trial 19):**
+ - **F1 Macro:** ~0.80 (varies by epoch)
+ - **F1 Micro:** ~0.84
+ - **Accuracy:** ~0.84
+ - **Balanced Accuracy:** ~0.79

+ The model demonstrates strong performance across stance categories, with deterministic results confirmed through multiple prediction runs.
 
+ ## Model Examination

+ The model uses Natural Language Inference to turn stance detection into a binary entailment task:
+ - For each text–group pair, it generates three hypotheses (positive/negative/neutral stance)
+ - It selects the hypothesis with the highest entailment probability
+ - This approach leverages the base model's pre-trained NLI capabilities for stance classification
 
  ## Environmental Impact

+ Training involved hyperparameter optimization with 20 trials, each training for 10 epochs.

+ - **Hardware Type:** CUDA-enabled GPU
+ - **Hours used:** Estimated 10-15 hours (including the hyperparameter search)
+ - **Cloud Provider:** Google Colab
+ - **Compute Region:** Variable
+ - **Carbon Emitted:** Not precisely measured
 
+ ## Technical Specifications

  ### Model Architecture and Objective
+ - **Base architecture:** mDeBERTa-v3-base (278M parameters)
+ - **Task:** Natural Language Inference for stance detection
+ - **Input:** Text pair (political sentence + stance hypothesis)
+ - **Output:** Binary classification (entailment/non-entailment)
+ - **Objective:** Cross-entropy loss, with F1 Macro used for model selection
 
  ### Compute Infrastructure

  #### Hardware
+ - GPU-accelerated training (CUDA)
+ - Mixed precision training support

  #### Software
+ - Transformers library
+ - PyTorch framework
+ - Optuna for hyperparameter optimization
+ - scikit-learn for metrics
 
+ ## Citation

+ If you use this model in your research, please cite:

  **BibTeX:**
+ ```bibtex
+ @misc{mdeberta_stance_nocontext,
+   title={mDeBERTa Stance Detection Model for Political Group Appeals},
+   author={Research Team},
+   year={2024},
+   url={https://huggingface.co/rwillh11/mdeberta_NLI_stance_NoContext}
+ }
+ ```
+ ## Model Card Authors

+ Research team studying group appeals in political discourse.

  ## Model Card Contact

+ For questions about this model, please open an issue in the repository or contact the research team through appropriate academic channels.