---
library_name: transformers
tags:
- manipulation
- ukraine
- russia
- telegram
- multi-label
license: apache-2.0
language:
- uk
- ru
metrics:
- f1
base_model:
- FacebookAI/xlm-roberta-large
pipeline_tag: text-classification
---

# Ukrainian/Russian Manipulation Detector - XLM-RoBERTa

## Model Description

This model detects propaganda and manipulation techniques in **Ukrainian and Russian** text. It is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) trained on a bilingual subset of the UNLP 2025 Shared Task dataset for multi-label classification of manipulation techniques.

Its multilingual architecture makes it effective at understanding nuances in both Ukrainian and Russian, including code-mixed contexts.

## Task: Manipulation Technique Classification

The model performs multi-label text classification, identifying manipulation techniques grouped into 5 aggregated categories (the five labels listed in the Usage section below). The underlying shared task defines the following 10 techniques, and a single text can contain several of them.

### Manipulation Techniques
1. Loaded Language: The use of words and phrases with a strong emotional connotation (positive or negative) to influence the audience.
2. Glittering Generalities: Exploitation of people's positive attitude towards abstract concepts such as “justice,” “freedom,” “democracy,” “patriotism,” “peace,” “happiness,” “love,” “truth,” “order,” etc. These words and phrases are intended to provoke strong emotional reactions and feelings of solidarity without providing specific information or arguments.
3. Euphoria: Exploiting an event that causes euphoria or a feeling of happiness, or using a positive event to boost morale. This manipulation is often used to mobilize the population.
4. Appeal to Fear: The misuse of fear (often based on stereotypes or prejudices) to support a particular proposal.
5. FUD (Fear, Uncertainty, Doubt): Presenting information in a way that sows uncertainty and doubt, causing fear. This technique is a subtype of the appeal to fear.
6. Bandwagon/Appeal to People: An attempt to persuade the audience to join and take action because “others are doing the same thing.”
7. Thought-Terminating Cliché: Commonly used phrases that mitigate cognitive dissonance and block critical thinking.
8. Whataboutism: Discrediting the opponent's position by accusing them of hypocrisy without directly refuting their arguments.
9. Cherry Picking: Selective use of data or facts that support a hypothesis while ignoring counterarguments.
10. Straw Man: Distorting the opponent's position by replacing it with a weaker or outwardly similar one and refuting it instead.

## Training Data

The model was trained on the dataset from the UNLP 2025 Shared Task on manipulation technique classification.

* **Dataset:** [UNLP 2025 Techniques Classification](https://github.com/unlp-workshop/unlp-2025-shared-task/tree/main/data/techniques_classification)
* **Source Texts:** Ukrainian and Russian texts from a larger multilingual dataset.
* **Task:** Multi-label classification.

## Training Configuration

The model was fine-tuned using the following hyperparameters:

| Parameter | Value |
| :--- | :--- |
| **Base Model** | `FacebookAI/xlm-roberta-large` |
| **Learning Rate** | `2e-5` |
| **Train Batch Size** | `16` |
| **Eval Batch Size** | `32` |
| **Epochs** | `10` |
| **Max Sequence Length** | `512` |
| **Optimizer** | AdamW |
| **Loss Function** | `BCEWithLogitsLoss` (with class weights) |
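
The class-weighted `BCEWithLogitsLoss` in the table can be reproduced with PyTorch's `pos_weight` argument. The sketch below is illustrative only: the per-class positive counts are placeholders, not the actual UNLP label distribution.

```python
import torch
from torch import nn

# Illustrative per-class positive counts out of 1,000 training examples.
# These are placeholders, NOT the real UNLP label distribution.
num_examples = 1000
pos_counts = torch.tensor([400.0, 150.0, 80.0, 250.0, 60.0])

# pos_weight = negatives / positives upweights rare classes in the loss.
pos_weight = (num_examples - pos_counts) / pos_counts
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy batch: raw logits for 2 examples over 5 labels, with binary targets.
logits = torch.zeros(2, 5)
targets = torch.tensor([[1., 0., 0., 1., 0.],
                        [0., 1., 0., 0., 1.]])
loss = loss_fn(logits, targets)
```

Passing a weighted loss like this to the Hugging Face `Trainer` typically requires overriding its `compute_loss` method.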

## Usage

### Installation

First, install the necessary libraries:
```bash
pip install transformers torch sentencepiece
```

### Quick Start

Here is how to use the model to classify a single piece of text:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Define model and label names
model_name = "olehmell/ukr-rus-manipulation-detector-xlm-roberta" # Hypothetical model name
labels = [
    'emotional_manipulation', 
    'fear_appeals', 
    'bandwagon_effect', 
    'selective_truth', 
    'cliche'
]

# Load pretrained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare text (can be Ukrainian or Russian); this Russian example translates to:
# "All the experts confirmed this long ago, only you don't understand what is really happening."
text = "Все эксперты уже давно это подтвердили, только вы не понимаете, что происходит на самом деле."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Get detected techniques
threshold = 0.5
detected_techniques = {}
for i, score in enumerate(predictions[0]):
    if score > threshold:
        detected_techniques[labels[i]] = f"{score:.2f}"

if detected_techniques:
    print("Detected techniques:")
    for technique, score in detected_techniques.items():
        print(f"- {technique} (Score: {score})")
else:
    print("No manipulation techniques detected.")

```
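
The quick-start example uses a single 0.5 threshold for all labels. For imbalanced multi-label tasks, per-label thresholds tuned on a validation set often work better; the threshold values below are illustrative, not tuned for this model:

```python
import torch

labels = ['emotional_manipulation', 'fear_appeals', 'bandwagon_effect',
          'selective_truth', 'cliche']

# Hypothetical per-label thresholds; in practice, tune each one on a
# validation set (e.g. by maximizing per-label F1).
thresholds = torch.tensor([0.50, 0.40, 0.35, 0.45, 0.30])

def decode(probs, thresholds, labels):
    """Map a vector of sigmoid probabilities to the label names above threshold."""
    return [name for name, p, t in zip(labels, probs, thresholds) if p > t]

# Example probabilities, e.g. torch.sigmoid(outputs.logits)[0]
probs = torch.tensor([0.62, 0.38, 0.41, 0.10, 0.05])
decode(probs, thresholds, labels)  # ['emotional_manipulation', 'bandwagon_effect']
```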

## Performance

The model achieves the following performance on the evaluation set:

| Metric | Value |
| :--- | :--- |
| **F1 Macro** | **0.44** |
| F1 Micro | TBD |
| Hamming Loss | TBD |
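
For reference, the metrics in the table can be computed with scikit-learn from binarized predictions; the arrays below are toy values, not actual model output:

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy ground truth and predictions for 4 texts over 5 labels
# (illustrative only, not actual evaluation data).
y_true = np.array([[1, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0],
                   [1, 1, 0, 0, 1],
                   [0, 0, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 1],
                   [0, 0, 1, 1, 0]])

macro = f1_score(y_true, y_pred, average="macro")  # mean of per-label F1 scores
micro = f1_score(y_true, y_pred, average="micro")  # pooled TP/FP/FN across labels
hamming = hamming_loss(y_true, y_pred)             # fraction of wrong label cells
```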

## Limitations

* **Language Specificity:** The model is optimized for Ukrainian and Russian. Performance on other languages is not guaranteed.
* **Domain Sensitivity:** Trained primarily on political and social media discourse, its performance may vary on other text domains (e.g., scientific, literary).
* **Context Length:** The model is limited to texts up to 512 tokens. Longer documents must be chunked or truncated.
* **Class Imbalance:** Some manipulation techniques are underrepresented in the training data, which may affect their detection accuracy.
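
One common way to handle the 512-token limit is a sliding window: split the token sequence into overlapping chunks, score each chunk, and max-pool the per-label probabilities. This is a sketch with assumed window/stride values, not part of the model's documented pipeline:

```python
import torch

def chunk_token_ids(token_ids, max_len=512, stride=128):
    """Split a long token-id sequence into overlapping windows of max_len."""
    step = max_len - stride
    return [token_ids[start:start + max_len]
            for start in range(0, max(len(token_ids) - stride, 1), step)]

def aggregate_scores(chunk_probs):
    """Max-pool per-label probabilities across chunks: a technique detected
    in any chunk counts as detected for the whole document."""
    return torch.stack(chunk_probs).max(dim=0).values

# 1,000 tokens -> three overlapping 512-token windows
chunks = chunk_token_ids(list(range(1000)))
```

Each chunk would then be run through the model as in the Quick Start, and the resulting probability vectors combined with `aggregate_scores`.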

## Ethical Considerations

* **Purpose:** This model is intended as a tool to support media literacy and critical thinking, not as an arbiter of truth.
* **Human Oversight:** Model outputs should be interpreted with human judgment and a full understanding of the context. It should not be used to automatically censor content.
* **Potential Biases:** The model may reflect biases present in the training data.

## Citation

If you use this model in your research, please cite the following:

```bibtex
@misc{ukrainian-russian-manipulation-xlm-roberta-2025,
  author = {Oleh Mell},
  title = {Ukrainian/Russian Manipulation Detector - XLM-RoBERTa},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/olehmell/ukr-rus-manipulation-detector-xlm-roberta}
}
```
```bibtex
@inproceedings{unlp2025shared,
  title={UNLP 2025 Shared Task on Techniques Classification},
  author={UNLP Workshop Organizers},
  booktitle={UNLP 2025 Workshop},
  year={2025},
  url={https://github.com/unlp-workshop/unlp-2025-shared-task}
}
```

## License

This model is licensed under the **Apache 2.0 License**.

## Acknowledgments

* The organizers of the UNLP 2025 Workshop for providing the dataset.