---
base_model: mistralai/Mistral-7B-v0.1
library_name: peft
license: apache-2.0
language:
- en
tags:
- propaganda
---

# Model Card for identrics/wasper_propaganda_classifier_en
## Model Description

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** English
- **License:** apache-2.0
- **Finetuned from model:** [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Context window:** 8192 tokens
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 for propaganda detection. It is a multilabel classifier that determines which, if any, of five predefined propaganda types are present in a given English text.

This model was created by [`Identrics`](https://identrics.ai/) within the scope of the WASPer project. The detailed taxonomy of the full pipeline can be found [here](https://github.com/Identrics/wasper/).
## Propaganda taxonomy

The propaganda techniques identifiable with this model are classified into five categories:

1. **Self-Identification Techniques**:
These techniques exploit the audience's feelings of association (or desire to be associated) with a larger group. They suggest that the audience should feel united, motivated, or threatened by the same factors that unite, motivate, or threaten that group.

2. **Defamation Techniques**:
These techniques represent direct or indirect attacks against an entity's reputation and worth.

3. **Legitimisation Techniques**:
These techniques attempt to prove and legitimise the propagandist's statements by using arguments that cannot be falsified because they are based on moral values or personal experiences.

4. **Logical Fallacies**:
These techniques appeal to the audience's reason and masquerade as objective and factual arguments, but in reality, they exploit distractions and flawed logic.

5. **Rhetorical Devices**:
These techniques seek to influence the audience and control the conversation by using linguistic methods.
Then the model can be downloaded and used for inference:

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("identrics/wasper_propaganda_classifier_en", num_labels=5)
tokenizer = AutoTokenizer.from_pretrained("identrics/wasper_propaganda_classifier_en")

tokens = tokenizer("Our country is the most powerful country in the world!", return_tensors="pt")
output = model(**tokens)
print(output.logits)
```
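The classifier head returns one logit per taxonomy category. Because this is a multilabel model, each logit should be passed through a sigmoid independently and thresholded, rather than softmaxed across categories. A minimal post-processing sketch in plain Python (the label order and the 0.5 threshold are assumptions, not confirmed by this card; check the model's `config.id2label` for the actual index-to-name mapping):

```python
import math

def sigmoid(x: float) -> float:
    # Map one logit to an independent per-label probability.
    return 1.0 / (1.0 + math.exp(-x))

# Assumed label order, taken from the taxonomy above; verify against
# the model's config.id2label before relying on it.
LABELS = [
    "Self-Identification Techniques",
    "Defamation Techniques",
    "Legitimisation Techniques",
    "Logical Fallacies",
    "Rhetorical Devices",
]

def decode(logits, threshold=0.5):
    """Return the taxonomy labels whose sigmoid probability clears the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) > threshold]

# Example logits (made up for illustration):
print(decode([2.1, -1.3, 0.4, -0.2, 1.7]))
# → ['Self-Identification Techniques', 'Legitimisation Techniques', 'Rhetorical Devices']
```

With a per-label sigmoid, several categories can fire at once, matching the observation below that multiple techniques often co-occur within a single text.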
## Training Details

During the training stage, the objective was to develop a multi-label classifier that identifies the different types of propaganda, using a dataset containing both real and artificially generated samples.

The data was carefully annotated by domain experts based on a predefined taxonomy covering five primary categories. Some examples are assigned to a single category, while others are classified into multiple categories, reflecting the nuanced nature of propaganda, where multiple techniques can be found within a single text.

The model reached a weighted F1 score of **0.464** during training.
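For reference, the weighted F1 averages the per-category F1 scores, weighting each category by its support (its number of true instances). A small sketch with made-up counts, not the model's actual evaluation data:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for one label."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def weighted_f1(per_label_counts) -> float:
    """Average per-label F1, weighted by support (tp + fn)."""
    total_support = sum(tp + fn for tp, fp, fn in per_label_counts)
    return sum(
        f1_score(tp, fp, fn) * (tp + fn) for tp, fp, fn in per_label_counts
    ) / total_support

# Hypothetical (tp, fp, fn) counts for the five taxonomy categories:
counts = [(8, 4, 2), (5, 5, 5), (3, 2, 7), (6, 3, 4), (4, 6, 6)]
print(round(weighted_f1(counts), 3))  # → 0.532
```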
## Compute Infrastructure

This model was fine-tuned on 2x NVIDIA Tesla V100 32GB GPUs.

## Citation [this section is to be updated soon]

If you find our work useful, please consider citing WASPer: