| --- |
| license: other |
| license_name: mixed-terms |
| license_link: LICENSE |
| language: |
| - en |
| metrics: |
| - f1 |
| - precision |
| - recall |
| base_model: |
| - google/gemma-2-2b-it |
| - answerdotai/ModernBERT-base |
| tags: |
| - Multilabel classification |
| - Propaganda-detection |
| - Text-classification |
| --- |
|
|
| ## Propaganda Detector Ensemble (Inference Bundle) |
|
|
| This repository provides an inference bundle for multi-label classification of propaganda techniques. The bundle includes model artifacts and configuration for an ensemble composed of: |
|
|
| - **Gemma**: `google/gemma-2-2b-it` (main) and an **8-bit** inference variant |
| - **ModernBERT**: `answerdotai/ModernBERT-base` (binary / auxiliary classifier component) |
| - **Classical ML**: **LinearSVC + TF-IDF**, optionally with calibration |
| - **Post-processing artifacts**: label list, per-class thresholds, and ensemble metadata |
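
As a rough illustration of how per-class thresholds can turn ensemble scores into a multi-label prediction, here is a minimal sketch. It assumes the ensemble members' probabilities are simply averaged, which this card does not confirm. The label names, threshold values, and the helpers `average_scores` and `apply_thresholds` are hypothetical placeholders, not the bundle's actual artifacts or API.

```python
from typing import Dict, List

# Placeholder label names and thresholds for illustration only; the real
# values come from the bundle's label list and per-class threshold artifacts.
LABELS: List[str] = ["Loaded_Language", "Name_Calling,Labeling", "Repetition"]
THRESHOLDS: Dict[str, float] = {
    "Loaded_Language": 0.5,
    "Name_Calling,Labeling": 0.4,
    "Repetition": 0.3,
}


def average_scores(member_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each label's probability across members (assumed combination rule)."""
    return {
        label: sum(scores[label] for scores in member_scores) / len(member_scores)
        for label in LABELS
    }


def apply_thresholds(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Keep every label whose score reaches its own per-class threshold."""
    return [
        label for label in LABELS
        if scores[label] >= thresholds.get(label, default)
    ]


# Made-up per-label probabilities from two hypothetical ensemble members.
member_a = {"Loaded_Language": 0.9, "Name_Calling,Labeling": 0.2, "Repetition": 0.1}
member_b = {"Loaded_Language": 0.7, "Name_Calling,Labeling": 0.5, "Repetition": 0.3}

predicted = apply_thresholds(average_scores([member_a, member_b]), THRESHOLDS)
print(predicted)
```

Because each label is thresholded independently, a text can receive zero, one, or several technique labels.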
|
|
| ## Intended Use |
|
|
| - Inference through an API service (Modal), a browser extension backend, batch scoring, and experimentation. |
| - The repository is intended for **prediction/inference**. Training code and datasets are not included. |
|
|
| ## Licensing & Attribution (IMPORTANT) |
|
|
| This repository uses **mixed licensing/terms** because it bundles multiple upstream components. |
|
|
| ### Gemma components (Gemma-2-2B-IT and derivatives) |
| Gemma-based weights and any derivatives (including fine-tuned and/or quantized variants) are subject to the **Gemma Terms of Use**: |
| https://ai.google.dev/gemma/terms |
|
|
| ### ModernBERT component |
| `answerdotai/ModernBERT-base` is released under the **Apache License 2.0**. If ModernBERT weights or derivatives are included in this repository, their use and distribution are subject to Apache-2.0: |
| https://www.apache.org/licenses/LICENSE-2.0 |
|
|
| ## Dataset Attribution (SemEval-2020 Task 11 / PTC) |
|
|
| This work uses the **Propaganda Techniques Corpus (PTC)** from **SemEval-2020 Task 11**. |
|
|
| The dataset was **modified for this project** during preprocessing and label setup. In particular: |
|
|
| - The original span-level annotations were transformed into a **classification-ready format** (instances derived from annotated fragments with document context). |
| - Underrepresented techniques were **merged into super-classes** as in the task setup (e.g., “Bandwagon” + “Reductio ad Hitlerum”, and “Whataboutism” + “Straw Men” + “Red Herring”). |
| - The technique **“Obfuscation, Intentional Vagueness, Confusion”** was **excluded** due to very low frequency (as described in the PTC documentation). |
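
The label setup described above can be sketched roughly as follows. This is an illustrative assumption, not the preprocessing code used for this project (which is not included in the repository). The exact label strings, the `MERGE_MAP` and `EXCLUDED` names, and the `normalize_labels` helper are all hypothetical.

```python
from typing import List

# Hypothetical label strings; the actual spelling in the corpus files may differ.
BANDWAGON_GROUP = "Bandwagon,Reductio_ad_hitlerum"
WHATABOUTISM_GROUP = "Whataboutism,Straw_Men,Red_Herring"

# Map each underrepresented technique to its merged super-class.
MERGE_MAP = {
    "Bandwagon": BANDWAGON_GROUP,
    "Reductio_ad_hitlerum": BANDWAGON_GROUP,
    "Whataboutism": WHATABOUTISM_GROUP,
    "Straw_Men": WHATABOUTISM_GROUP,
    "Red_Herring": WHATABOUTISM_GROUP,
}

# Technique dropped because of its very low frequency.
EXCLUDED = {"Obfuscation,Intentional_Vagueness,Confusion"}


def normalize_labels(raw_labels: List[str]) -> List[str]:
    """Drop excluded techniques, merge rare ones, and remove duplicates in order."""
    merged: List[str] = []
    for label in raw_labels:
        if label in EXCLUDED:
            continue
        target = MERGE_MAP.get(label, label)  # unmapped labels pass through unchanged
        if target not in merged:
            merged.append(target)
    return merged


example = normalize_labels([
    "Bandwagon",
    "Loaded_Language",
    "Reductio_ad_hitlerum",
    "Obfuscation,Intentional_Vagueness,Confusion",
])
print(example)
```

In this sketch, two rare techniques from the same group collapse into a single super-class label, so an instance never carries the merged label twice.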
|
|
| The original dataset is **not redistributed** in this repository. Any modifications are the responsibility of the authors of this repository. |
|
|
| Please cite the following paper when using the PTC corpus: |
|
|
| Da San Martino, G., Yu, S., Barrón-Cedeño, A., Petrov, R., & Nakov, P. (2019). |
| *Fine-Grained Analysis of Propaganda in News Articles* (EMNLP-IJCNLP 2019). |
|
|
| ```bibtex |
| @InProceedings{EMNLP19DaSanMartino, |
| author = {Da San Martino, Giovanni and |
| Yu, Seunghak and |
| Barr\'{o}n-Cede\~no, Alberto and |
| Petrov, Rostislav and |
| Nakov, Preslav}, |
| title = {Fine-Grained Analysis of Propaganda in News Articles}, |
| booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and |
| 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019}, |
| year = {2019} |
| } |
| ``` |
|
|
| ## Summary note |
| If any terms conflict, the most restrictive applicable terms for a given component apply. This repository does not grant any additional rights beyond those stated in the upstream licenses/terms. |
|
|
| ## Limitations |
|
|
| - Predictions can be sensitive to domain shift, language, and text length. |
| - The ensemble may produce false positives and false negatives; use its output as decision support, not as the sole authority. |
|
|
| ## Contact / Notes |
|
|
| If you use this repository, please ensure compliance with the upstream licenses/terms and provide appropriate dataset attribution. |
| --- |