---
license: other
license_name: mixed-terms
license_link: LICENSE
language:
- en
metrics:
- f1
- precision
- recall
base_model:
- google/gemma-2-2b-it
- answerdotai/ModernBERT-base
tags:
- Multilabel classification
- Propaganda-detection
- Text-classification
---

## Propaganda Detector Ensemble (Inference Bundle)

This repository provides an inference bundle for multi-label classification of propaganda techniques. The bundle includes model artifacts and configuration for an ensemble composed of:

- **Gemma**: `google/gemma-2-2b-it` (main) and an **8-bit** inference variant
- **ModernBERT**: `answerdotai/ModernBERT-base` (binary / auxiliary classifier component)
- **Classical ML**: **LinearSVC + TF-IDF**, optionally with calibration
- **Post-processing artifacts**: label list, per-class thresholds, and ensemble metadata

## Intended Use

- Inference via an API service (e.g., Modal), a browser extension backend, batch scoring, and experimentation.
- The repository is intended for **prediction/inference**. Training code and datasets are not included.

## Licensing & Attribution (IMPORTANT)

This repository uses **mixed licensing/terms** because it bundles multiple upstream components.

### Gemma components (Gemma-2-2B-IT and derivatives)

Gemma-based weights and any derivatives (including fine-tuned and/or quantized variants) are subject to the **Gemma Terms of Use**: https://ai.google.dev/gemma/terms

### ModernBERT component

`answerdotai/ModernBERT-base` is released under the **Apache License 2.0**. If ModernBERT weights or derivatives are included in this repository, their use and distribution are subject to Apache-2.0: https://www.apache.org/licenses/LICENSE-2.0

## Dataset attribution (SemEval-2020 Task 11 / PTC)

This work uses the **Propaganda Techniques Corpus (PTC)** from **SemEval 2020 Task 11**. The dataset was **modified for this project** during preprocessing and label setup.
In particular:

- The original span-level annotations were transformed into a **classification-ready format** (instances derived from annotated fragments with document context).
- Underrepresented techniques were **merged into super-classes** as in the task setup (e.g., “Bandwagon” + “Reductio ad Hitlerum”, and “Whataboutism” + “Straw Men” + “Red Herring”).
- The technique **“Obfuscation, Intentional Vagueness, Confusion”** was **excluded** due to very low frequency (as described in the PTC documentation).

The original dataset is **not redistributed** in this repository. Any modifications are the responsibility of the authors of this repository.

Please cite the following paper when using the PTC corpus:

Da San Martino, G., Yu, S., Barrón-Cedeño, A., Petrov, R., & Nakov, P. (2019). *Fine-Grained Analysis of Propaganda in News Articles* (EMNLP-IJCNLP 2019).

```bibtex
@InProceedings{EMNLP19DaSanMartino,
  author    = {Da San Martino, Giovanni and Yu, Seunghak and Barr\'{o}n-Cede\~no, Alberto and Petrov, Rostislav and Nakov, Preslav},
  title     = {Fine-Grained Analysis of Propaganda in News Articles},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019},
  year      = {2019}
}
```

## Summary note

If any terms conflict, the most restrictive applicable terms for a given component apply. This repository does not grant any additional rights beyond those stated in the upstream licenses/terms.

## Limitations

- Predictions can be sensitive to domain shift, language, and text length.
- This model may produce false positives/negatives; use as decision support, not as sole authority.

## Contact / Notes

If you use this repository, please ensure compliance with the upstream licenses/terms and provide appropriate dataset attribution.
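
For readers reproducing the label setup described under the dataset attribution section, the merging and exclusion steps might look roughly like the sketch below. It is an illustration only, not the preprocessing code used for this bundle: the super-class strings (`MERGED_LABELS` values) and the helper names are hypothetical, since the shipped label list defines the actual label names.

```python
from typing import Iterable, List, Optional

# Hypothetical names for the merged super-classes. The source states which
# techniques were merged, not what the resulting labels are called.
BANDWAGON_GROUP = "Bandwagon,Reductio_ad_Hitlerum"
WHATABOUTISM_GROUP = "Whataboutism,Straw_Men,Red_Herring"

MERGED_LABELS = {
    "Bandwagon": BANDWAGON_GROUP,
    "Reductio ad Hitlerum": BANDWAGON_GROUP,
    "Whataboutism": WHATABOUTISM_GROUP,
    "Straw Men": WHATABOUTISM_GROUP,
    "Red Herring": WHATABOUTISM_GROUP,
}

# Technique dropped because of its very low frequency.
EXCLUDED_LABELS = {"Obfuscation, Intentional Vagueness, Confusion"}


def normalize_label(label: str) -> Optional[str]:
    """Map one raw technique name to its training label, or None if excluded."""
    if label in EXCLUDED_LABELS:
        return None
    # Techniques that were not merged keep their original name.
    return MERGED_LABELS.get(label, label)


def normalize_labels(labels: Iterable[str]) -> List[str]:
    """Normalize the raw labels of one instance into a sorted, de-duplicated list."""
    normalized = {normalize_label(label) for label in labels}
    normalized.discard(None)
    return sorted(normalized)
```

Calling `normalize_labels` on the raw technique names of an annotated fragment returns the de-duplicated label set after merging, with the excluded technique removed.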