---
language:
- en
license: mit
library_name: transformers
tags:
- propaganda-detection
- multi-label-classification
- modernbert
- nci-protocol
base_model: answerdotai/ModernBERT-base
datasets:
- synapti/nci-propaganda-production
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Technique Classifier v2

Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.

## Model Description

This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:

- **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
- **Stage 2**: This model identifies which specific techniques are used

### Techniques Detected
| ID | Technique | Description |
|----|-----------|-------------|
| 0 | Loaded_Language | Using words with strong emotional implications |
| 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
| 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
| 3 | Repetition | Repeating the same message multiple times |
| 4 | Flag-Waving | Appeals to patriotism or group identity |
| 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
| 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
| 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
| 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
| 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
| 10 | Straw_Man | Misrepresenting someone's argument |
| 11 | Red_Herring | Introducing irrelevant information |
| 12 | Doubt | Questioning credibility of sources |
| 13 | Appeal_to_Authority | Citing authorities to support claims |
| 14 | Thought-terminating_Cliches | Using clichés to end discussion |
| 15 | Bandwagon | Appeal to popularity |
| 16 | Slogans | Brief, striking phrases |
| 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |

## Training

- **Base Model**: `answerdotai/ModernBERT-base`
- **Dataset**: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
- **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Hardware**: NVIDIA A10G GPU
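
The focal-loss objective above can be sketched as follows. This is a minimal multi-label variant built on `binary_cross_entropy_with_logits`; the exact class-weighting scheme used in training is not documented here, so `pos_weight` is shown only as an optional hook:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, pos_weight=None):
    """Multi-label focal loss: down-weights easy examples via (1 - p_t)^gamma."""
    # Per-element BCE; pos_weight optionally re-weights positive labels.
    bce = F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=pos_weight, reduction="none"
    )
    p_t = torch.exp(-bce)  # model's probability for the true label
    return ((1.0 - p_t) ** gamma * bce).mean()

logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 1.0]])
loss = focal_loss(logits, targets, gamma=2.0)
```

With `gamma=0` this reduces to ordinary (optionally weighted) BCE; `gamma=2.0` shrinks the contribution of examples the model already classifies confidently.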

## Performance

| Metric | Score |
|--------|-------|
| Micro F1 | 80.2% |
| Macro F1 | 63.9% |
| Micro Precision | 83.4% |
| Micro Recall | 77.4% |

### Per-Technique Performance (selected)

| Technique | F1 Score |
|-----------|----------|
| Loaded_Language | 97.0% |
| Appeal_to_fear-prejudice | 89.7% |
| Name_Calling,Labeling | 84.3% |
| Flag-Waving | 82.1% |
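
Micro F1 pools every label decision across all techniques, while macro F1 averages per-technique scores equally, so rarer, harder techniques drag the macro number down. A minimal sketch of the two aggregates (an illustrative helper, not the evaluation code used for the tables above):

```python
import numpy as np

def _f1(tp, fp, fn):
    """Plain F1 from raw counts; returns 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: 0/1 arrays of shape (n_samples, n_labels)."""
    yt, yp = y_true.astype(bool), y_pred.astype(bool)
    tp = (yt & yp).sum(axis=0)
    fp = (~yt & yp).sum(axis=0)
    fn = (yt & ~yp).sum(axis=0)
    micro = _f1(tp.sum(), fp.sum(), fn.sum())  # pool all label decisions
    macro = float(np.mean([_f1(t, f, n) for t, f, n in zip(tp, fp, fn)]))  # equal weight per label
    return micro, macro
```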

## Usage

### With Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")

text = "The radical left is DESTROYING our great nation!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits)[0]

# Get techniques above threshold
threshold = 0.5
techniques = list(model.config.id2label.values())
detected = [(techniques[i], probs[i].item()) for i in range(len(techniques)) if probs[i] > threshold]
print(detected)
```

### With NCI Protocol

```python
from nci.transformers.two_stage_pipeline import TwoStagePipeline

pipeline = TwoStagePipeline.from_pretrained(
    binary_model="synapti/nci-binary-detector-v2",
    technique_model="synapti/nci-technique-classifier-v2",
)

result = pipeline.analyze("The radical left is DESTROYING our great nation!")
print(f"Has propaganda: {result.has_propaganda}")
print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
```

### ONNX Inference

An ONNX export is available at `onnx/model.onnx` for faster inference (~1.25x speedup).

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
session = ort.InferenceSession("onnx/model.onnx")

text = "WAKE UP AMERICA!"
inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)

outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid
```

## Limitations

- Trained primarily on English news articles
- May not generalize well to social media or other domains
- The default threshold of 0.5 may need adjustment for specific use cases
- Multi-label classification means multiple techniques can be detected per text
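
The 0.5 cutoff is only a starting point; per-technique thresholds tuned on a validation split usually trade precision and recall more sensibly than one global value. A minimal sketch (the per-class values below are purely illustrative, not tuned):

```python
import numpy as np

NUM_TECHNIQUES = 18

def detect(probs, thresholds):
    """Indices of techniques whose sigmoid probability clears its own cutoff."""
    return np.flatnonzero(np.asarray(probs) >= np.asarray(thresholds))

# Start from the global 0.5 default, then adjust individual classes.
thresholds = np.full(NUM_TECHNIQUES, 0.5)
thresholds[3] = 0.35   # hypothetical: favour recall on a rarer class (Repetition)
thresholds[0] = 0.60   # hypothetical: tighten precision on Loaded_Language

probs = np.zeros(NUM_TECHNIQUES)
probs[0], probs[3] = 0.55, 0.40
detected = detect(probs, thresholds)  # only index 3: 0.40 >= 0.35, but 0.55 < 0.60
```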

## Citation

```bibtex
@misc{nci-technique-classifier-v2,
  author = {Synapti},
  title = {NCI Technique Classifier v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
}
```

## License

MIT License