synapti committed on
Commit 1b8e0dd · verified · 1 Parent(s): eea7948

Update model card with metrics and usage examples

Files changed (1):
  1. README.md +160 -85
README.md CHANGED
@@ -1,90 +1,165 @@
  ---
  library_name: transformers
- license: apache-2.0
- base_model: answerdotai/ModernBERT-base
  tags:
- - generated_from_trainer
- model-index:
- - name: nci-technique-classifier-v2
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # nci-technique-classifier-v2
-
- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0233
- - Micro F1: 0.8017
- - Macro F1: 0.6272
- - Micro Precision: 0.8311
- - Micro Recall: 0.7743
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 32
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 5
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Micro F1 | Macro F1 | Micro Precision | Micro Recall |
- |:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|:---------------:|:------------:|
- | No log | 0.1634 | 200 | 0.0350 | 0.6311 | 0.1526 | 0.7644 | 0.5373 |
- | No log | 0.3268 | 400 | 0.0305 | 0.6658 | 0.1814 | 0.8020 | 0.5692 |
- | 0.0552 | 0.4902 | 600 | 0.0282 | 0.7023 | 0.2044 | 0.8244 | 0.6117 |
- | 0.0552 | 0.6536 | 800 | 0.0263 | 0.7268 | 0.2181 | 0.8509 | 0.6343 |
- | 0.0273 | 0.8170 | 1000 | 0.0256 | 0.7497 | 0.2610 | 0.8305 | 0.6832 |
- | 0.0273 | 0.9804 | 1200 | 0.0249 | 0.7462 | 0.2371 | 0.8740 | 0.6510 |
- | 0.0273 | 1.1438 | 1400 | 0.0245 | 0.7626 | 0.2862 | 0.8450 | 0.6949 |
- | 0.0231 | 1.3072 | 1600 | 0.0242 | 0.7583 | 0.2371 | 0.8582 | 0.6793 |
- | 0.0231 | 1.4706 | 1800 | 0.0238 | 0.7650 | 0.3155 | 0.8457 | 0.6984 |
- | 0.0226 | 1.6340 | 2000 | 0.0238 | 0.7624 | 0.3074 | 0.8542 | 0.6885 |
- | 0.0226 | 1.7974 | 2200 | 0.0230 | 0.7626 | 0.3634 | 0.8681 | 0.68 |
- | 0.0226 | 1.9608 | 2400 | 0.0223 | 0.7747 | 0.4246 | 0.8675 | 0.6998 |
- | 0.0214 | 2.1242 | 2600 | 0.0225 | 0.7731 | 0.4412 | 0.8752 | 0.6924 |
- | 0.0214 | 2.2876 | 2800 | 0.0221 | 0.7775 | 0.4101 | 0.8733 | 0.7005 |
- | 0.0189 | 2.4510 | 3000 | 0.0219 | 0.7819 | 0.4757 | 0.8414 | 0.7303 |
- | 0.0189 | 2.6144 | 3200 | 0.0224 | 0.7796 | 0.4224 | 0.8606 | 0.7126 |
- | 0.0189 | 2.7778 | 3400 | 0.0217 | 0.7922 | 0.5512 | 0.8389 | 0.7504 |
- | 0.0187 | 2.9412 | 3600 | 0.0217 | 0.7813 | 0.4680 | 0.8610 | 0.7150 |
- | 0.0187 | 3.1046 | 3800 | 0.0224 | 0.7912 | 0.5458 | 0.8341 | 0.7526 |
- | 0.0155 | 3.2680 | 4000 | 0.0231 | 0.7922 | 0.5455 | 0.8475 | 0.7437 |
- | 0.0155 | 3.4314 | 4200 | 0.0231 | 0.7996 | 0.5843 | 0.8295 | 0.7717 |
- | 0.0155 | 3.5948 | 4400 | 0.0223 | 0.8004 | 0.5706 | 0.8398 | 0.7646 |
- | 0.0148 | 3.7582 | 4600 | 0.0228 | 0.8096 | 0.6067 | 0.8527 | 0.7706 |
- | 0.0148 | 3.9216 | 4800 | 0.0229 | 0.8135 | 0.6228 | 0.8457 | 0.7837 |
- | 0.0126 | 4.0850 | 5000 | 0.0255 | 0.8095 | 0.6251 | 0.8379 | 0.7830 |
- | 0.0126 | 4.2484 | 5200 | 0.0267 | 0.8061 | 0.6223 | 0.8325 | 0.7812 |
- | 0.0126 | 4.4118 | 5400 | 0.0261 | 0.8081 | 0.6338 | 0.8372 | 0.7809 |
-
- ### Framework versions
-
- - Transformers 4.57.3
- - Pytorch 2.9.1+cu128
- - Datasets 4.4.1
- - Tokenizers 0.22.1
  ---
+ language:
+ - en
+ license: mit
  library_name: transformers
  tags:
+ - propaganda-detection
+ - multi-label-classification
+ - modernbert
+ - nci-protocol
+ base_model: answerdotai/ModernBERT-base
+ datasets:
+ - synapti/nci-propaganda-production
+ metrics:
+ - f1
+ - precision
+ - recall
+ pipeline_tag: text-classification
  ---
 
+ # NCI Technique Classifier v2
+
+ Multi-label propaganda technique classifier for the NCI (News Content Intelligence) Protocol.
+
+ ## Model Description
+
+ This model classifies text into 18 propaganda techniques as part of a two-stage pipeline:
+ - **Stage 1**: Binary detection (`synapti/nci-binary-detector-v2`) determines if propaganda exists
+ - **Stage 2**: This model identifies which specific techniques are used
+
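The handoff between the two stages is simple glue logic: Stage 2 runs only when Stage 1 fires. A minimal sketch over plain probabilities (the `two_stage_decision` helper is hypothetical; in practice each probability comes from a sigmoid over the corresponding model's logits):

```python
# Hypothetical glue mirroring the two-stage pipeline: skip the technique
# classifier entirely when the binary detector stays below threshold.
def two_stage_decision(binary_prob, technique_probs, labels,
                       binary_threshold=0.5, technique_threshold=0.5):
    """Return (has_propaganda, [(technique, prob), ...])."""
    if binary_prob < binary_threshold:
        return False, []  # Stage 1 says clean: no Stage 2 call needed
    detected = [(name, p) for name, p in zip(labels, technique_probs)
                if p >= technique_threshold]
    return True, detected
```

Gating on Stage 1 saves a full forward pass on the (typically large) fraction of texts with no propaganda.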
+ ### Techniques Detected
+
+ | ID | Technique | Description |
+ |----|-----------|-------------|
+ | 0 | Loaded_Language | Using words with strong emotional implications |
+ | 1 | Appeal_to_fear-prejudice | Seeking to build support by instilling fear |
+ | 2 | Exaggeration,Minimisation | Overstating or understating aspects of issues |
+ | 3 | Repetition | Repeating the same message multiple times |
+ | 4 | Flag-Waving | Appeals to patriotism or group identity |
+ | 5 | Name_Calling,Labeling | Giving a subject a name with negative connotations |
+ | 6 | Reductio_ad_hitlerum | Comparing to Hitler or Nazis to discredit |
+ | 7 | Black-and-White_Fallacy | Presenting only two options when more exist |
+ | 8 | Causal_Oversimplification | Assuming a single cause for complex issues |
+ | 9 | Whataboutism,Straw_Men,Red_Herring | Deflection and misrepresentation tactics |
+ | 10 | Straw_Man | Misrepresenting someone's argument |
+ | 11 | Red_Herring | Introducing irrelevant information |
+ | 12 | Doubt | Questioning credibility of sources |
+ | 13 | Appeal_to_Authority | Citing authorities to support claims |
+ | 14 | Thought-terminating_Cliches | Using clichés to end discussion |
+ | 15 | Bandwagon | Appeal to popularity |
+ | 16 | Slogans | Brief, striking phrases |
+ | 17 | Obfuscation,Intentional_Vagueness,Confusion | Being deliberately unclear |
+
+ ## Training
+
+ - **Base Model**: `answerdotai/ModernBERT-base`
+ - **Dataset**: `synapti/nci-propaganda-production` (19,581 train, 1,727 val, 1,729 test)
+ - **Loss**: Focal Loss (gamma=2.0) with class weights for imbalanced techniques
+ - **Epochs**: 5
+ - **Batch Size**: 16
+ - **Learning Rate**: 2e-5
+ - **Hardware**: NVIDIA A10G GPU
+
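Focal loss with gamma=2.0 down-weights examples the model already classifies confidently, which helps the rare techniques contribute to the gradient. A NumPy sketch of the standard multi-label formulation (the `alpha` argument stands in for the per-label class weights; the actual training code may apply weighting differently):

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Sketch of multi-label focal loss over (batch, num_labels) arrays."""
    probs = 1.0 / (1.0 + np.exp(-logits))            # sigmoid per label
    pt = np.where(targets == 1, probs, 1.0 - probs)  # prob assigned to the true label
    bce = -np.log(np.clip(pt, 1e-12, None))          # standard BCE term
    loss = (1.0 - pt) ** gamma * bce                 # (1-pt)^gamma down-weights easy cases
    if alpha is not None:
        loss = loss * alpha                          # optional per-label weights
    return loss.mean()
```

With gamma=0 this reduces to plain binary cross-entropy; gamma=2.0 shrinks the loss on well-classified labels so hard, rare techniques dominate.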
+ ## Performance
+
+ | Metric | Score |
+ |--------|-------|
+ | Micro F1 | 80.2% |
+ | Macro F1 | 63.9% |
+ | Micro Precision | 83.4% |
+ | Micro Recall | 77.4% |
+
+ ### Per-Technique Performance (selected)
+
+ | Technique | F1 Score |
+ |-----------|----------|
+ | Loaded_Language | 97.0% |
+ | Appeal_to_fear-prejudice | 89.7% |
+ | Name_Calling,Labeling | 84.3% |
+ | Flag-Waving | 82.1% |
+
+ ## Usage
+
+ ### With Transformers
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-technique-classifier-v2")
+ tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
+
+ text = "The radical left is DESTROYING our great nation!"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.sigmoid(outputs.logits)[0]
+
+ # Get techniques above threshold
+ threshold = 0.5
+ techniques = list(model.config.id2label.values())
+ detected = [(techniques[i], probs[i].item()) for i in range(len(techniques)) if probs[i] > threshold]
+ print(detected)
+ ```
+
+ ### With NCI Protocol
+
+ ```python
+ from nci.transformers.two_stage_pipeline import TwoStagePipeline
+
+ pipeline = TwoStagePipeline.from_pretrained(
+     binary_model="synapti/nci-binary-detector-v2",
+     technique_model="synapti/nci-technique-classifier-v2",
+ )
+
+ result = pipeline.analyze("The radical left is DESTROYING our great nation!")
+ print(f"Has propaganda: {result.has_propaganda}")
+ print(f"Techniques: {[t.name for t in result.techniques if t.above_threshold]}")
+ ```
+
+ ### ONNX Inference
+
+ An ONNX export is available at `onnx/model.onnx` for faster inference (~1.25x speedup).
+
+ ```python
+ import onnxruntime as ort
+ import numpy as np
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("synapti/nci-technique-classifier-v2")
+ session = ort.InferenceSession("onnx/model.onnx")
+
+ text = "WAKE UP AMERICA!"
+ inputs = tokenizer(text, return_tensors="np", truncation=True, max_length=512)
+
+ outputs = session.run(None, {
+     "input_ids": inputs["input_ids"],
+     "attention_mask": inputs["attention_mask"],
+ })
+ probs = 1 / (1 + np.exp(-outputs[0]))  # sigmoid over the raw logits
+ ```
+
+ ## Limitations
+
+ - Trained primarily on English news articles
+ - May not generalize well to social media or other domains
+ - The default threshold of 0.5 may need adjustment for specific use cases
+ - Multiple techniques can be detected per text (multi-label classification)
+
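One way to adjust the 0.5 threshold is a per-technique grid search on validation data, since the optimal operating point differs between frequent and rare labels. A minimal sketch (the `best_threshold` helper is hypothetical, assuming validation labels and sigmoid probabilities for one label as NumPy arrays):

```python
import numpy as np

def best_threshold(y_true, probs, grid=None):
    """Grid-search the decision threshold that maximizes F1 for one label."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    def f1(t):
        pred = probs >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        return 2 * tp / max(2 * tp + fp + fn, 1)
    return max(grid, key=f1)
```

Running this once per technique column yields a vector of thresholds to use in place of the global 0.5.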
+ ## Citation
+
+ ```bibtex
+ @misc{nci-technique-classifier-v2,
+   author = {Synapti},
+   title = {NCI Technique Classifier v2},
+   year = {2024},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/synapti/nci-technique-classifier-v2}
+ }
+ ```
+
+ ## License
+
+ MIT License