---
language:
- en
license: mit
tags:
- text-classification
- manipulation-detection
- pytorch
- transformers
library_name: transformers
pipeline_tag: text-classification
metrics:
- f1
- accuracy
- precision
- recall
model-index:
- name: manipulation-detector-xtremedistil
  results:
  - task:
      type: text-classification
      name: Manipulation Detection
    dataset:
      name: synthetic-interpersonal-data
      type: text-classification
    metrics:
    - type: f1
      value: 0.99
    - type: accuracy
      value: 0.99
- name: manipulation-detector-deberta
  results:
  - task:
      type: text-classification
      name: Manipulation Detection
    dataset:
      name: synthetic-interpersonal-data
      type: text-classification
    metrics:
    - type: f1
      value: 0.99
    - type: accuracy
      value: 0.99
---
|
|
|
|
|
These two classifier models flag possible manipulation in messages. Both were fine-tuned on synthetic interpersonal-relationship data.
|
|
|
|
|
The smaller model is based on microsoft/xtremedistil-l6-h256-uncased and has 12.75M total parameters; the larger is based on microsoft/deberta-v3-xsmall with 70.83M total parameters. Both models achieve a 99%+ F1 score on the held-out test split.
|
|
|
|
|
The confidence scores of the predictions are scaled to reflect the probability that the prediction is correct; however, there are instances where the models predict a blatantly wrong answer with full confidence. Furthermore, if a message requires additional context to be manipulative, it is considered benign.
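
Because even confident predictions can be wrong, downstream code may want to act only above a confidence threshold. A minimal sketch, assuming illustrative values (the `"manipulative"` label name and the 0.9 threshold are assumptions here, not values shipped with the models; check each model's `id2label` config):

```python
def should_flag(label: str, score: float, threshold: float = 0.9) -> bool:
    """Flag a message only when the model confidently predicts manipulation.

    NOTE: the label string and default threshold are illustrative
    assumptions; verify them against the model's actual config.
    """
    return label == "manipulative" and score >= threshold

# Low-confidence or benign predictions fall through as not flagged.
print(should_flag("manipulative", 0.97))  # True
print(should_flag("manipulative", 0.55))  # False
print(should_flag("benign", 0.99))        # False
```

Tuning the threshold trades recall for precision; a stricter threshold reduces false alarms at the cost of missing subtler cases.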
|
|
|
|
|
The training data was augmented to make the models robust to typos and adversarial attacks, but the highest accuracy is achieved on clean text.
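
One common way to build such robustness is character-level noise injection at training time. The sketch below illustrates the general technique with adjacent-character swaps; it is an assumption-level example, not the exact augmentation pipeline used for these models:

```python
import random

def swap_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Simulate typos by randomly swapping adjacent alphabetic characters."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)

noisy = swap_typos("you never listen to me", rate=0.3)
print(len(noisy) == len("you never listen to me"))  # length is preserved
```

Training on a mix of clean and perturbed copies of each example is what lets a classifier tolerate noisy input while still performing best on clean text.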
|
|
|
|
|
Inference scripts are provided alongside the models for quick setup. |
|
|
|
|
|
Both models are released under the MIT license. |