---
language:
- en
license: mit
tags:
- text-classification
- manipulation-detection
- pytorch
- transformers
library_name: transformers
pipeline_tag: text-classification
metrics:
- f1
- accuracy
- precision
- recall
model-index:
- name: manipulation-detector-xtremedistil
  results:
  - task:
      type: text-classification
      name: Manipulation Detection
    dataset:
      name: synthetic-interpersonal-data
      type: text-classification
    metrics:
    - type: f1
      value: 0.99
    - type: accuracy
      value: 0.99
- name: manipulation-detector-deberta
  results:
  - task:
      type: text-classification
      name: Manipulation Detection
    dataset:
      name: synthetic-interpersonal-data
      type: text-classification
    metrics:
    - type: f1
      value: 0.99
    - type: accuracy
      value: 0.99
---
|
|
|
|
|
These two classifier models flag possible manipulation in messages. Both were fine-tuned on synthetic interpersonal-relationship data.
|
|
|
|
|
The smaller model is based on microsoft/xtremedistil-l6-h256-uncased and has 12.75M total parameters; the larger is based on microsoft/deberta-v3-xsmall with 70.83M total parameters. Both models achieve a 99%+ F1 score on the held-out test split.
|
|
|
|
|
The confidence scores of the predictions are scaled to reflect the probability that the prediction is correct; however, there are instances where the models predict a blatantly wrong answer with full confidence. Furthermore, if a message requires additional context to be manipulative, it is considered benign.
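
Because even confident predictions can be wrong, downstream code may want to act only above a confidence threshold. A minimal sketch, assuming illustrative values (the `"manipulative"` label name and the 0.9 threshold are assumptions here, not values shipped with the models; check each model's `id2label` config):

```python
def should_flag(label: str, score: float, threshold: float = 0.9) -> bool:
    """Flag a message only when the model confidently predicts manipulation.

    NOTE: the label string and default threshold are illustrative
    assumptions; verify them against the model's actual config.
    """
    return label == "manipulative" and score >= threshold

# Low-confidence or benign predictions fall through as not flagged.
print(should_flag("manipulative", 0.97))  # True
print(should_flag("manipulative", 0.55))  # False
print(should_flag("benign", 0.99))        # False
```

Tuning the threshold trades recall for precision; a stricter threshold reduces false alarms at the cost of missing subtler cases.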
|
|
|
|
|
The training data was augmented to make the models robust to typos and adversarial attacks, but the highest accuracy is achieved on clean text.
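
One common way to build such robustness is character-level noise injection at training time. The sketch below illustrates the general technique with adjacent-character swaps; it is an assumption-level example, not the exact augmentation pipeline used for these models:

```python
import random

def swap_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Simulate typos by randomly swapping adjacent alphabetic characters."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)

noisy = swap_typos("you never listen to me", rate=0.3)
print(len(noisy) == len("you never listen to me"))  # length is preserved
```

Training on a mix of clean and perturbed copies of each example is what lets a classifier tolerate noisy input while still performing best on clean text.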
|
|
|
|
|
Inference scripts are provided alongside the models for quick setup. |
|
|
|
|
|
Both models are released under the MIT license. |