---
language:
- en
license: other
library_name: transformers
tags:
- dark-pattern
- dark-pattern-classification
- BERT
- dark-pattern-detection
metrics:
- accuracy
pipeline_tag: text-classification
---
# Model Card for Dark Pattern Detection BERT
<!-- Provide a quick summary of what the model is/does. -->
This model is a fine-tuned BERT classifier for detecting text-based dark patterns, assigning website text to one of seven dark pattern categories or a Not Dark Pattern class. It backs the [CogniGaurd](https://github.com/4darsh-Dev/CogniGaurd) project.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Adarsh Maurya
- **Model type:** BERT-based text classification model (FP32 safetensors checkpoint)
- **License:** Other
- **Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [https://github.com/4darsh-Dev/CogniGaurd](https://github.com/4darsh-Dev/CogniGaurd)
- **Paper:** [More Information Needed]
- **Demo:** [https://huggingface.co/spaces/4darsh-Dev/dark_pattern_detector_app](https://huggingface.co/spaces/4darsh-Dev/dark_pattern_detector_app)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
1. Detecting text-based dark patterns.
2. Classifying them into seven categories (Urgency, Scarcity, Misdirection, Social Proof, Obstruction, Sneaking, Forced Action), plus a Not Dark Pattern class.
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model can be loaded and used directly with the Transformers library:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Replace with the actual Hugging Face repository ID.
model_name = "your-username/your-model-name"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example usage
text = "Only 2 items left in stock!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predictions = outputs.logits.argmax(-1)  # index of the most likely class
```
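To turn the predicted index into a human-readable category, you can read the `id2label` mapping from the model config. This is a minimal follow-up sketch; it assumes the repository's `config.json` defines `id2label` (otherwise, use the explicit label mapping shown in the next section):

```python
# Map the predicted class index to a label name via the model config.
# Assumes config.json defines id2label; see the next section otherwise.
predicted_id = predictions.item()
label = model.config.id2label.get(predicted_id, str(predicted_id))
print(f"Predicted label: {label}")
```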
## How to Get Started with the Model
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

class DarkPatternDetector:
    def __init__(self, model_name):
        # Index-to-category mapping used by this checkpoint.
        self.label_dict = {
            0: "Urgency", 1: "Not Dark Pattern", 2: "Scarcity", 3: "Misdirection",
            4: "Social Proof", 5: "Obstruction", 6: "Sneaking", 7: "Forced Action"
        }
        # Run on GPU when available, otherwise fall back to CPU.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name).to(self.device)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def predict(self, text):
        # Tokenize and move the tensors to the same device as the model.
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
        with torch.no_grad():
            outputs = self.model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=1)
        predicted_label = torch.argmax(probabilities, dim=1).item()
        return self.label_dict[predicted_label]

# Usage
if __name__ == "__main__":
    # Replace with the actual Hugging Face repository ID.
    model_name = "your-username/your-model-name"
    detector = DarkPatternDetector(model_name)

    # Example usage
    texts_to_predict = [
        "Only 2 items left in stock!",
        "This offer ends in 10 minutes!",
        "Join now and get 50% off!",
        "By clicking 'Accept', you agree to our terms and conditions.",
    ]
    for text in texts_to_predict:
        result = detector.predict(text)
        print(f"Text: '{text}'\nPredicted Dark Pattern: {result}\n")
```
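Alternatively, the same checkpoint can be used through the higher-level `pipeline` API, which handles tokenization and label mapping internally. A minimal sketch, again with the placeholder model ID and assuming the checkpoint's `config.json` carries the label names:

```python
from transformers import pipeline

# Replace with the actual Hugging Face repository ID.
classifier = pipeline("text-classification", model="your-username/your-model-name")

result = classifier("Hurry! This offer expires at midnight.")
print(result)  # e.g. [{'label': 'Urgency', 'score': 0.97}]
```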
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Process
- The model was fine-tuned for 5 epochs on a dataset of 5,000 examples.
- We used the AdamW optimizer with a learning rate of 2e-5.
- The maximum sequence length was set to 256 tokens.
- Training was performed using mixed precision (FP16) for efficiency.
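For reference, a sketch of this setup with the Transformers `Trainer` is shown below. The base checkpoint and hyperparameters follow the description above; `raw_dataset` is a hypothetical dataset with `text`/`label` columns, and the batch size is an assumption, since it is not documented:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

def tokenize(batch):
    # Maximum sequence length of 256 tokens, as stated above.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# `raw_dataset` is a hypothetical datasets.Dataset with "text"/"label" columns.
train_dataset = raw_dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased", num_labels=8)  # 7 categories + Not Dark Pattern

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,              # 5 epochs, as stated above
    learning_rate=2e-5,              # AdamW (the Trainer default) at 2e-5
    fp16=True,                       # mixed-precision (FP16) training
    per_device_train_batch_size=16,  # assumption: batch size is not documented
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```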
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Metrics
Our model's performance is evaluated using the following metrics:
- **Accuracy**: The proportion of correct predictions among the total number of cases examined.
- **Precision**: The ratio of correctly predicted positive observations to the total predicted positive observations.
- **Recall**: The ratio of correctly predicted positive observations to all observations in the actual class.
- **F1-Score**: The harmonic mean of Precision and Recall, providing a single score that balances both metrics.
These metrics were chosen to provide a comprehensive view of the model's performance across different aspects of classification accuracy.
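A sketch of how these metrics can be computed with scikit-learn is given below; `y_true` and `y_pred` are hypothetical stand-ins for the true and predicted label IDs, and weighted averaging is an assumption (though it is consistent with recall matching accuracy in the table below):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical stand-ins for test-set labels and model predictions.
y_true = [0, 1, 2, 1, 7, 3]
y_pred = [0, 1, 1, 1, 7, 3]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  "
      f"Recall: {recall:.4f}  F1: {f1:.4f}")
```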
### Results
| Metric | Score |
|------------|----------|
| Accuracy | 0.811881 |
| Precision | 0.808871 |
| Recall | 0.811881 |
| F1-Score | 0.796837 |
Our model demonstrates strong performance across all metrics:
- An accuracy of 81.19% indicates that the model correctly classifies a high proportion of samples.
- The precision of 80.89% shows that when the model predicts a specific dark pattern, it is correct about 81% of the time.
- The recall of 81.19% indicates that the model successfully identifies about 81% of the actual dark patterns in the dataset.
- An F1-Score of 79.68% represents a good balance between precision and recall.
### Summary
These results suggest that the model is effective at detecting and classifying dark patterns, with a good balance between identifying true positives and avoiding false positives.
### Model Architecture and Objective
The model is `google-bert/bert-base-uncased` with a sequence classification head, fine-tuned on an 8-way classification objective (the seven dark pattern categories plus Not Dark Pattern).
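The architecture and label mapping can be inspected from the checkpoint's config; a small sketch, again with the placeholder model ID:

```python
from transformers import AutoConfig

# Replace with the actual Hugging Face repository ID.
config = AutoConfig.from_pretrained("your-username/your-model-name")
print(config.architectures)  # e.g. ['BertForSequenceClassification']
print(config.num_labels)     # 8
print(config.id2label)       # index-to-category mapping
```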
### Compute Infrastructure
#### Hardware
- GPU: NVIDIA Tesla P100 (16GB VRAM)
- Platform: Kaggle Notebooks
#### Software
- Python 3.10
- PyTorch 1.13.1
- Transformers library 4.29.2
- CUDA 11.6
<!-- ## Citation [optional] -->
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
<!-- **BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed] -->
## Model Card Authors
This model card was authored by:
- Adarsh Maurya (CS Student, Keshav Mahavidyalaya, University of Delhi)
## Model Card Contact
For questions, comments, or feedback about this model, please contact:
- Email: adarsh@onionreads.com
- GitHub: [https://github.com/4darsh-Dev/CogniGaurd](https://github.com/4darsh-Dev/CogniGaurd)
- Twitter: [@4darsh_Dev](https://twitter.com/4darsh_Dev)
For urgent inquiries, contact the lead researcher directly:
- Mr. Adarsh Maurya
- Email: adarsh230427@keshav.du.ac.in