CogniBERT-fineTuned-dark-pattern / README.md

	---
	language:
	- en
	license: mit
	library_name: transformers
	tags:
	- dark-pattern
	- dark-pattern-classification
	- BERT
	- dark-pattern-detection
	metrics:
	- accuracy
	pipeline_tag: text-classification
	---

	# Model Card for CogniBERT-fineTuned-dark-pattern

	<!-- Provide a quick summary of what the model is/does. -->

	CogniBERT-fineTuned-dark-pattern is a BERT model, fine-tuned from `google-bert/bert-base-uncased`, that classifies text as one of seven dark-pattern categories or as "Not Dark Pattern".

	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->



	- Developed by: Adarsh Maurya
	- Model type: BERT-based text classifier (Safetensors weights, F32)
	- License: MIT
	- Finetuned from model: google-bert/bert-base-uncased

	### Model Sources [optional]

	<!-- Provide the basic links for the model. -->

	- Repository: https://github.com/4darsh-Dev/CogniGaurd
	- Paper [optional]: [More Information Needed]
	- Demo: https://huggingface.co/spaces/4darsh-Dev/dark_pattern_detector_app

	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
	1. Detecting text-based dark patterns.
	2. Classifying text into seven dark-pattern categories (Urgency, Scarcity, Misdirection, Social Proof, Obstruction, Sneaking, Forced Action) plus a "Not Dark Pattern" class.

	### Direct Use

	<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

	### Usage
	This model can be loaded and used with the Transformers library:

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	# Replace with this model's Hugging Face repository ID
	model_name = "your-username/your-model-name"
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Example usage
	text = "Only 2 items left in stock!"
	inputs = tokenizer(text, return_tensors="pt", truncation=True)
	outputs = model(**inputs)
	# Index of the highest-scoring class for each input
	predictions = outputs.logits.argmax(-1)
	```


	## How to Get Started with the Model

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	class DarkPatternDetector:
	    def __init__(self, model_name):
	        # Maps the model's output indices to category names
	        self.label_dict = {
	            0: "Urgency", 1: "Not Dark Pattern", 2: "Scarcity", 3: "Misdirection",
	            4: "Social Proof", 5: "Obstruction", 6: "Sneaking", 7: "Forced Action"
	        }
	        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
	        print(f"Using device: {self.device}")

	        self.model = AutoModelForSequenceClassification.from_pretrained(model_name).to(self.device)
	        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

	    def predict(self, text):
	        # Tokenize a single string and move the tensors to the model's device
	        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)

	        with torch.no_grad():
	            outputs = self.model(**inputs)
	            probabilities = torch.nn.functional.softmax(outputs.logits, dim=1)
	            predicted_label = torch.argmax(probabilities, dim=1).item()

	        return self.label_dict[predicted_label]

	# Usage
	if __name__ == "__main__":
	    # Replace with your Hugging Face model name
	    model_name = "your-username/your-model-name"
	    detector = DarkPatternDetector(model_name)

	    # Example usage
	    texts_to_predict = [
	        "Only 2 items left in stock!",
	        "This offer ends in 10 minutes!",
	        "Join now and get 50% off!",
	        "By clicking 'Accept', you agree to our terms and conditions."
	    ]

	    for text in texts_to_predict:
	        result = detector.predict(text)
	        print(f"Text: '{text}'\nPredicted Dark Pattern: {result}\n")
	```
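
The detector above returns only the single most likely label. When a ranked list with probabilities is more useful, the logits can be post-processed as in the minimal sketch below. It is written in plain Python so it runs without loading the model. The helper names `softmax` and `rank_labels` are hypothetical, the label mapping is copied from `label_dict` above, and the example logits are made up for illustration.

```python
import math

# Label mapping copied from the DarkPatternDetector example in this card.
LABELS = {
    0: "Urgency", 1: "Not Dark Pattern", 2: "Scarcity", 3: "Misdirection",
    4: "Social Proof", 5: "Obstruction", 6: "Sneaking", 7: "Forced Action",
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(logits, top_k=3):
    """Return the top_k (label, probability) pairs, highest probability first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(LABELS[index], prob) for index, prob in ranked[:top_k]]

# Made-up logits for illustration only; this is not real model output.
example_logits = [0.2, -1.0, 3.1, 0.0, -0.5, -2.0, 0.4, -0.3]
for label, prob in rank_labels(example_logits):
    print(f"{label}: {prob:.3f}")
```

With a real model, the input would be one row of the logits converted to a list, for example `outputs.logits[0].tolist()`.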

	## Training Details

	### Training Data

	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

	[More Information Needed]

	### Training Process
	- The model was fine-tuned for 5 epochs on a dataset of 5,000 examples.
	- We used the AdamW optimizer with a learning rate of 2e-5.
	- The maximum sequence length was set to 256 tokens.
	- Training was performed using mixed precision (FP16) for efficiency.
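
The settings listed above can be collected in one place, as in the rough sketch below. `FineTuneConfig` and its methods are hypothetical names, not part of the original training code. The batch size is an assumption, since the card does not state it, and the step count further assumes that all 5,000 examples were used for training on a single device without gradient accumulation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FineTuneConfig:
    # Values stated in this card.
    num_examples: int = 5000
    num_train_epochs: int = 5
    learning_rate: float = 2e-5
    max_seq_length: int = 256
    fp16: bool = True
    # ASSUMPTION: the card does not state the batch size; 16 is a placeholder.
    per_device_train_batch_size: int = 16

    def total_optimizer_steps(self) -> int:
        """Optimizer steps on one device without gradient accumulation."""
        steps_per_epoch = math.ceil(self.num_examples / self.per_device_train_batch_size)
        return steps_per_epoch * self.num_train_epochs

    def training_kwargs(self) -> dict:
        """Keyword arguments that could be passed to transformers.TrainingArguments."""
        return {
            "num_train_epochs": self.num_train_epochs,
            "learning_rate": self.learning_rate,
            "fp16": self.fp16,
            "per_device_train_batch_size": self.per_device_train_batch_size,
        }

config = FineTuneConfig()
print(config.total_optimizer_steps())  # 1565 with the assumed batch size of 16
print(config.training_kwargs())
```

The dictionary keys match parameter names of `transformers.TrainingArguments`, which also needs an `output_dir`. The `Trainer` uses AdamW by default, and the sequence length limit would be applied at tokenization time with `truncation=True, max_length=256`.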


	#### Training Hyperparameters

	- Training regime: fp16 mixed precision

	#### Speeds, Sizes, Times [optional]

	<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

	[More Information Needed]

	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->


	### Testing Data, Factors & Metrics

	#### Testing Data

	<!-- This should link to a Dataset Card if possible. -->

	[More Information Needed]


	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	[More Information Needed]

	### Results

	| Metric    | Score    |
	|-----------|----------|
	| Accuracy  | 0.811881 |
	| Precision | 0.808871 |
	| Recall    | 0.811881 |
	| F1-Score  | 0.796837 |
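
The card does not say how precision, recall, and F1 were averaged across classes. The reported accuracy and recall are identical, which is what support-weighted averaging produces, because weighted recall always equals accuracy. That averaging mode is an assumption here, not something the card confirms. The sketch below computes the four metrics that way on toy labels; `weighted_metrics` is a hypothetical helper and the labels are not the model's predictions.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = true_pos / predicted if predicted else 0.0
        r_c = true_pos / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only; these are not the model's predictions.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

On these toy labels the accuracy and the weighted recall both come out at 0.7, mirroring the pattern in the table above.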

	#### Summary




	## Technical Specifications [optional]

	### Model Architecture and Objective

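	A `google-bert/bert-base-uncased` encoder with a sequence-classification head that outputs eight classes: the seven dark-pattern categories plus "Not Dark Pattern".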

	### Compute Infrastructure

	#### Hardware
	- GPU: NVIDIA Tesla P100 (16GB VRAM)
	- Platform: Kaggle Notebooks

	#### Software
	- Python 3.10
	- PyTorch 1.13.1
	- Transformers library 4.29.2
	- CUDA 11.6
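
To compare a local environment with the versions listed above, a small check like the following sketch may help. It uses only the standard library; `CARD_VERSIONS`, `installed_version`, and `compare_environment` are hypothetical names. The card does not say that these exact versions are required, so a mismatch is informational rather than an error.

```python
import sys
from importlib import metadata

# Versions listed in this card's Software section.
CARD_VERSIONS = {"torch": "1.13.1", "transformers": "4.29.2"}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def compare_environment(expected=CARD_VERSIONS):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Local builds may carry a suffix such as "+cu116"; compare the public part only.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report

print(f"Python {sys.version_info.major}.{sys.version_info.minor} (card lists 3.10)")
for package, (wanted, found, ok) in compare_environment().items():
    print(f"{package}: card={wanted} installed={found} match={ok}")
```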


	## Citation [optional]

	<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

	BibTeX:

	[More Information Needed]

	APA:

	[More Information Needed]

	## Glossary [optional]

	<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

	[More Information Needed]

	## More Information [optional]

	[More Information Needed]

	## Model Card Authors [optional]

	[More Information Needed]

	## Model Card Contact

	[More Information Needed]