
	---
	license: mit
	datasets:
	- codesignal/sms-spam-collection
	language:
	- en
	library_name: transformers
	pipeline_tag: text-classification
	---


	## Model Overview
	This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup IntiGo.

	### Model Details
	- Model Name: BERT Fine-Tuned for SMS Spam Classification
	- Library: [Transformers](https://huggingface.co/transformers/)
	- Language: English
	- Pipeline Tag: `text-classification`

	### License
	This model is released under the [MIT License](https://opensource.org/licenses/MIT).

	## Datasets
	- Training Dataset: [codesignal/sms-spam-collection](https://huggingface.co/datasets/codesignal/sms-spam-collection)

	## Fine-Tuning Procedure
	This model was fine-tuned on the SMS Spam Collection dataset, a set of SMS messages each labeled as "spam" or "ham" (not spam).

	### Metrics
	- Precision: 0.99
	- Recall: 0.81
	- F1 Score: 0.96

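	These metrics were computed on the validation set. Assuming they describe the spam class, the precision of 0.99 means that very few legitimate messages are flagged as spam, while the recall of 0.81 means that roughly one in five spam messages is missed. The reported F1 score is higher than the harmonic mean of the listed precision and recall, which suggests the three figures were not all computed with the same averaging scheme.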
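As a quick check on how these metrics relate, F1 is conventionally defined as the harmonic mean of precision and recall. The minimal sketch below applies that definition to the listed values. It assumes both figures refer to the same class and the same averaging scheme, which the card does not state, and the helper name `f1_from_precision_recall` is purely illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values listed in the Metrics section; assumed to describe the same class.
precision = 0.99
recall = 0.81

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 from listed precision and recall: {f1:.3f}")
```

If the reported F1 was averaged across both classes (for example, weighted by class frequency), it would not be expected to match this single-class value.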

	### Usage
	You can use this model to classify SMS messages as spam or not spam. The tokenizer converts raw text into model inputs, and the model returns one logit per class; the class with the highest logit is the predicted label.

	#### Example:
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
	model_name = "Amenallah2001/intigo-technical-test"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Example input
	text = "Congratulations! You've won a free ticket to Bahamas. Call now!"

	# Tokenize (truncating overly long messages) and classify without tracking gradients
	inputs = tokenizer(text, return_tensors="pt", truncation=True)
	with torch.no_grad():
	    outputs = model(**inputs)
	logits = outputs.logits
	predicted_class = logits.argmax(dim=-1).item()

	# Output prediction
	label_map = {0: "ham", 1: "spam"}
	print(f"Prediction: {label_map[predicted_class]}")
	```
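The usage example above reports only the winning label. When a confidence value is also useful, the logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model. The logits are hypothetical placeholders, the helper names are illustrative, and the index-to-label mapping is copied from the example above.

```python
import math

# Index-to-label mapping, copied from the usage example in this card.
LABEL_MAP = {0: "ham", 1: "spam"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def label_with_confidence(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Hypothetical logits for a single message (not produced by the real model).
label, confidence = label_with_confidence([-2.0, 3.0])
print(f"{label} (confidence {confidence:.3f})")
```

With the real model, the same computation would typically be done on `outputs.logits` using `torch.softmax(logits, dim=-1)`.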

	### Intended Use
	This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.

	### Limitations
	- Data Imbalance: The dataset used for training was imbalanced, which could affect the model’s performance in real-world scenarios with different distributions of spam and non-spam messages.
	- Language Support: This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
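To illustrate the data imbalance point, the sketch below shows how precision can shift when the share of spam in incoming traffic differs from the validation data, even if the classifier itself is unchanged. It assumes the listed recall of 0.81 applies to the spam class. The false-positive rate and the prevalence values are hypothetical, since the card does not report them.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected precision for a classifier with fixed true/false-positive rates.

    prevalence is the fraction of messages that are actually spam.
    """
    true_positives = tpr * prevalence
    false_positives = fpr * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


TPR = 0.81   # recall listed in the card, assumed to be for the spam class
FPR = 0.001  # hypothetical false-positive rate (not reported in the card)

# Illustrative spam shares; real traffic may look quite different.
for prevalence in (0.13, 0.05, 0.01):
    precision = precision_at_prevalence(TPR, FPR, prevalence)
    print(f"spam share {prevalence:.0%}: expected precision {precision:.3f}")
```

Under these assumptions, precision falls as spam becomes rarer, which is one reason to re-evaluate the model on data that reflects the target deployment.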

	### Ethical Considerations
	When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially in handling user-generated content.