picket-cliff/deepl-project

	---
	datasets:
	- AbdulHadi806/mail_spam_ham_dataset
	language:
	- en
	metrics:
	- f1
	base_model:
	- distilbert/distilbert-base-uncased
	---
	# Deep Learning Project: Spam Detection with DistilBERT

	This repository contains the code and resources for a deep learning project on spam detection: fine-tuning DistilBERT to classify emails as spam or ham.

	## Project Structure
	- `mail_data.csv`: The dataset used for training and evaluation.
	- `eda_script.py`: Script for Exploratory Data Analysis and visualization.
	- `train_model_hf.py`: Main training script using Hugging Face Trainer and DistilBERT.
	- `evaluate_final.py`: Script for final evaluation from the best model checkpoint.
	- `eda_plots.png`: Visualizations generated during EDA.
	- `results.txt`: Detailed evaluation metrics and confusion matrix.
	- `Deep_Learning_Project_Report.pdf`: The final project report (15-17 pages equivalent).

	## Requirements
	- Python 3.11+
	- PyTorch
	- Transformers
	- Datasets
	- Scikit-learn
	- Pandas
	- Matplotlib
	- Seaborn
	- Accelerate
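
As a quick sanity check before running any script, the sketch below reports which of the listed packages cannot be found in the current environment. It is not part of the repository; the helper name `missing_modules` is hypothetical, and the import names are the usual ones for these packages (Scikit-learn imports as `sklearn`, PyTorch as `torch`).

```python
import sys
from importlib.util import find_spec

# Usual import names for the packages listed above.
REQUIRED_MODULES = [
    "torch",         # PyTorch
    "transformers",
    "datasets",
    "sklearn",       # Scikit-learn
    "pandas",
    "matplotlib",
    "seaborn",
    "accelerate",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 11):
        print("Warning: Python 3.11+ is expected, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules can be found.")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even for heavy packages such as PyTorch.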

	## How to Run
	0. Make sure all requirements are installed. If you hit errors while running the code, install the dependencies from `requirements.txt` in a fresh environment.
	1. EDA: Run `python3 eda_script.py` to see the data distribution.
	2. Training: Run `python3 train_model_hf.py` to fine-tune the DistilBERT model.
	3. Evaluation: Run `python3 evaluate_final.py` to get the final performance metrics.

	## Results
	The model achieves 99.10% accuracy on the test set with an F1-score of 96.58% for the spam class.
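
To make the two reported numbers easier to interpret, here is a minimal sketch of how accuracy and the spam-class F1-score relate. It is not the code in `evaluate_final.py`; the function name `spam_metrics`, the label encoding (1 = spam, 0 = ham), and the toy labels are assumptions for illustration only and do not reproduce the results above.

```python
def spam_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) where F1 is computed for the `positive` class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy example (assumed labels, 1 = spam): 6 ham and 4 spam messages.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

accuracy, f1 = spam_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, spam F1={f1:.2f}")  # accuracy=0.80, spam F1=0.75
```

Because true negatives (correctly classified ham) count toward accuracy but not toward the spam F1-score, accuracy can sit above F1 when ham is the majority class. With scikit-learn installed, `accuracy_score` and `f1_score(y_true, y_pred, pos_label=1)` should give the same values.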