---
datasets:
  - AbdulHadi806/mail_spam_ham_dataset
language:
  - en
metrics:
  - f1
base_model:
  - distilbert/distilbert-base-uncased
---

Deep Learning Project: Spam Detection with DistilBERT

This repository contains the code and resources for a deep learning project on spam detection, in which DistilBERT (distilbert-base-uncased) is fine-tuned to classify emails as spam or ham.

Project Structure

mail_data.csv: The dataset used for training and evaluation.
eda_script.py: Script for Exploratory Data Analysis and visualization.
train_model_hf.py: Main training script using Hugging Face Trainer and DistilBERT.
evaluate_final.py: Script for final evaluation from the best model checkpoint.
eda_plots.png: Visualizations generated during EDA.
results.txt: Detailed evaluation metrics and confusion matrix.
Deep_Learning_Project_Report.pdf: The final project report (15-17 pages equivalent).
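The first thing the EDA step examines is how the messages in mail_data.csv are split between ham and spam. Below is a minimal, standard-library sketch of that count. It assumes the CSV has a header row with a `Category` column holding `ham`/`spam` labels; that column name is an assumption, not something this repository confirms, and `class_distribution` is a hypothetical helper rather than the code in eda_script.py.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, label_column="Category"):
    """Return {label: (count, share)} for a spam/ham CSV.

    `label_column="Category"` is an assumed header name; change it to match
    the actual column in mail_data.csv.
    """
    reader = csv.DictReader(csv_file)
    # Normalise labels so "Ham" and "ham" are counted together.
    counts = Counter(row[label_column].strip().lower() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Pair each raw count with its fraction of all rows.
    return {label: (n, n / total) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy in-memory CSV standing in for mail_data.csv.
    sample = io.StringIO(
        "Category,Message\n"
        "ham,See you at lunch\n"
        "spam,WIN a free prize now\n"
        "ham,Call me later\n"
        "ham,Meeting moved to 3pm\n"
    )
    for label, (n, share) in sorted(class_distribution(sample).items()):
        print(f"{label}: {n} ({share:.1%})")  # ham: 3 (75.0%), spam: 1 (25.0%)
```

For the real file, open it with `open("mail_data.csv", newline="", encoding="utf-8")` and pass the file handle to `class_distribution`.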

Install the dependencies listed in requirements.txt before running the scripts. If you run into errors, try installing them in a fresh virtual environment.
EDA: Run python3 eda_script.py to see the data distribution.
Training: Run python3 train_model_hf.py to fine-tune the DistilBERT model.
Evaluation: Run python3 evaluate_final.py to get the final performance metrics.
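Before fine-tuning, the labelled messages have to be divided into training and held-out test data. This README does not say how train_model_hf.py does it; the sketch below shows one common option, a stratified split that keeps the spam/ham ratio the same in both parts, which matters when spam is the minority class. `stratified_split` is a hypothetical helper, and encoding ham as 0 and spam as 1 is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(examples, labels, test_fraction=0.2, seed=42):
    """Split (example, label) pairs into train/test lists, preserving label ratios.

    Hypothetical helper for illustration; the project's training script may
    split the data differently.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Group examples by label so each class is split separately.
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append((example, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    # Mix the classes back together so batches are not ordered by label.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


if __name__ == "__main__":
    texts = [f"msg {i}" for i in range(10)]
    labels = [1, 1] + [0] * 8  # assumed encoding: 1 = spam, 0 = ham
    train, test = stratified_split(texts, labels, test_fraction=0.5)
    print(len(train), len(test))
```

Splitting per class rather than over the whole list guarantees that a small spam class still appears in the test set in roughly its original proportion.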

The model achieves 99.10% accuracy on the test set with an F1-score of 96.58% for the spam class.
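A spam-class F1 below overall accuracy is typical when spam is the minority class: accuracy credits every correctly passed ham message (true negatives), while the spam-class F1 is built only from true positives, false positives and false negatives. Below is a minimal sketch of deriving both figures from a confusion matrix, assuming spam is encoded as label 1. `binary_metrics` is a hypothetical helper, not the code in evaluate_final.py, and the example labels are toy data, not the project's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and metrics with `positive` as the target class.

    Assumes spam is encoded as 1 (hypothetical); pass `positive` otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive:
            fp += 1  # ham wrongly flagged as spam
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # ham correctly passed
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        # Rows = true class (negative, positive); columns = predicted class.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        # 2TP / (2TP + FP + FN) equals 2PR / (P + R); TN never appears.
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


if __name__ == "__main__":
    # Toy labels only.
    y_true = [0, 0, 0, 0, 1, 1, 0, 1]
    y_pred = [0, 0, 0, 1, 1, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

The `[[tn, fp], [fn, tp]]` layout matches the one scikit-learn's `confusion_matrix` uses for labels `[0, 1]`. With a Hugging Face Trainer, `y_pred` would typically be the row-wise argmax of `trainer.predict(test_dataset).predictions` and `y_true` the corresponding `label_ids`.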