Spaces
Inder-26 committed
Commit · 195cc50 · Parent(s): 8eae904
feat: Initialize project structure with ML components, utilities, exception handling, and S3 integration.
Changed files:
- Dockerfile +3 -2
- README.md +241 -1
- app.py +10 -4
- deploy_stage +1 -0
- images/architecture_diagram.png +3 -0
- images/confusion_matrix.png +3 -0
- images/dagshub_experiments.png +3 -0
- images/data_ingestion_diagram.png +3 -0
- images/data_transformation_diagram.png +3 -0
- images/data_validation_diagram.png +3 -0
- images/model_training_diagram.png +3 -0
- images/precision_recall_curve.png +3 -0
- images/prediction_results.png +3 -0
- images/roc_curve.png +3 -0
- images/ui_homepage.png +3 -0
- networksecurity/components/data_ingestion.py +2 -2
- networksecurity/components/model_trainer.py +8 -1
- networksecurity/constant/training_pipeline/__init__.py +1 -1
- networksecurity/pipeline/training_pipeline.py +2 -2
- requirements.txt +25 -118
Dockerfile
CHANGED
@@ -10,8 +10,9 @@ RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
 # Copy requirements file first to leverage Docker cache
 COPY requirements.txt .
 
-#
-RUN pip install --no-cache-dir -
+# Upgrade pip and install Python dependencies
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
 
 # Copy the rest of the application code
 COPY . .
README.md
CHANGED
@@ -1 +1,241 @@
---
title: Network Security - Phishing Detection
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: true
license: mit
app_port: 7860
---

# 🛡️ Network Security System: Phishing URL Detection



## 📋 Table of Contents

- [About The Project](#-about-the-project)
- [Architecture](#-architecture)
- [Features](#-features)
- [Tech Stack](#-tech-stack)
- [Dataset](#-dataset)
- [Project Structure](#-project-structure)
- [Pipeline Workflow](#-pipeline-workflow)
- [Screenshots](#-screenshots)
- [Installation](#-installation)
- [Usage](#-usage)
- [Model Performance](#-model-performance)
- [Experiment Tracking](#-experiment-tracking)
- [Future Enhancements](#-future-enhancements)
- [Contributing](#-contributing)
- [License](#-license)
- [Contact](#-contact)

## 🚀 Live Demo

- **Live Application**: [inderjeet-networksecurity.hf.space](https://inderjeet-networksecurity.hf.space/)
- **Experiment Tracking**: [DagsHub Experiments](https://dagshub.com/Inder-26/NetworkSecurity/experiments#/)

## 🎯 About The Project

Cybersecurity threats such as phishing attacks are becoming increasingly sophisticated. This project implements a robust **Network Security Machine Learning Pipeline** designed to detect phishing URLs with high accuracy.

It leverages a modular MLOps architecture that keeps the system scalable, maintainable, and reproducible. The pipeline automates the entire flow from data ingestion to model deployment, including drift detection and automated model evaluation.

## 🏗️ Architecture

The system follows a strictly modular pipeline architecture, orchestrated by a central training pipeline.



## ✨ Features

- **🚀 End-to-End Pipeline**: Fully automated workflow from data ingestion to model deployment.
- **🛡️ Data Validation**: Comprehensive schema checks and data drift detection using Kolmogorov-Smirnov (KS) tests.
- **🔄 Robust Preprocessing**: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
- **🤖 Multi-Model Training**: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
- **📊 Experiment Tracking**: Integrated with **MLflow** and **DagsHub** for tracking parameters, metrics, and models.
- **⚡ Fast API**: High-performance REST API built with **FastAPI** for real-time predictions.
- **🐳 Containerized**: Docker support for consistent deployment across environments.
- **☁️ Cloud Ready**: Designed to be deployed on platforms like AWS or Hugging Face Spaces.

## 🛠️ Tech Stack

- **Languages**: Python 3.8+
- **Frameworks**: FastAPI, Uvicorn
- **ML Libraries**: Scikit-learn, Pandas, NumPy
- **MLOps**: MLflow, DagsHub
- **Database**: MongoDB
- **Containerization**: Docker
- **Frontend**: HTML, CSS (Custom Design System), JavaScript

## 📊 Dataset

The project uses a dataset of URL features to distinguish legitimate from phishing URLs.

- **Source**: [Phishing Dataset for Machine Learning](https://archive.ics.uci.edu/ml/datasets/Phishing+Websites) (or a similar phishing URL dataset)
- **Features**: IP Address, URL Length, TinyURL, forwarding, etc.
- **Target**: `Result` (LEGITIMATE / PHISHING)

## 📁 Project Structure

```
NetworkSecurity/
├── images/              # Project diagrams and screenshots
├── networksecurity/     # Main package
│   ├── components/      # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/        # Training and Prediction pipelines
│   ├── entity/          # Artifact and Config entities
│   ├── constant/        # Project constants
│   ├── utils/           # Utility functions
│   └── exception/       # Custom exception handling
├── data_schema/         # Schema definitions
├── Dockerfile           # Docker configuration
├── app.py               # FastAPI application entry point
├── requirements.txt     # Project dependencies
└── README.md            # Project documentation
```

## ⚙️ Pipeline Workflow

### 1. Data Ingestion 📥

Fetches data from MongoDB, falls back to a local CSV when MongoDB is unavailable, and performs the train-test split.



### 2. Data Validation ✅

Validates data against the schema and checks for data drift.



### 3. Data Transformation 🔄

Imputes missing values and scales features for optimal model performance.



### 4. Model Training 🤖

Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.



## 📸 Screenshots

### Prediction Results & Threat Assessment



### Experiment Tracking (DagsHub/MLflow)



## 💻 Installation

### Prerequisites

- Python 3.8+
- MongoDB Account
- DagsHub Account (for experiment tracking)

### Step-by-Step

1. **Clone the Repository**

```bash
git clone https://github.com/Inder-26/NetworkSecurity.git
cd NetworkSecurity
```

2. **Create Virtual Environment**

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

3. **Install Dependencies**

```bash
pip install -r requirements.txt
```

4. **Set Environment Variables**

Create a `.env` file with your credentials:

```env
MONGO_DB_URL=mongodb+srv://<username>:<password>@cluster0.mongodb.net/?retryWrites=true&w=majority
MLFLOW_TRACKING_URI=https://dagshub.com/<username>/NetworkSecurity.mlflow
MLFLOW_TRACKING_USERNAME=<username>
MLFLOW_TRACKING_PASSWORD=<password>
```
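Once the `.env` file exists, the application can read it at startup with `python-dotenv` (listed in `requirements.txt`). A minimal sketch, with a fallback to plain environment variables if the package is absent:

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()  # reads key=value pairs from .env into os.environ
except ImportError:
    pass  # fall back to variables already set in the environment

# Variable names match the .env template above
mongo_url = os.getenv("MONGO_DB_URL", "")
print("MongoDB configured:", bool(mongo_url))
```

Keeping credentials in `.env` (and out of version control) is what lets the same image run locally and on Hugging Face Spaces, where the values arrive as Space secrets instead.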

## 🚀 Usage

### Run the Web Application

```bash
python app.py
```

Visit `http://localhost:8000` to access the UI.

### Train a New Model

To trigger the training pipeline, open:

```
http://localhost:8000/train
```

Or use the "Train New Model" button in the UI.

## 📈 Model Performance

The system evaluates models using accuracy and F1 score.

- **Best Model**: Automatically selected, typically RandomForest or GradientBoosting.
- **Recall**: Optimized to minimize false negatives (missing a phishing URL is dangerous).
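For reference, the three metrics above are computed from predictions with scikit-learn; the labels below are toy values for illustration only:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 1 = phishing, 0 = legitimate; toy predictions for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)   # overall fraction correct
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
rec = recall_score(y_true, y_pred)     # fraction of phishing URLs actually caught
print(acc, f1, rec)
```

Optimizing for recall, as noted above, trades some false positives for fewer missed phishing URLs.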

### Model Evaluation Metrics

Below are the performance visualizations for the best trained model:

#### Confusion Matrix



#### ROC Curve



#### Precision-Recall Curve



## 🧪 Experiment Tracking

All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.

## 🚀 Future Enhancements

- [ ] Implement Deep Learning models (LSTM/CNN) for URL text analysis.
- [ ] Add a real-time browser extension.
- [ ] Deploy on a serverless architecture.
- [ ] Add more comprehensive unit and integration tests.

## 🤝 Contributing

Contributions are welcome! Please fork the repository and create a pull request.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.

## 📞 Contact

Inder - [GitHub Profile](https://github.com/Inder-26)
app.py
CHANGED
@@ -37,10 +37,13 @@ MODEL_PATH = "final_model/model.pkl"
 PREPROCESSOR_PATH = "final_model/preprocessor.pkl"
 
 # Initialize DagsHub
-
-
-
-
+if os.getenv("MLFLOW_TRACKING_USERNAME") and os.getenv("MLFLOW_TRACKING_PASSWORD"):
+    try:
+        dagshub.init(repo_owner="Inder-26", repo_name="NetworkSecurity", mlflow=True)
+    except Exception as e:
+        print(f"⚠️ Error initializing DagsHub: {e}")
+else:
+    print("⚠️ DagsHub credentials not found. Skipping initialization.")
 
 # Feature Columns (30 features)
 FEATURE_COLUMNS = [

@@ -562,6 +565,9 @@ async def train_model():
             "message": "Training completed successfully"
         }
     except Exception as e:
+        import traceback
+        traceback.print_exc()
+        print(f"Training Error: {e}", file=sys.stderr)
         raise HTTPException(status_code=500, detail=str(e))
deploy_stage
ADDED
@@ -0,0 +1 @@
+Subproject commit 8a501511f66fe3a9a55300d9484937b7ce289545
images/architecture_diagram.png ADDED (Git LFS)
images/confusion_matrix.png ADDED (Git LFS)
images/dagshub_experiments.png ADDED (Git LFS)
images/data_ingestion_diagram.png ADDED (Git LFS)
images/data_transformation_diagram.png ADDED (Git LFS)
images/data_validation_diagram.png ADDED (Git LFS)
images/model_training_diagram.png ADDED (Git LFS)
images/precision_recall_curve.png ADDED (Git LFS)
images/prediction_results.png ADDED (Git LFS)
images/roc_curve.png ADDED (Git LFS)
images/ui_homepage.png ADDED (Git LFS)
networksecurity/components/data_ingestion.py
CHANGED
@@ -46,11 +46,11 @@ class DataIngestion:
         import logging
         logging.info(f"MongoDB unavailable, using sample CSV: {str(e)}")
         try:
-            df = pd.read_csv("
+            df = pd.read_csv("Network_data/phisingData.csv")
             logging.info(f" Loaded {len(df)} rows from CSV")
             return df
         except FileNotFoundError:
-            raise NetworkSecurityException("Sample CSV not found at
+            raise NetworkSecurityException("Sample CSV not found at Network_data/phisingData.csv", sys)
 
 
     def export_data_into_feature_store(self, dataframe: pd.DataFrame):
networksecurity/components/model_trainer.py
CHANGED
@@ -41,8 +41,15 @@ from sklearn.metrics import (
     precision_recall_curve,
 )
 
+import os
+
 # ---------------- Dagshub + MLflow ----------------
-
+if os.getenv("MLFLOW_TRACKING_URI"):
+    print("info: MLflow tracking URI is already set, skipping DagsHub init")
+elif os.getenv("MLFLOW_TRACKING_USERNAME") and os.getenv("MLFLOW_TRACKING_PASSWORD"):
+    dagshub.init(repo_owner="Inder-26", repo_name="NetworkSecurity", mlflow=True)
+else:
+    print("Warning: DagsHub credentials not found. Tracking might rely on local configs or fail.")
 
 
networksecurity/constant/training_pipeline/__init__.py
CHANGED
@@ -14,7 +14,7 @@ FILE_NAME: str = "phisingkData.csv"
 TRAIN_FILE_NAME: str = "train.csv"
 TEST_FILE_NAME: str = "test.csv"
 
-SCHEMA_FILE_PATH = os.path.join("data_schema", "schema.yaml")
+SCHEMA_FILE_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(__file__)))), "data_schema", "schema.yaml")
 
 SAVED_MODEL_DIR_NAME = os.path.join("saved_models")
 MODEL_FILE_NAME: str = "model.pkl"
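The new `SCHEMA_FILE_PATH` climbs four directory levels with nested `os.path.dirname` calls to reach the repository root. The same resolution can be written more readably with `pathlib`; this is an equivalent illustration (with a hypothetical module path), not part of the commit:

```python
import os
from pathlib import Path

# Hypothetical module location, four levels below the repo root
module_file = "/repo/networksecurity/constant/training_pipeline/__init__.py"

# Nested-dirname form, as in the commit
nested = os.path.join(
    os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(module_file)))),
    "data_schema", "schema.yaml",
)

# pathlib form: parents[3] is the fourth ancestor directory
flat = Path(module_file).parents[3] / "data_schema" / "schema.yaml"

print(nested)
print(flat)
```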
networksecurity/pipeline/training_pipeline.py
CHANGED
@@ -135,8 +135,8 @@ class TrainingPipeline:
             model_trainer_artifact = self.start_model_trainer(data_transformation_artifact=data_transformation_artifact)
             logging.info("Training pipeline completed successfully")
 
-            self.sync_artifact_dir_to_s3()
-            self.sync_saved_model_dir_to_s3()
+            # self.sync_artifact_dir_to_s3()
+            # self.sync_saved_model_dir_to_s3()
 
             return model_trainer_artifact
         except Exception as e:
requirements.txt
CHANGED
@@ -1,118 +1,25 @@
-dnspython==1.16.0
-docker==7.1.0
-fastapi==0.128.0
-flask==3.1.2
-flask-cors==6.0.2
-fonttools==4.61.1
-gitdb==4.0.12
-gitpython==3.1.45
-google-auth==2.45.0
-gql==4.0.0
-graphene==3.4.3
-graphql-core==3.2.7
-graphql-relay==3.2.0
-greenlet==3.3.0
-h11==0.16.0
-httpcore==1.0.9
-httpx==0.28.1
-huey==2.5.5
-idna==3.11
-importlib-metadata==8.7.1
-itsdangerous==2.2.0
-jinja2==3.1.6
-jmespath==1.0.1
-joblib==1.5.3
-kiwisolver==1.4.9
-lxml==6.0.2
-mako==1.3.10
-markdown-it-py==4.0.0
-markupsafe==3.0.3
-marshmallow==3.26.2
-matplotlib==3.10.8
-mdurl==0.1.2
-mlflow==3.8.1
-mlflow-skinny==3.8.1
-mlflow-tracing==3.8.1
-multidict==6.7.0
-mypy-extensions==1.1.0
-# -e file:///D:/Coding%20Central/NetworkSecurity
-numpy==2.4.0
-opentelemetry-api==1.39.1
-opentelemetry-proto==1.39.1
-opentelemetry-sdk==1.39.1
-opentelemetry-semantic-conventions==0.60b1
-packaging==25.0
-pandas==2.3.3
-pathvalidate==3.3.1
-pillow==12.0.0
-propcache==0.4.1
-protobuf==6.33.2
-pyaml==25.7.0
-pyarrow==22.0.0
-pyasn1==0.6.1
-pyasn1-modules==0.4.2
-pycparser==2.23
-pydantic==2.12.5
-pydantic-core==2.41.5
-pygments==2.19.2
-pymongo==3.11.0
-pyparsing==3.3.1
-python-dateutil==2.9.0.post0
-python-dotenv==1.2.1
-python-multipart==0.0.21
-pytz==2025.2
-pyyaml==6.0.3
-requests==2.32.5
-requests-toolbelt==1.0.0
-rich==14.2.0
-rsa==4.9.1
-s3transfer==0.16.0
-scikit-learn==1.8.0
-scipy==1.16.3
-seaborn==0.13.2
-semver==3.0.4
-setuptools==80.9.0
-six==1.17.0
-smmap==5.0.2
-sqlalchemy==2.0.45
-sqlparse==0.5.5
-starlette==0.50.0
-tenacity==9.1.2
-threadpoolctl==3.6.0
-treelib==1.8.0
-typing-extensions==4.15.0
-typing-inspect==0.9.0
-typing-inspection==0.4.2
-tzdata==2025.3
-urllib3==2.6.2
-uvicorn==0.40.0
-waitress==3.0.2
-werkzeug==3.1.4
-yarl==1.22.0
-zipp==3.23.0
-aiofiles==23.2.1
+numpy
+pandas
+scikit-learn
+matplotlib
+seaborn
+fastapi
+uvicorn
+pymongo
+python-dotenv
+mlflow
+dagshub
+requests
+boto3
+botocore
+dill
+joblib
+flask
+flask-cors
+jinja2
+markupsafe
+itsdangerous
+werkzeug
+click
+pytest
+python-multipart