Spaces
Inder-26 committed
Commit · 195cc50 · Parent(s): 8eae904
feat: Initialize project structure with ML components, utilities, exception handling, and S3 integration.
Changed files:
- Dockerfile +3 -2
- README.md +241 -1
- app.py +10 -4
- deploy_stage +1 -0
- images/architecture_diagram.png +3 -0
- images/confusion_matrix.png +3 -0
- images/dagshub_experiments.png +3 -0
- images/data_ingestion_diagram.png +3 -0
- images/data_transformation_diagram.png +3 -0
- images/data_validation_diagram.png +3 -0
- images/model_training_diagram.png +3 -0
- images/precision_recall_curve.png +3 -0
- images/prediction_results.png +3 -0
- images/roc_curve.png +3 -0
- images/ui_homepage.png +3 -0
- networksecurity/components/data_ingestion.py +2 -2
- networksecurity/components/model_trainer.py +8 -1
- networksecurity/constant/training_pipeline/__init__.py +1 -1
- networksecurity/pipeline/training_pipeline.py +2 -2
- requirements.txt +25 -118
Dockerfile
CHANGED
@@ -10,8 +10,9 @@ RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
 # Copy requirements file first to leverage Docker cache
 COPY requirements.txt .
 
-#
-RUN pip install --no-cache-dir -
+# Upgrade pip and install Python dependencies
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
 
 # Copy the rest of the application code
 COPY . .
README.md
CHANGED
@@ -1 +1,241 @@
---
title: Network Security - Phishing Detection
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: true
license: mit
app_port: 7860
---

# 🛡️ Network Security System: Phishing URL Detection



## 📋 Table of Contents

- [About The Project](#-about-the-project)
- [Architecture](#-architecture)
- [Features](#-features)
- [Tech Stack](#-tech-stack)
- [Dataset](#-dataset)
- [Project Structure](#-project-structure)
- [Pipeline Workflow](#-pipeline-workflow)
- [Screenshots](#-screenshots)
- [Installation](#-installation)
- [Usage](#-usage)
- [Model Performance](#-model-performance)
- [Experiment Tracking](#-experiment-tracking)
- [Future Enhancements](#-future-enhancements)
- [Contributing](#-contributing)
- [License](#-license)
- [Contact](#-contact)

## 🚀 Live Demo

- **Live Application**: [inderjeet-networksecurity.hf.space](https://inderjeet-networksecurity.hf.space/)
- **Experiment Tracking**: [DagsHub Experiments](https://dagshub.com/Inder-26/NetworkSecurity/experiments#/)

## 🎯 About The Project

Cybersecurity threats such as phishing attacks are becoming increasingly sophisticated. This project implements a robust **Network Security Machine Learning Pipeline** designed to detect phishing URLs with high accuracy.

It leverages a modular MLOps architecture that keeps the system scalable, maintainable, and reproducible. The pipeline automates the entire flow from data ingestion to model deployment, including drift detection and automated model evaluation.

## 🏗️ Architecture

The system follows a strictly modular pipeline architecture, orchestrated by a central training pipeline.



## ✨ Features

- **🚀 End-to-End Pipeline**: Fully automated workflow from data ingestion to model deployment.
- **🛡️ Data Validation**: Comprehensive schema checks and data drift detection using Kolmogorov-Smirnov (KS) tests.
- **🔄 Robust Preprocessing**: Automated handling of missing values (KNN Imputer) and feature scaling (Robust Scaler).
- **🤖 Multi-Model Training**: Experiments with RandomForest, DecisionTree, GradientBoosting, and AdaBoost using GridSearchCV.
- **📊 Experiment Tracking**: Integrated with **MLflow** and **DagsHub** for tracking parameters, metrics, and models.
- **⚡ Fast API**: High-performance REST API built with **FastAPI** for real-time predictions.
- **🐳 Containerized**: Docker support for consistent deployment across environments.
- **☁️ Cloud Ready**: Designed to be deployed on platforms like AWS or Hugging Face Spaces.

## 🛠️ Tech Stack

- **Languages**: Python 3.8+
- **Frameworks**: FastAPI, Uvicorn
- **ML Libraries**: Scikit-learn, Pandas, NumPy
- **MLOps**: MLflow, DagsHub
- **Database**: MongoDB
- **Containerization**: Docker
- **Frontend**: HTML, CSS (Custom Design System), JavaScript

## 📊 Dataset

The project uses a dataset of URL features to distinguish legitimate from phishing URLs.

- **Source**: [Phishing Dataset for Machine Learning](https://archive.ics.uci.edu/ml/datasets/Phishing+Websites) (or a similar phishing URL dataset)
- **Features**: IP Address, URL Length, TinyURL, forwarding, etc.
- **Target**: `Result` (LEGITIMATE / PHISHING)

## 📁 Project Structure

```
NetworkSecurity/
├── images/              # Project diagrams and screenshots
├── networksecurity/     # Main package
│   ├── components/      # Pipeline components (Ingestion, Validation, Transformation, Training)
│   ├── pipeline/        # Training and Prediction pipelines
│   ├── entity/          # Artifact and Config entities
│   ├── constant/        # Project constants
│   ├── utils/           # Utility functions
│   └── exception/       # Custom exception handling
├── data_schema/         # Schema definitions
├── Dockerfile           # Docker configuration
├── app.py               # FastAPI application entry point
├── requirements.txt     # Project dependencies
└── README.md            # Project documentation
```

## ⚙️ Pipeline Workflow

### 1. Data Ingestion 📥

Fetches data from MongoDB, falls back to a local CSV when MongoDB is unavailable, and performs the train-test split.



### 2. Data Validation ✅

Validates data against the schema and checks for data drift.



### 3. Data Transformation 🔄

Imputes missing values and scales features for optimal model performance.



### 4. Model Training 🤖

Trains and tunes multiple models, selecting the best one based on F1-score/Accuracy.



## 📸 Screenshots

### Prediction Results & Threat Assessment



### Experiment Tracking (DagsHub/MLflow)



## 💻 Installation

### Prerequisites

- Python 3.8+
- MongoDB Account
- DagsHub Account (for experiment tracking)

### Step-by-Step

1. **Clone the Repository**

```bash
git clone https://github.com/Inder-26/NetworkSecurity.git
cd NetworkSecurity
```

2. **Create Virtual Environment**

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

3. **Install Dependencies**

```bash
pip install -r requirements.txt
```

4. **Set Environment Variables**

Create a `.env` file with your credentials:

```env
MONGO_DB_URL=mongodb+srv://<username>:<password>@cluster0.mongodb.net/?retryWrites=true&w=majority
MLFLOW_TRACKING_URI=https://dagshub.com/<username>/NetworkSecurity.mlflow
MLFLOW_TRACKING_USERNAME=<username>
MLFLOW_TRACKING_PASSWORD=<password>
```
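Once the `.env` file exists, the application can read it at startup with `python-dotenv` (listed in `requirements.txt`). A minimal sketch, with a fallback to plain environment variables if the package is absent:

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()  # reads key=value pairs from .env into os.environ
except ImportError:
    pass  # fall back to variables already set in the environment

# Variable names match the .env template above
mongo_url = os.getenv("MONGO_DB_URL", "")
print("MongoDB configured:", bool(mongo_url))
```

Keeping credentials in `.env` (and out of version control) is what lets the same image run locally and on Hugging Face Spaces, where the values arrive as Space secrets instead.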

## 🚀 Usage

### Run the Web Application

```bash
python app.py
```

Visit `http://localhost:8000` to access the UI.

### Train a New Model

To trigger the training pipeline, open:

```
http://localhost:8000/train
```

Or use the "Train New Model" button in the UI.

## 📈 Model Performance

The system evaluates models using accuracy and F1 score.

- **Best Model**: Automatically selected, typically RandomForest or GradientBoosting.
- **Recall**: Optimized to minimize false negatives (missing a phishing URL is dangerous).
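For reference, the three metrics above are computed from predictions with scikit-learn; the labels below are toy values for illustration only:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 1 = phishing, 0 = legitimate; toy predictions for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)   # overall fraction correct
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
rec = recall_score(y_true, y_pred)     # fraction of phishing URLs actually caught
print(acc, f1, rec)
```

Optimizing for recall, as noted above, trades some false positives for fewer missed phishing URLs.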

### Model Evaluation Metrics

Below are the performance visualizations for the best trained model:

#### Confusion Matrix



#### ROC Curve



#### Precision-Recall Curve



## 🧪 Experiment Tracking

All runs are logged to DagsHub. You can view parameters, metrics, and models in the MLflow UI.

## 🚀 Future Enhancements

- [ ] Implement Deep Learning models (LSTM/CNN) for URL text analysis.
- [ ] Add a real-time browser extension.
- [ ] Deploy on a serverless architecture.
- [ ] Add more comprehensive unit and integration tests.

## 🤝 Contributing

Contributions are welcome! Please fork the repository and create a pull request.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.

## 📞 Contact

Inder - [GitHub Profile](https://github.com/Inder-26)
app.py
CHANGED
@@ -37,10 +37,13 @@ MODEL_PATH = "final_model/model.pkl"
 PREPROCESSOR_PATH = "final_model/preprocessor.pkl"
 
 # Initialize DagsHub
-
-
-
-
+if os.getenv("MLFLOW_TRACKING_USERNAME") and os.getenv("MLFLOW_TRACKING_PASSWORD"):
+    try:
+        dagshub.init(repo_owner="Inder-26", repo_name="NetworkSecurity", mlflow=True)
+    except Exception as e:
+        print(f"⚠️ Error initializing DagsHub: {e}")
+else:
+    print("⚠️ DagsHub credentials not found. Skipping initialization.")
 
 # Feature Columns (30 features)
 FEATURE_COLUMNS = [

@@ -562,6 +565,9 @@ async def train_model():
             "message": "Training completed successfully"
         }
     except Exception as e:
+        import traceback
+        traceback.print_exc()
+        print(f"Training Error: {e}", file=sys.stderr)
         raise HTTPException(status_code=500, detail=str(e))
deploy_stage
ADDED
@@ -0,0 +1 @@
+Subproject commit 8a501511f66fe3a9a55300d9484937b7ce289545
images/architecture_diagram.png ADDED (Git LFS)
images/confusion_matrix.png ADDED (Git LFS)
images/dagshub_experiments.png ADDED (Git LFS)
images/data_ingestion_diagram.png ADDED (Git LFS)
images/data_transformation_diagram.png ADDED (Git LFS)
images/data_validation_diagram.png ADDED (Git LFS)
images/model_training_diagram.png ADDED (Git LFS)
images/precision_recall_curve.png ADDED (Git LFS)
images/prediction_results.png ADDED (Git LFS)
images/roc_curve.png ADDED (Git LFS)
images/ui_homepage.png ADDED (Git LFS)
networksecurity/components/data_ingestion.py
CHANGED
@@ -46,11 +46,11 @@ class DataIngestion:
         import logging
         logging.info(f"MongoDB unavailable, using sample CSV: {str(e)}")
         try:
-            df = pd.read_csv("
+            df = pd.read_csv("Network_data/phisingData.csv")
             logging.info(f" Loaded {len(df)} rows from CSV")
             return df
         except FileNotFoundError:
-            raise NetworkSecurityException("Sample CSV not found at
+            raise NetworkSecurityException("Sample CSV not found at Network_data/phisingData.csv", sys)
 
 
     def export_data_into_feature_store(self, dataframe: pd.DataFrame):
networksecurity/components/model_trainer.py
CHANGED
@@ -41,8 +41,15 @@ from sklearn.metrics import (
     precision_recall_curve,
 )
 
+import os
+
 # ---------------- Dagshub + MLflow ----------------
-
+if os.getenv("MLFLOW_TRACKING_URI"):
+    print("info: MLflow tracking URI is already set, skipping DagsHub init")
+elif os.getenv("MLFLOW_TRACKING_USERNAME") and os.getenv("MLFLOW_TRACKING_PASSWORD"):
+    dagshub.init(repo_owner="Inder-26", repo_name="NetworkSecurity", mlflow=True)
+else:
+    print("Warning: DagsHub credentials not found. Tracking might rely on local configs or fail.")
 
 
networksecurity/constant/training_pipeline/__init__.py
CHANGED
@@ -14,7 +14,7 @@ FILE_NAME: str = "phisingkData.csv"
 TRAIN_FILE_NAME: str = "train.csv"
 TEST_FILE_NAME: str = "test.csv"
 
-SCHEMA_FILE_PATH = os.path.join("data_schema", "schema.yaml")
+SCHEMA_FILE_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(__file__)))), "data_schema", "schema.yaml")
 
 SAVED_MODEL_DIR_NAME = os.path.join("saved_models")
 MODEL_FILE_NAME: str = "model.pkl"
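The new `SCHEMA_FILE_PATH` climbs four directory levels with nested `os.path.dirname` calls to reach the repository root. The same resolution can be written more readably with `pathlib`; this is an equivalent illustration (with a hypothetical module path), not part of the commit:

```python
import os
from pathlib import Path

# Hypothetical module location, four levels below the repo root
module_file = "/repo/networksecurity/constant/training_pipeline/__init__.py"

# Nested-dirname form, as in the commit
nested = os.path.join(
    os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(module_file)))),
    "data_schema", "schema.yaml",
)

# pathlib form: parents[3] is the fourth ancestor directory
flat = Path(module_file).parents[3] / "data_schema" / "schema.yaml"

print(nested)
print(flat)
```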
networksecurity/pipeline/training_pipeline.py
CHANGED
@@ -135,8 +135,8 @@ class TrainingPipeline:
             model_trainer_artifact = self.start_model_trainer(data_transformation_artifact=data_transformation_artifact)
             logging.info("Training pipeline completed successfully")
 
-            self.sync_artifact_dir_to_s3()
-            self.sync_saved_model_dir_to_s3()
+            # self.sync_artifact_dir_to_s3()
+            # self.sync_saved_model_dir_to_s3()
 
             return model_trainer_artifact
         except Exception as e:
requirements.txt
CHANGED
@@ -1,118 +1,25 @@
-dnspython==1.16.0
-docker==7.1.0
-fastapi==0.128.0
-flask==3.1.2
-flask-cors==6.0.2
-fonttools==4.61.1
-gitdb==4.0.12
-gitpython==3.1.45
-google-auth==2.45.0
-gql==4.0.0
-graphene==3.4.3
-graphql-core==3.2.7
-graphql-relay==3.2.0
-greenlet==3.3.0
-h11==0.16.0
-httpcore==1.0.9
-httpx==0.28.1
-huey==2.5.5
-idna==3.11
-importlib-metadata==8.7.1
-itsdangerous==2.2.0
-jinja2==3.1.6
-jmespath==1.0.1
-joblib==1.5.3
-kiwisolver==1.4.9
-lxml==6.0.2
-mako==1.3.10
-markdown-it-py==4.0.0
-markupsafe==3.0.3
-marshmallow==3.26.2
-matplotlib==3.10.8
-mdurl==0.1.2
-mlflow==3.8.1
-mlflow-skinny==3.8.1
-mlflow-tracing==3.8.1
-multidict==6.7.0
-mypy-extensions==1.1.0
-# -e file:///D:/Coding%20Central/NetworkSecurity
-numpy==2.4.0
-opentelemetry-api==1.39.1
-opentelemetry-proto==1.39.1
-opentelemetry-sdk==1.39.1
-opentelemetry-semantic-conventions==0.60b1
-packaging==25.0
-pandas==2.3.3
-pathvalidate==3.3.1
-pillow==12.0.0
-propcache==0.4.1
-protobuf==6.33.2
-pyaml==25.7.0
-pyarrow==22.0.0
-pyasn1==0.6.1
-pyasn1-modules==0.4.2
-pycparser==2.23
-pydantic==2.12.5
-pydantic-core==2.41.5
-pygments==2.19.2
-pymongo==3.11.0
-pyparsing==3.3.1
-python-dateutil==2.9.0.post0
-python-dotenv==1.2.1
-python-multipart==0.0.21
-pytz==2025.2
-pyyaml==6.0.3
-requests==2.32.5
-requests-toolbelt==1.0.0
-rich==14.2.0
-rsa==4.9.1
-s3transfer==0.16.0
-scikit-learn==1.8.0
-scipy==1.16.3
-seaborn==0.13.2
-semver==3.0.4
-setuptools==80.9.0
-six==1.17.0
-smmap==5.0.2
-sqlalchemy==2.0.45
-sqlparse==0.5.5
-starlette==0.50.0
-tenacity==9.1.2
-threadpoolctl==3.6.0
-treelib==1.8.0
-typing-extensions==4.15.0
-typing-inspect==0.9.0
-typing-inspection==0.4.2
-tzdata==2025.3
-urllib3==2.6.2
-uvicorn==0.40.0
-waitress==3.0.2
-werkzeug==3.1.4
-yarl==1.22.0
-zipp==3.23.0
-aiofiles==23.2.1
+numpy
+pandas
+scikit-learn
+matplotlib
+seaborn
+fastapi
+uvicorn
+pymongo
+python-dotenv
+mlflow
+dagshub
+requests
+boto3
+botocore
+dill
+joblib
+flask
+flask-cors
+jinja2
+markupsafe
+itsdangerous
+werkzeug
+click
+pytest
+python-multipart