Che237 committed on
Commit 77c5bf8 · verified · 1 Parent(s): aef1511

Add README.md

Files changed (1)
  1. notebooks/README.md +183 -0
notebooks/README.md ADDED
@@ -0,0 +1,183 @@
+ # CyberForge ML Notebooks
+
+ Production-ready ML pipeline for the CyberForge cybersecurity AI system.
+
+ ## Notebook Structure
+
+ | # | Notebook | Purpose | Key Outputs |
+ |---|----------|---------|-------------|
+ | 00 | [environment_setup](00_environment_setup.ipynb) | Environment validation, dependencies | System readiness report |
+ | 01 | [data_acquisition](01_data_acquisition.ipynb) | Data collection from WebScraper API, HF | Normalized datasets |
+ | 02 | [feature_engineering](02_feature_engineering.ipynb) | URL, network, security feature extraction | Feature-engineered data |
+ | 03 | [model_training](03_model_training.ipynb) | Train detection models | Trained .pkl models |
+ | 04 | [agent_intelligence](04_agent_intelligence.ipynb) | Decision scoring, Gemini integration | Agent module |
+ | 05 | [model_validation](05_model_validation.ipynb) | Performance, edge case testing | Validation report |
+ | 06 | [backend_integration](06_backend_integration.ipynb) | API packaging, serialization | Backend package |
+ | 07 | [deployment_artifacts](07_deployment_artifacts.ipynb) | Docker, HF upload, documentation | Deployment package |
+
+ ## Quick Start
+
+ 1. **Configure environment:**
+ ```bash
+ cd ml-services
+ # Ensure notebook_config.json has your API keys
+ ```
+
+ 2. **Run notebooks in order:**
+ ```bash
+ jupyter notebook notebooks/00_environment_setup.ipynb
+ ```
+
+ 3. **Or run all notebooks at once:**
+ ```bash
+ # Writes executed copies as *.nbconvert.ipynb; add --inplace to overwrite the originals
+ jupyter nbconvert --execute --to notebook notebooks/*.ipynb
+ ```
+
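For readers who prefer to drive the run from Python, the following is a minimal sketch of executing the numbered notebooks one at a time, in filename order, so that a failure stops the run early. The helper names `nbconvert_commands` and `run_all` are hypothetical and not part of the repository; the sketch assumes `jupyter` is on the `PATH` and that every pipeline notebook starts with a two-digit prefix, as in the table above.

```python
import subprocess
from pathlib import Path


def nbconvert_commands(notebook_dir):
    """Build one nbconvert command per numbered notebook, sorted by filename."""
    # Only files such as 00_environment_setup.ipynb match; scratch notebooks are skipped.
    notebooks = sorted(Path(notebook_dir).glob("[0-9][0-9]_*.ipynb"))
    return [
        ["jupyter", "nbconvert", "--execute", "--to", "notebook", "--inplace", str(nb)]
        for nb in notebooks
    ]


def run_all(notebook_dir="notebooks"):
    """Execute each notebook in order (hypothetical helper)."""
    for cmd in nbconvert_commands(notebook_dir):
        # check=True raises CalledProcessError at the first failing notebook.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from `ml-services/` would then mirror step 3, with the difference that later notebooks are not attempted after an earlier one fails.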
+ ## Configuration
+
+ All notebooks use `../notebook_config.json` for configuration:
+
+ ```json
+ {
+   "datasets_dir": "../datasets",
+   "hf_repo": "Che237/cyberforge-models",
+   "gemini_api_key": "",
+   "webscraper_api_key": "your_key"
+ }
+ ```
+
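As an illustration of how a notebook might read this file, here is a minimal sketch that loads the JSON and reports which API keys still look unset. The function name `load_config` and the rule that an empty string or the `your_key` placeholder counts as unset are assumptions made for this example, not behavior confirmed by the notebooks.

```python
import json
from pathlib import Path

# Keys taken from the example configuration above.
REQUIRED_KEYS = ("datasets_dir", "hf_repo", "gemini_api_key", "webscraper_api_key")
API_KEYS = ("gemini_api_key", "webscraper_api_key")


def load_config(path="../notebook_config.json"):
    """Load the config and return (config, list of API keys that look unset)."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"notebook_config.json is missing keys: {missing}")
    # Assumption: "" and the "your_key" placeholder both mean "not configured yet".
    unset = [key for key in API_KEYS if config[key] in ("", "your_key")]
    return config, unset
```

A notebook could call `config, unset = load_config()` in its first cell and warn when `unset` is not empty.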
+ ## Output Directories
+
+ After running all notebooks:
+
+ ```
+ ml-services/
+ ├── datasets/
+ │   ├── processed/         # Cleaned datasets
+ │   └── features/          # Feature-engineered data
+ ├── models/                # Trained models
+ │   ├── phishing_detection/
+ │   ├── malware_detection/
+ │   └── model_registry.json
+ ├── agent/                 # Agent intelligence module
+ ├── validation/            # Validation reports
+ ├── backend_package/       # Backend integration files
+ └── deployment/            # Deployment artifacts
+ ```
+
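A quick way to confirm that a full run produced the layout above is to check for each path. The sketch below is one possible check, assuming it is run from `ml-services/`; the helper name `missing_outputs` is hypothetical, and the path list is copied from the tree rather than from any notebook code.

```python
from pathlib import Path

# Directories listed in the tree above, relative to ml-services/.
EXPECTED_DIRS = [
    "datasets/processed",
    "datasets/features",
    "models/phishing_detection",
    "models/malware_detection",
    "agent",
    "validation",
    "backend_package",
    "deployment",
]


def missing_outputs(root="."):
    """Return the expected output paths that do not exist under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    # model_registry.json is the only file named explicitly in the tree.
    if not (root / "models" / "model_registry.json").is_file():
        missing.append("models/model_registry.json")
    return missing
```

An empty list means every directory and the model registry are present.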
+ ## Integration Points
+
+ ### Backend (mlService.js)
+ - Use `backend_package/inference.py` or `backend_package/ml_client.js`
+ - Prediction endpoint: `POST /predict`
+
+ ### Desktop App (caido-app.js)
+ - Agent module: `agent/cyberforge_agent.py`
+ - Real-time analysis via backend API
+
+ ### Hugging Face
+ - Models: `huggingface.co/Che237/cyberforge-models`
+ - Datasets: `huggingface.co/datasets/Che237/cyberforge-datasets`
+ - Space: `huggingface.co/spaces/Che237/cyberforge`
+
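The README names the `POST /predict` endpoint but does not show its request or response schema. The sketch below only illustrates the general shape of such a call using the standard library. The base URL `http://localhost:8000`, the `{"features": ...}` payload, and the helper names `build_predict_request` and `predict` are all assumptions; the real contract lives in `backend_package/`.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a JSON POST request for the prediction endpoint (assumed payload shape)."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://localhost:8000", timeout=10):
    """Send the request and decode the JSON response (requires a running backend)."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect or unit test without a live server.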
+ ## Requirements
+
+ - Python 3.11+
+ - scikit-learn >= 1.3.0
+ - pandas >= 2.0.0
+ - huggingface_hub >= 0.19.0
+ - google-generativeai >= 0.3.0
+
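The minimum versions listed above can be checked before running any notebook. This is a rough sketch using only the standard library; the helper names are hypothetical, and the version parsing is deliberately simple (leading integers only), so it is not a substitute for a full version parser.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "scikit-learn": "1.3.0",
    "pandas": "2.0.0",
    "huggingface_hub": "0.19.0",
    "google-generativeai": "0.3.0",
}


def version_tuple(version):
    """Turn '1.3.0' or '2.1.0rc1' into a tuple of leading integers, padded to 3 parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a dict of problems; an empty dict means everything looks fine."""
    problems = {}
    if sys.version_info < (3, 11):
        problems["python"] = f"{sys.version_info.major}.{sys.version_info.minor} < 3.11"
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```

Printing `check_requirements()` in a first cell gives a short readiness summary.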
+ ## License
+
+ MIT
+
+ ### 3. **Network Security Analysis** 🌐
+ **File**: `network_security_analysis.ipynb`
+ **Purpose**: Network-specific security analysis and monitoring
+ **Runtime**: ~20-30 minutes
+ **Description**:
+ - Network traffic analysis
+ - Intrusion detection model training
+ - Port scanning detection
+ - Network anomaly detection
+
+ ```bash
+ jupyter notebook network_security_analysis.ipynb
+ ```
+
+ ### 4. **Comprehensive AI Agent Training** 🤖
+ **File**: `ai_agent_comprehensive_training.ipynb`
+ **Purpose**: Advanced AI agent with full capabilities
+ **Runtime**: ~45-60 minutes
+ **Description**:
+ - Enhanced communication skills
+ - Web scraping and threat intelligence
+ - Real-time monitoring capabilities
+ - Natural language processing for security analysis
+ - **RUN LAST** - Integrates all previous models
+
+ ```bash
+ jupyter notebook ai_agent_comprehensive_training.ipynb
+ ```
+
+ ## 📊 Expected Outputs
+
+ After running all notebooks, you should have:
+
+ 1. **Trained Models**: Saved in `../models/` directory
+ 2. **Performance Metrics**: Evaluation reports and visualizations
+ 3. **AI Agent**: Fully trained agent ready for deployment
+ 4. **Configuration Files**: Model configs for production use
+
+ ## 🔧 Troubleshooting
+
+ ### Common Issues:
+
+ **Memory Errors**:
+ - Reduce batch size in deep learning models
+ - Close other applications to free RAM
+ - Consider using smaller datasets for testing
+
+ **Package Installation Failures**:
+ - Update pip: `pip install --upgrade pip`
+ - Use conda if pip fails: `conda install <package>`
+ - Check Python version compatibility
+
+ **CUDA/GPU Issues**:
+ - For TensorFlow GPU: Install CUDA 11.8+ and cuDNN
+ - For CPU-only: Models will run slower but still work
+ - Check GPU availability: `tf.config.list_physical_devices('GPU')` (`tf.test.is_gpu_available()` is deprecated in TensorFlow 2.x)
+
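The GPU availability check can be wrapped so that it also works on machines without TensorFlow installed. The sketch below is one way to do that; the helper name `visible_gpus` is hypothetical, and it assumes TensorFlow 2.x, where `tf.config.list_physical_devices` is available.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        # No TensorFlow means no TensorFlow-visible GPU; callers fall back to CPU.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

An empty list means the models will run on CPU, which is slower but still works.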
+ **Data Download Issues**:
+ - Ensure internet connection for Kaggle datasets
+ - Set up Kaggle API credentials if needed
+ - Some notebooks include fallback synthetic data generation
+
+ ## 📝 Notes
+
+ - **First Run**: Initial execution takes longer due to package installation and data downloads
+ - **Subsequent Runs**: Much faster as dependencies are cached
+ - **Customization**: Modify hyperparameters in notebooks for different results
+ - **Production**: Use the saved models in the main application
+
+ ## 🎯 Next Steps
+
+ After completing all notebooks:
+
+ 1. **Deploy Models**: Copy trained models to production environment
+ 2. **Integration**: Connect models with the desktop application
+ 3. **Monitoring**: Set up model performance monitoring
+ 4. **Updates**: Retrain models with new data periodically
+
+ ## 🆘 Support
+
+ If you encounter issues:
+ 1. Check the troubleshooting section above
+ 2. Verify all prerequisites are met
+ 3. Review notebook outputs for specific error messages
+ 4. Create an issue in the repository with error details
+
+ ---
+
+ **Happy Training! 🚀**