# CyberForge ML Notebooks
Production-ready ML pipeline for CyberForge cybersecurity AI system.
## Notebook Structure
| # | Notebook | Purpose | Key Outputs |
|---|----------|---------|-------------|
| 00 | [environment_setup](00_environment_setup.ipynb) | Environment validation, dependencies | System readiness report |
| 01 | [data_acquisition](01_data_acquisition.ipynb) | Data collection from WebScraper API, HF | Normalized datasets |
| 02 | [feature_engineering](02_feature_engineering.ipynb) | URL, network, security feature extraction | Feature-engineered data |
| 03 | [model_training](03_model_training.ipynb) | Train detection models | Trained .pkl models |
| 04 | [agent_intelligence](04_agent_intelligence.ipynb) | Decision scoring, Gemini integration | Agent module |
| 05 | [model_validation](05_model_validation.ipynb) | Performance, edge case testing | Validation report |
| 06 | [backend_integration](06_backend_integration.ipynb) | API packaging, serialization | Backend package |
| 07 | [deployment_artifacts](07_deployment_artifacts.ipynb) | Docker, HF upload, documentation | Deployment package |
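As a flavor of what notebook 02's URL feature extraction might produce, here is a minimal, stdlib-only sketch; the function name and the exact feature set are illustrative and may differ from the notebook's implementation:

```python
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features from a URL (illustrative feature set)."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),                      # subdomain depth hint
        "num_digits": sum(c.isdigit() for c in url),
        "has_ip_host": host.replace(".", "").isdigit(),   # raw-IP hosts are suspicious
        "uses_https": parsed.scheme == "https",
    }
```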
## Quick Start
1. **Configure environment:**
```bash
cd ml-services
# Ensure notebook_config.json has your API keys
```
2. **Run notebooks in order:**
```bash
jupyter notebook notebooks/00_environment_setup.ipynb
```
3. **Or run all:**
```bash
jupyter nbconvert --execute --to notebook notebooks/*.ipynb
```
## Configuration
All notebooks use `../notebook_config.json` for configuration:
```json
{
  "datasets_dir": "../datasets",
  "hf_repo": "Che237/cyberforge-models",
  "gemini_api_key": "",
  "webscraper_api_key": "your_key"
}
```
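A notebook could load this file with a small helper like the one below. This is a sketch: the key names mirror the sample above, but the fallback behavior (returning an empty dict when the file is missing) is an assumption, not necessarily what the notebooks do:

```python
import json
from pathlib import Path

def load_config(path="../notebook_config.json"):
    """Load notebook configuration; return {} if the file does not exist."""
    p = Path(path)
    if not p.exists():
        return {}
    with p.open() as f:
        return json.load(f)

# Assumed usage: fall back to the documented default when a key is absent.
config = load_config()
datasets_dir = config.get("datasets_dir", "../datasets")
```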
## Output Directories
After running all notebooks:
```
ml-services/
├── datasets/
│   ├── processed/       # Cleaned datasets
│   └── features/        # Feature-engineered data
├── models/              # Trained models
│   ├── phishing_detection/
│   ├── malware_detection/
│   └── model_registry.json
├── agent/               # Agent intelligence module
├── validation/          # Validation reports
├── backend_package/     # Backend integration files
└── deployment/          # Deployment artifacts
```
## Integration Points
### Backend (mlService.js)
- Use `backend_package/inference.py` or `backend_package/ml_client.js`
- Prediction endpoint: `POST /predict`
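Calling the prediction endpoint from Python might look like the following. The `POST /predict` path comes from this README, but the host/port and the request/response schema are assumptions; check `backend_package/` for the actual contract:

```python
import json
from urllib import request

def predict(url_to_check, endpoint="http://localhost:3000/predict"):
    """POST a URL to the backend prediction endpoint and return parsed JSON."""
    payload = json.dumps({"url": url_to_check}).encode()
    req = request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```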
### Desktop App (caido-app.js)
- Agent module: `agent/cyberforge_agent.py`
- Real-time analysis via backend API
### Hugging Face
- Models: `huggingface.co/Che237/cyberforge-models`
- Datasets: `huggingface.co/datasets/Che237/cyberforge-datasets`
- Space: `huggingface.co/spaces/Che237/cyberforge`
## Requirements
- Python 3.11+
- scikit-learn >= 1.3.0
- pandas >= 2.0.0
- huggingface_hub >= 0.19.0
- google-generativeai >= 0.3.0
## License
MIT
### 3. **Network Security Analysis**
**File**: `network_security_analysis.ipynb`
**Purpose**: Network-specific security analysis and monitoring
**Runtime**: ~20-30 minutes
**Description**:
- Network traffic analysis
- Intrusion detection model training
- Port scanning detection
- Network anomaly detection
```bash
jupyter notebook network_security_analysis.ipynb
```
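To give a flavor of the port-scanning detection listed above, here is a stdlib-only heuristic sketch; the function, event format, and threshold are assumptions, not the notebook's actual implementation:

```python
from collections import defaultdict

def flag_port_scanners(events, threshold=20):
    """Flag source IPs that probe more than `threshold` distinct ports.

    `events` is an iterable of (src_ip, dst_port) tuples, e.g. from a
    traffic capture. A wide spread of distinct ports from one source is
    a classic port-scan signature.
    """
    ports_by_src = defaultdict(set)
    for src, port in events:
        ports_by_src[src].add(port)
    return {src for src, ports in ports_by_src.items() if len(ports) > threshold}
```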
### 4. **Comprehensive AI Agent Training**
**File**: `ai_agent_comprehensive_training.ipynb`
**Purpose**: Advanced AI agent with full capabilities
**Runtime**: ~45-60 minutes
**Description**:
- Enhanced communication skills
- Web scraping and threat intelligence
- Real-time monitoring capabilities
- Natural language processing for security analysis
- **RUN LAST** - Integrates all previous models
```bash
jupyter notebook ai_agent_comprehensive_training.ipynb
```
## Expected Outputs
After running all notebooks, you should have:
1. **Trained Models**: Saved in `../models/` directory
2. **Performance Metrics**: Evaluation reports and visualizations
3. **AI Agent**: Fully trained agent ready for deployment
4. **Configuration Files**: Model configs for production use
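Loading one of the saved `.pkl` models for inference could look like this sketch; the commented path is illustrative and only unpickle files you trust, since `pickle.load` can execute arbitrary code:

```python
import pickle
from pathlib import Path

def load_model(path):
    """Deserialize a trained model from a pickle file.

    Note: only unpickle files from trusted sources.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)

# Illustrative usage (path and filename are assumptions):
# model = load_model("../models/phishing_detection/model.pkl")
# prediction = model.predict(feature_vector)
```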
## Troubleshooting
### Common Issues:
**Memory Errors**:
- Reduce batch size in deep learning models
- Close other applications to free RAM
- Consider using smaller datasets for testing
**Package Installation Failures**:
- Update pip: `pip install --upgrade pip`
- Use conda if pip fails: `conda install <package>`
- Check Python version compatibility
**CUDA/GPU Issues**:
- For TensorFlow GPU: Install CUDA 11.8+ and cuDNN
- For CPU-only: Models will run slower but still work
- Check GPU availability: `tf.config.list_physical_devices('GPU')` (the older `tf.test.is_gpu_available()` is deprecated)
**Data Download Issues**:
- Ensure internet connection for Kaggle datasets
- Set up Kaggle API credentials if needed
- Some notebooks include fallback synthetic data generation
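A fallback generator could be as simple as the sketch below. This is entirely hypothetical, not the notebooks' actual implementation; it produces labeled toy URL samples so a pipeline can run offline:

```python
import random

def synthetic_url_samples(n=100, seed=42):
    """Generate toy (url, label) pairs; label 1 marks a phishing-style URL.

    Deterministic for a given seed, so runs are reproducible.
    """
    rng = random.Random(seed)
    benign = ["example.com", "wikipedia.org", "python.org"]
    phish = ["examp1e-login.com", "secure-update-account.net", "paypa1-verify.org"]
    samples = []
    for _ in range(n):
        if rng.random() < 0.5:
            samples.append((f"https://{rng.choice(benign)}/home", 0))
        else:
            samples.append((f"http://{rng.choice(phish)}/login?id={rng.randint(1, 999)}", 1))
    return samples
```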
## Notes
- **First Run**: Initial execution takes longer due to package installation and data downloads
- **Subsequent Runs**: Much faster as dependencies are cached
- **Customization**: Modify hyperparameters in notebooks for different results
- **Production**: Use the saved models in the main application
## Next Steps
After completing all notebooks:
1. **Deploy Models**: Copy trained models to production environment
2. **Integration**: Connect models with the desktop application
3. **Monitoring**: Set up model performance monitoring
4. **Updates**: Retrain models with new data periodically
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Verify all prerequisites are met
3. Review notebook outputs for specific error messages
4. Create an issue in the repository with error details
---
**Happy Training!**