# CyberForge ML Notebooks

Production-ready ML pipeline for the CyberForge cybersecurity AI system.
## Notebook Structure

| # | Notebook | Purpose | Key Outputs |
|---|----------|---------|-------------|
| 00 | [environment_setup](00_environment_setup.ipynb) | Environment validation, dependencies | System readiness report |
| 01 | [data_acquisition](01_data_acquisition.ipynb) | Data collection from WebScraper API, HF | Normalized datasets |
| 02 | [feature_engineering](02_feature_engineering.ipynb) | URL, network, security feature extraction | Feature-engineered data |
| 03 | [model_training](03_model_training.ipynb) | Train detection models | Trained .pkl models |
| 04 | [agent_intelligence](04_agent_intelligence.ipynb) | Decision scoring, Gemini integration | Agent module |
| 05 | [model_validation](05_model_validation.ipynb) | Performance, edge case testing | Validation report |
| 06 | [backend_integration](06_backend_integration.ipynb) | API packaging, serialization | Backend package |
| 07 | [deployment_artifacts](07_deployment_artifacts.ipynb) | Docker, HF upload, documentation | Deployment package |
## Quick Start

1. **Configure the environment:**
   ```bash
   cd ml-services
   # Ensure notebook_config.json has your API keys
   ```
2. **Run the notebooks in order:**
   ```bash
   jupyter notebook notebooks/00_environment_setup.ipynb
   ```
3. **Or execute them all non-interactively:**
   ```bash
   jupyter nbconvert --execute --to notebook notebooks/*.ipynb
   ```
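Because the notebooks carry numeric prefixes (`00_` through `07_`), a plain lexical sort matches the intended run order. A minimal Python sketch of step 3, assuming the notebooks live in `notebooks/` and that re-executing each file in place is acceptable:

```python
import subprocess
from pathlib import Path

def notebooks_in_order(notebooks_dir="notebooks"):
    """Return notebook paths sorted by numeric prefix (00_, 01_, ...)."""
    return sorted(Path(notebooks_dir).glob("*.ipynb"))

def run_all(notebooks_dir="notebooks"):
    """Execute every notebook in order, stopping on the first failure."""
    for nb in notebooks_in_order(notebooks_dir):
        # --inplace re-executes the notebook and saves outputs back into it
        subprocess.run(
            ["jupyter", "nbconvert", "--execute", "--inplace", str(nb)],
            check=True,
        )
```

`check=True` makes a failing notebook abort the whole run, which is usually what you want since later notebooks depend on earlier outputs.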
## Configuration

All notebooks read their settings from `../notebook_config.json`:

```json
{
  "datasets_dir": "../datasets",
  "hf_repo": "Che237/cyberforge-models",
  "gemini_api_key": "",
  "webscraper_api_key": "your_key"
}
```
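A notebook can load this file and flag missing or empty keys up front. This is a sketch, not the project's actual loader; `load_config` is a hypothetical helper and `REQUIRED_KEYS` simply mirrors the fields in the sample above:

```python
import json
from pathlib import Path

# Mirrors the sample config above; adjust if your config has more fields.
REQUIRED_KEYS = ("datasets_dir", "hf_repo", "gemini_api_key", "webscraper_api_key")

def load_config(path="../notebook_config.json"):
    """Load the shared notebook config and warn about missing/empty keys."""
    config = json.loads(Path(path).read_text())
    missing = [k for k in REQUIRED_KEYS if not config.get(k)]
    if missing:
        print(f"Warning: missing or empty config keys: {missing}")
    return config
```

Warning rather than raising lets notebooks that don't need a given API key (e.g. the Gemini key during feature engineering) proceed anyway.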
## Output Directories

After running all notebooks:

```
ml-services/
├── datasets/
│   ├── processed/        # Cleaned datasets
│   └── features/         # Feature-engineered data
├── models/               # Trained models
│   ├── phishing_detection/
│   ├── malware_detection/
│   └── model_registry.json
├── agent/                # Agent intelligence module
├── validation/           # Validation reports
├── backend_package/      # Backend integration files
└── deployment/           # Deployment artifacts
```
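A quick sanity check after a full run can confirm the tree above was actually produced. A minimal sketch, assuming the layout shown (the `missing_outputs` helper is illustrative, not part of the pipeline):

```python
from pathlib import Path

# Directories the notebooks are expected to create, per the tree above.
EXPECTED_DIRS = [
    "datasets/processed",
    "datasets/features",
    "models",
    "agent",
    "validation",
    "backend_package",
    "deployment",
]

def missing_outputs(root="ml-services"):
    """Return the expected output directories that do not exist yet."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

An empty return value means every expected directory is in place; anything listed points at a notebook that failed or was skipped.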
## Integration Points

### Backend (mlService.js)

- Use `backend_package/inference.py` or `backend_package/ml_client.js`
- Prediction endpoint: `POST /predict`
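For a smoke test of the endpoint, the request can be built with the standard library. Note the `{"url": ...}` payload schema and the `localhost:3000` base URL are assumptions for illustration; check `backend_package/inference.py` for the real contract:

```python
import json
import urllib.request

def build_predict_request(base_url, url_to_check):
    """Build a POST /predict request; the payload schema here is assumed."""
    body = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires the backend to be running; base URL is hypothetical):
# req = build_predict_request("http://localhost:3000", "https://example.com")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```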
### Desktop App (caido-app.js)

- Agent module: `agent/cyberforge_agent.py`
- Real-time analysis via the backend API

### Hugging Face

- Models: `huggingface.co/Che237/cyberforge-models`
- Datasets: `huggingface.co/datasets/Che237/cyberforge-datasets`
- Space: `huggingface.co/spaces/Che237/cyberforge`
## Requirements

- Python 3.11+
- scikit-learn >= 1.3.0
- pandas >= 2.0.0
- huggingface_hub >= 0.19.0
- google-generativeai >= 0.3.0

## License

MIT
### 3. Network Security Analysis

**File**: `network_security_analysis.ipynb`
**Purpose**: Network-specific security analysis and monitoring
**Runtime**: ~20-30 minutes
**Description**:

- Network traffic analysis
- Intrusion detection model training
- Port scanning detection
- Network anomaly detection

```bash
jupyter notebook network_security_analysis.ipynb
```

### 4. Comprehensive AI Agent Training

**File**: `ai_agent_comprehensive_training.ipynb`
**Purpose**: Advanced AI agent with full capabilities
**Runtime**: ~45-60 minutes
**Description**:

- Enhanced communication skills
- Web scraping and threat intelligence
- Real-time monitoring capabilities
- Natural language processing for security analysis
- **RUN LAST**: integrates all previous models

```bash
jupyter notebook ai_agent_comprehensive_training.ipynb
```
## Expected Outputs

After running all notebooks, you should have:

1. **Trained Models**: saved in the `../models/` directory
2. **Performance Metrics**: evaluation reports and visualizations
3. **AI Agent**: a fully trained agent ready for deployment
4. **Configuration Files**: model configs for production use
## Troubleshooting

### Common Issues

**Memory Errors**:

- Reduce the batch size in deep learning models
- Close other applications to free RAM
- Consider using smaller datasets for testing

**Package Installation Failures**:

- Update pip: `pip install --upgrade pip`
- Use conda if pip fails: `conda install <package>`
- Check Python version compatibility
**CUDA/GPU Issues**:

- For TensorFlow GPU support, install a matching CUDA (11.8+) and cuDNN build
- CPU-only setups still work; models just train and predict more slowly
- Check GPU availability with `tf.config.list_physical_devices('GPU')` (the older `tensorflow.test.is_gpu_available()` is deprecated)
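A safe way to run that check, sketched so it also behaves sensibly on machines where TensorFlow is not installed at all:

```python
def list_gpus():
    """Return GPU device names visible to TensorFlow, or None if TF is absent."""
    try:
        import tensorflow as tf  # heavy import, done lazily for this check
    except ImportError:
        return None
    return [d.name for d in tf.config.list_physical_devices("GPU")]
```

An empty list means TensorFlow is installed but sees no GPU (CPU-only mode); `None` means TensorFlow itself is missing.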
**Data Download Issues**:

- Ensure an internet connection for Kaggle datasets
- Set up Kaggle API credentials if needed
- Some notebooks include fallback synthetic data generation
## Notes

- **First Run**: initial execution takes longer due to package installation and data downloads
- **Subsequent Runs**: much faster, as dependencies are cached
- **Customization**: modify hyperparameters in the notebooks for different results
- **Production**: use the saved models in the main application

## Next Steps

After completing all notebooks:

1. **Deploy Models**: copy trained models to the production environment
2. **Integration**: connect the models with the desktop application
3. **Monitoring**: set up model performance monitoring
4. **Updates**: retrain models with new data periodically

## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Verify all prerequisites are met
3. Review notebook outputs for specific error messages
4. Create an issue in the repository with error details

---

**Happy Training!**