# CyberForge ML Notebooks

Production-ready ML pipeline for the CyberForge cybersecurity AI system.

## Notebook Structure
| # | Notebook | Purpose | Key Outputs |
|---|---|---|---|
| 00 | environment_setup | Environment validation, dependencies | System readiness report |
| 01 | data_acquisition | Data collection from WebScraper API, HF | Normalized datasets |
| 02 | feature_engineering | URL, network, security feature extraction | Feature-engineered data |
| 03 | model_training | Train detection models | Trained .pkl models |
| 04 | agent_intelligence | Decision scoring, Gemini integration | Agent module |
| 05 | model_validation | Performance, edge case testing | Validation report |
| 06 | backend_integration | API packaging, serialization | Backend package |
| 07 | deployment_artifacts | Docker, HF upload, documentation | Deployment package |
## Quick Start

Configure the environment:

```bash
cd ml-services
# Ensure notebook_config.json has your API keys
```

Run the notebooks in order:

```bash
jupyter notebook notebooks/00_environment_setup.ipynb
```

Or run them all at once:

```bash
jupyter nbconvert --execute --to notebook notebooks/*.ipynb
```
## Configuration

All notebooks read their configuration from `../notebook_config.json`:
```json
{
  "datasets_dir": "../datasets",
  "hf_repo": "Che237/cyberforge-models",
  "gemini_api_key": "",
  "webscraper_api_key": "your_key"
}
```
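Inside a notebook, the config file can be loaded with a few lines of standard-library Python. A minimal sketch (the `load_notebook_config` helper and its defaults are illustrative, not part of the notebooks themselves):

```python
import json
from pathlib import Path

def load_notebook_config(path="../notebook_config.json"):
    """Load the shared notebook configuration, falling back to defaults.

    The default keys mirror the sample config above; any extra keys in
    the file are passed through unchanged.
    """
    defaults = {
        "datasets_dir": "../datasets",
        "hf_repo": "Che237/cyberforge-models",
        "gemini_api_key": "",
        "webscraper_api_key": "",
    }
    config_path = Path(path)
    if config_path.exists():
        defaults.update(json.loads(config_path.read_text()))
    return defaults
```

Because missing keys fall back to defaults, a partially filled config (e.g. no Gemini key yet) still lets the earlier notebooks run.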
## Output Directories

After running all notebooks:
```
ml-services/
├── datasets/
│   ├── processed/        # Cleaned datasets
│   └── features/         # Feature-engineered data
├── models/               # Trained models
│   ├── phishing_detection/
│   ├── malware_detection/
│   └── model_registry.json
├── agent/                # Agent intelligence module
├── validation/           # Validation reports
├── backend_package/      # Backend integration files
└── deployment/           # Deployment artifacts
```
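Downstream code can use `model_registry.json` to locate the trained `.pkl` files rather than hard-coding paths. A hedged sketch; the registry schema assumed here (`{"models": {"<name>": {"file": "<relative path>"}}}`) is invented for illustration and should be matched to whatever the training notebook actually writes:

```python
import json
from pathlib import Path

def resolve_model_paths(models_dir="models"):
    """Map each model name to its .pkl path via model_registry.json.

    ASSUMPTION: the registry schema used here is illustrative, not the
    notebooks' actual format.
    """
    models_dir = Path(models_dir)
    registry = json.loads((models_dir / "model_registry.json").read_text())
    return {name: models_dir / entry["file"]
            for name, entry in registry.get("models", {}).items()}
```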
## Integration Points

### Backend (mlService.js)

- Use `backend_package/inference.py` or `backend_package/ml_client.js`
- Prediction endpoint: `POST /predict`
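A client can call the prediction endpoint with a plain JSON POST. A minimal standard-library sketch; the payload shape (`{"features": {...}}`), the example feature name, and the base URL are assumptions to be matched against `backend_package/inference.py`:

```python
import json
import urllib.request

def build_predict_request(features, base_url="http://localhost:3000"):
    """Build a POST /predict request for the backend.

    ASSUMPTION: the {"features": ...} payload shape is illustrative.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the backend to be running):
# with urllib.request.urlopen(build_predict_request({"url_length": 87})) as resp:
#     print(json.load(resp))
```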
### Desktop App (caido-app.js)

- Agent module: `agent/cyberforge_agent.py`
- Real-time analysis via the backend API
### Hugging Face

- Models: huggingface.co/Che237/cyberforge-models
- Datasets: huggingface.co/datasets/Che237/cyberforge-datasets
- Space: huggingface.co/spaces/Che237/cyberforge
## Requirements
- Python 3.11+
- scikit-learn >= 1.3.0
- pandas >= 2.0.0
- huggingface_hub >= 0.19.0
- google-generativeai >= 0.3.0
## License

MIT
## 3. Network Security Analysis

- File: `network_security_analysis.ipynb`
- Purpose: Network-specific security analysis and monitoring
- Runtime: ~20-30 minutes
Description:
- Network traffic analysis
- Intrusion detection model training
- Port scanning detection
- Network anomaly detection
```bash
jupyter notebook network_security_analysis.ipynb
```
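As a sketch of the kind of anomaly detection this notebook covers, the snippet below fits an IsolationForest (scikit-learn is already in the requirements) on synthetic flow features. The feature set, value ranges, and the port-scan-like outlier are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic "normal" flows: [packets/sec, mean packet size, distinct ports]
normal = rng.normal(loc=[100, 500, 5], scale=[20, 100, 2], size=(500, 3))

# A port-scan-like flow: small packets, very many distinct ports
scan = np.array([[300, 60, 900]])

X = np.vstack([normal, scan])
model = IsolationForest(contamination=0.01, random_state=42).fit(X)
labels = model.predict(X)  # 1 = normal, -1 = anomaly
print("flagged anomalies:", int((labels == -1).sum()))
```

The `contamination` parameter sets the expected fraction of anomalies; in production it would be tuned against labeled traffic rather than guessed.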
## 4. Comprehensive AI Agent Training

- File: `ai_agent_comprehensive_training.ipynb`
- Purpose: Advanced AI agent with full capabilities
- Runtime: ~45-60 minutes
Description:
- Enhanced communication skills
- Web scraping and threat intelligence
- Real-time monitoring capabilities
- Natural language processing for security analysis
- **Run last**: integrates all previous models

```bash
jupyter notebook ai_agent_comprehensive_training.ipynb
```
## Expected Outputs

After running all notebooks, you should have:

- Trained Models: saved in the `../models/` directory
- Performance Metrics: evaluation reports and visualizations
- AI Agent: fully trained agent ready for deployment
- Configuration Files: model configs for production use
## Troubleshooting

Common issues:
Memory Errors:
- Reduce batch size in deep learning models
- Close other applications to free RAM
- Consider using smaller datasets for testing
Package Installation Failures:
- Update pip:
pip install --upgrade pip - Use conda if pip fails:
conda install <package> - Check Python version compatibility
CUDA/GPU Issues:
- For TensorFlow GPU: Install CUDA 11.8+ and cuDNN
- For CPU-only: Models will run slower but still work
- Check GPU availability: `tf.config.list_physical_devices('GPU')` (the older `tensorflow.test.is_gpu_available()` is deprecated)
Data Download Issues:
- Ensure internet connection for Kaggle datasets
- Set up Kaggle API credentials if needed
- Some notebooks include fallback synthetic data generation
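A fallback generator can be as simple as building a labeled frame with the expected schema. A hedged sketch; the column names and the labeling heuristic here are invented for illustration, not the notebooks' real feature schema:

```python
import numpy as np
import pandas as pd

def make_synthetic_phishing_data(n=1000, seed=0):
    """Generate a small synthetic dataset when downloads fail.

    ASSUMPTION: column names and label heuristic are illustrative only.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "url_length": rng.integers(10, 200, n),
        "num_subdomains": rng.integers(0, 6, n),
        "has_ip_address": rng.integers(0, 2, n),
        "https": rng.integers(0, 2, n),
    })
    # Simple heuristic label so the data is learnable, not pure noise
    score = ((df["url_length"] > 90).astype(int)
             + df["has_ip_address"] + (1 - df["https"]))
    df["label"] = (score >= 2).astype(int)
    return df
```

Synthetic data like this is only good for smoke-testing the pipeline end to end; reported metrics are meaningless until the real datasets are in place.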
## Notes
- First Run: Initial execution takes longer due to package installation and data downloads
- Subsequent Runs: Much faster as dependencies are cached
- Customization: Modify hyperparameters in notebooks for different results
- Production: Use the saved models in the main application
## Next Steps
After completing all notebooks:
- Deploy Models: Copy trained models to production environment
- Integration: Connect models with the desktop application
- Monitoring: Set up model performance monitoring
- Updates: Retrain models with new data periodically
## Support
If you encounter issues:
- Check the troubleshooting section above
- Verify all prerequisites are met
- Review notebook outputs for specific error messages
- Create an issue in the repository with error details
Happy Training!