Che237 committed on Jan 31

Commit 77c5bf8 (verified) · 1 Parent(s): aef1511

Add README.md

notebooks/README.md +183 -0

notebooks/README.md ADDED

	@@ -0,0 +1,183 @@

+# CyberForge ML Notebooks
+Production-ready ML pipeline for the CyberForge cybersecurity AI system.
+## Notebook Structure
+| # | Notebook | Purpose | Key Outputs |
+|---|----------|---------|-------------|
+| 00 | [environment_setup](00_environment_setup.ipynb) | Environment validation, dependencies | System readiness report |
+| 01 | [data_acquisition](01_data_acquisition.ipynb) | Data collection from the WebScraper API and Hugging Face | Normalized datasets |
+| 02 | [feature_engineering](02_feature_engineering.ipynb) | URL, network, security feature extraction | Feature-engineered data |
+| 03 | [model_training](03_model_training.ipynb) | Train detection models | Trained .pkl models |
+| 04 | [agent_intelligence](04_agent_intelligence.ipynb) | Decision scoring, Gemini integration | Agent module |
+| 05 | [model_validation](05_model_validation.ipynb) | Performance, edge case testing | Validation report |
+| 06 | [backend_integration](06_backend_integration.ipynb) | API packaging, serialization | Backend package |
+| 07 | [deployment_artifacts](07_deployment_artifacts.ipynb) | Docker, HF upload, documentation | Deployment package |
+## Quick Start
+1. **Configure environment:**
+   ```bash
+   cd ml-services
+   # Ensure notebook_config.json has your API keys
+   ```
+2. **Run notebooks in order:**
+   ```bash
+   jupyter notebook notebooks/00_environment_setup.ipynb
+   ```
+3. **Or run all:**
+   ```bash
+   jupyter nbconvert --execute --to notebook notebooks/*.ipynb
+   ```
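The run-all step can also be driven from Python so that execution stops at the first failing notebook instead of continuing through the rest. This is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes the notebooks keep their two-digit numeric prefixes and that `jupyter nbconvert` is on the `PATH`.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(notebook_dir):
    """Return one nbconvert command per numbered notebook, in prefix order."""
    # Only files such as 00_environment_setup.ipynb match; others are skipped.
    notebooks = sorted(Path(notebook_dir).glob("[0-9][0-9]_*.ipynb"))
    return [
        ["jupyter", "nbconvert", "--execute", "--to", "notebook", "--inplace", str(nb)]
        for nb in notebooks
    ]


def run_all(notebook_dir="notebooks"):
    """Execute notebooks one at a time and stop at the first failure."""
    for cmd in build_commands(notebook_dir):
        print("Running", cmd[-1])
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("Stopped: execution failed for", cmd[-1], file=sys.stderr)
            return result.returncode
    return 0
```

Calling `run_all()` from the `ml-services` directory executes the notebooks in place. Drop `--inplace` if you would rather keep the originals untouched and have nbconvert write separate output copies.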
+## Configuration
+All notebooks use `../notebook_config.json` for configuration:
+```json
+{
+  "datasets_dir": "../datasets",
+  "hf_repo": "Che237/cyberforge-models",
+  "gemini_api_key": "",
+  "webscraper_api_key": "your_key"
+}
+```
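A notebook could read this file with a small loader like the one below. It is a sketch under assumptions: the function name `load_config` is hypothetical, and treating all four keys as required, and the empty string or `your_key` as "not yet set", is a guess based on the sample above rather than documented behavior.

```python
import json
from pathlib import Path

# Keys taken from the sample config above; treating all of them as required
# is an assumption.
REQUIRED_KEYS = ("datasets_dir", "hf_repo", "gemini_api_key", "webscraper_api_key")


def load_config(path="../notebook_config.json"):
    """Load the config and list keys that still hold placeholder values."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"notebook_config.json is missing keys: {missing}")
    # Empty strings and the sample value "your_key" count as unset.
    unset = [key for key in REQUIRED_KEYS if config[key] in ("", "your_key")]
    return config, unset
```

Returning the unset keys instead of raising lets a notebook decide whether a missing API key is fatal or only disables one data source.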
+## Output Directories
+After running all notebooks:
+```
+ml-services/
+├── datasets/
+│   ├── processed/       # Cleaned datasets
+│   └── features/        # Feature-engineered data
+├── models/              # Trained models
+│   ├── phishing_detection/
+│   ├── malware_detection/
+│   └── model_registry.json
+├── agent/               # Agent intelligence module
+├── validation/          # Validation reports
+├── backend_package/     # Backend integration files
+└── deployment/          # Deployment artifacts
+```
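To confirm that a full run produced the layout above, a short check like the following can help. It is only a sketch: `missing_outputs` is a hypothetical helper, and the path list is copied from the tree above instead of being read from the notebooks themselves.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to ml-services/.
EXPECTED_PATHS = [
    "datasets/processed",
    "datasets/features",
    "models/phishing_detection",
    "models/malware_detection",
    "models/model_registry.json",
    "agent",
    "validation",
    "backend_package",
    "deployment",
]


def missing_outputs(root="."):
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [path for path in EXPECTED_PATHS if not (base / path).exists()]
```

An empty list from `missing_outputs()` run inside `ml-services` means every expected directory and the model registry file are present.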
+## Integration Points
+### Backend (mlService.js)
+- Use `backend_package/inference.py` or `backend_package/ml_client.js`
+- Prediction endpoint: `POST /predict`
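A call to the prediction endpoint might look like the sketch below, which uses only the Python standard library. The base URL, the payload fields, and the shape of the response are assumptions, since this README does not document the request schema. Check `backend_package/inference.py` for the real contract.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build a JSON POST request for the /predict endpoint (not sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, payload, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_predict_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://localhost:8000", {"url": "http://example.com"})` would post a single URL for analysis, assuming the backend listens on that host and accepts a `url` field.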
+### Desktop App (caido-app.js)
+- Agent module: `agent/cyberforge_agent.py`
+- Real-time analysis via backend API
+### Hugging Face
+- Models: `huggingface.co/Che237/cyberforge-models`
+- Datasets: `huggingface.co/datasets/Che237/cyberforge-datasets`
+- Space: `huggingface.co/spaces/Che237/cyberforge`
+## Requirements
+- Python 3.11+
+- scikit-learn >= 1.3.0
+- pandas >= 2.0.0
+- huggingface_hub >= 0.19.0
+- google-generativeai >= 0.3.0
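The minimum versions listed above can be checked programmatically before running the notebooks. The following is a rough sketch and not the logic used by `00_environment_setup.ipynb`: the helper names are hypothetical, and the version comparison is deliberately simple (it looks at the first three numeric components only).

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "scikit-learn": "1.3.0",
    "pandas": "2.0.0",
    "huggingface_hub": "0.19.0",
    "google-generativeai": "0.3.0",
}


def version_tuple(version):
    """Turn '1.3.0' into (1, 3, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '1.3' so tuples compare component-wise.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```

An empty dictionary from `check_requirements()` means every listed package is installed at or above its minimum version. The Python 3.11+ requirement can be checked separately with `sys.version_info >= (3, 11)`.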
+## License
+MIT
+### 3. **Network Security Analysis** 🌐
+**File**: `network_security_analysis.ipynb`
+**Purpose**: Network-specific security analysis and monitoring
+**Runtime**: ~20-30 minutes
+**Description**:
+- Network traffic analysis
+- Intrusion detection model training
+- Port scanning detection
+- Network anomaly detection
+```bash
+jupyter notebook network_security_analysis.ipynb
+```
+### 4. **Comprehensive AI Agent Training** 🤖
+**File**: `ai_agent_comprehensive_training.ipynb`
+**Purpose**: Advanced AI agent with full capabilities
+**Runtime**: ~45-60 minutes
+**Description**:
+- Enhanced communication skills
+- Web scraping and threat intelligence
+- Real-time monitoring capabilities
+- Natural language processing for security analysis
+- **RUN LAST** - Integrates all previous models
+```bash
+jupyter notebook ai_agent_comprehensive_training.ipynb
+```
+## 📊 Expected Outputs
+After running all notebooks, you should have:
+1. **Trained Models**: Saved in `../models/` directory
+2. **Performance Metrics**: Evaluation reports and visualizations
+3. **AI Agent**: Fully trained agent ready for deployment
+4. **Configuration Files**: Model configs for production use
+## 🔧 Troubleshooting
+### Common Issues:
+**Memory Errors**:
+- Reduce batch size in deep learning models
+- Close other applications to free RAM
+- Consider using smaller datasets for testing
+**Package Installation Failures**:
+- Update pip: `pip install --upgrade pip`
+- Use conda if pip fails: `conda install <package>`
+- Check Python version compatibility
+**CUDA/GPU Issues**:
+- For TensorFlow GPU: Install CUDA 11.8+ and cuDNN
+- For CPU-only: Models will run slower but still work
+- Check GPU availability: `tf.config.list_physical_devices('GPU')` after `import tensorflow as tf` (the older `tf.test.is_gpu_available()` is deprecated)
+**Data Download Issues**:
+- Ensure internet connection for Kaggle datasets
+- Set up Kaggle API credentials if needed
+- Some notebooks include fallback synthetic data generation
+## 📝 Notes
+- **First Run**: Initial execution takes longer due to package installation and data downloads
+- **Subsequent Runs**: Much faster as dependencies are cached
+- **Customization**: Modify hyperparameters in notebooks for different results
+- **Production**: Use the saved models in the main application
+## 🎯 Next Steps
+After completing all notebooks:
+1. **Deploy Models**: Copy trained models to production environment
+2. **Integration**: Connect models with the desktop application
+3. **Monitoring**: Set up model performance monitoring
+4. **Updates**: Retrain models with new data periodically
+## 🆘 Support
+If you encounter issues:
+1. Check the troubleshooting section above
+2. Verify all prerequisites are met
+3. Review notebook outputs for specific error messages
+4. Create an issue in the repository with error details
+---
+**Happy Training! 🚀**