Spaces:
Paused
Paused
| # Hugging Face Error Detection | |
| **Purpose**: ML-powered vulnerability scanning and error pattern recognition | |
| **Status**: π‘ Working (performance tuning recommended) | |
| **Version**: 1.0.0+ | |
| **Maintainer**: Security Team (Block 6) | |
| ## What It Does | |
| Hugging Face error detection uses ML models to: | |
| - Identify security vulnerabilities (SQL injection, XSS, etc.) | |
| - Find API misuse patterns | |
| - Detect performance anti-patterns | |
| - Recognize code smell and potential bugs | |
| - Analyze error propagation paths | |
| ## Installation | |
| ```bash | |
| # Already configured in project | |
| # To verify: | |
| python -c "from transformers import pipeline; print('HF installed')" | |
| # To update: | |
| pip install --upgrade transformers torch | |
| # Verify models are cached | |
| ls ~/.cache/huggingface/hub/ | |
| ``` | |
| ## Performance Workaround | |
| ### Problem | |
| Scanning large codebases (>100K lines) takes too long or times out | |
| ### Solution | |
| **Scan by module, not entire project**: | |
| ```bash | |
| # DON'T do this (will timeout) | |
| python tools/error-libraries/huggingface-error-detection/detector.py src/ | |
| # DO this instead - scan module by module | |
| cd C:/Users/claus/Projects/WidgetTDC | |
| python tools/error-libraries/huggingface-error-detection/detector.py src/agents/ | |
| python tools/error-libraries/huggingface-error-detection/detector.py src/services/ | |
| python tools/error-libraries/huggingface-error-detection/detector.py src/routers/ | |
| # Or use parallel processing | |
| parallel -j 3 "timeout 30 python detector.py {}" ::: src/*/ | |
| ``` | |
| ## Quick Start | |
| ### 1. Scan Single Module | |
| ```bash | |
| # Scan one service | |
| cd C:/Users/claus/Projects/WidgetTDC | |
| python tools/error-libraries/huggingface-error-detection/detector.py src/services/email_service.py | |
| # Expected output: | |
| # [CRITICAL] SQL injection vulnerability in query: "SELECT * FROM users WHERE id=" + user_input | |
| # [HIGH] Missing error handler for network timeout | |
| # [MEDIUM] Unvalidated user input in API response | |
| ``` | |
| ### 2. Scan Entire Directory (with timeout) | |
| ```bash | |
| # With enforced 30-second timeout | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/agents/ | |
| # Multiple directories in sequence | |
| for dir in src/agents src/services src/routers; do | |
| echo "Scanning $dir..." | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py "$dir" | |
| done | |
| ``` | |
| ### 3. Generate Report | |
| ```bash | |
| # Scan and save results | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > hf-scan-report.txt 2>&1 | |
| # Review report | |
| cat hf-scan-report.txt | |
| # Count vulnerabilities by severity | |
| grep "\[CRITICAL\]" hf-scan-report.txt | wc -l | |
| grep "\[HIGH\]" hf-scan-report.txt | wc -l | |
| grep "\[MEDIUM\]" hf-scan-report.txt | wc -l | |
| ``` | |
| ## Severity Levels | |
| ### π΄ CRITICAL | |
| - SQL injection vulnerabilities | |
| - Authentication bypass | |
| - Credential exposure | |
| - Remote code execution risks | |
| - Data exfiltration paths | |
| **Action**: Fix immediately before any deployment | |
| ### π HIGH | |
| - Missing input validation | |
| - Improper error handling | |
| - Weak cryptography usage | |
| - Privilege escalation paths | |
| - Performance DoS patterns | |
| **Action**: Fix within current sprint | |
| ### π‘ MEDIUM | |
| - Code smell and anti-patterns | |
| - Potential logic errors | |
| - Non-optimal algorithms | |
| - Deprecated API usage | |
| - Type safety issues | |
| **Action**: Schedule for next sprint | |
| ### π’ LOW | |
| - Style improvements | |
| - Documentation suggestions | |
| - Refactoring recommendations | |
| - Performance micro-optimizations | |
| - Best practice violations | |
| **Action**: Consider for future improvement | |
| ## Usage Patterns | |
| ### Pattern 1: Security-Focused Scan | |
| ```bash | |
| # Scan for security issues only | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --security-only \ | |
| src/ | |
| # Output will only show CRITICAL and HIGH severity | |
| ``` | |
| ### Pattern 2: Performance Analysis | |
| ```bash | |
| # Scan for performance anti-patterns | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --performance-check \ | |
| src/services/ | |
| # Identifies O(nΒ²) algorithms, memory leaks, etc. | |
| ``` | |
| ### Pattern 3: API Misuse Detection | |
| ```bash | |
| # Scan for common API misuse patterns | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --api-validation \ | |
| src/integrations/ | |
| # Finds incorrect library usage, wrong parameters, etc. | |
| ``` | |
| ### Pattern 4: Compare Before/After | |
| ```bash | |
| # Baseline scan | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > baseline.txt | |
| # Make changes... | |
| # ... edit code ... | |
| # Rescan | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > after.txt | |
| # Compare | |
| diff baseline.txt after.txt | |
| ``` | |
| ## Configuration | |
| **File**: `config/error-detection/huggingface-params.json` | |
| ```json | |
| { | |
| "timeout_seconds": 30, | |
| "severity_threshold": "MEDIUM", | |
| "models": [ | |
| "microsoft/codebert-base", | |
| "huggingface/CodeBERTa-small-v1" | |
| ], | |
| "parallel_workers": 3, | |
| "cache_results": true, | |
| "exclude_patterns": [ | |
| "node_modules/", | |
| "dist/", | |
| "*.test.py" | |
| ] | |
| } | |
| ``` | |
| ### Custom Configuration | |
| ```bash | |
| # Use custom params file | |
| python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --config custom-hf-params.json \ | |
| src/ | |
| ``` | |
| ## Common Issues and Solutions | |
| ### Issue: "CUDA out of memory" | |
| **Cause**: GPU memory exhausted by ML model | |
| **Solution**: | |
| ```bash | |
| # Run on CPU instead | |
| CUDA_VISIBLE_DEVICES="" python detector.py src/ | |
| # Or limit batch size | |
| python detector.py --batch-size 16 src/ | |
| ``` | |
| ### Issue: "Timeout after 30 seconds" | |
| **Cause**: Large directory or slow model | |
| **Solution**: | |
| ```bash | |
| # Scan smaller subsets | |
| timeout 30 python detector.py src/agents/ > agents.txt | |
| timeout 30 python detector.py src/services/ > services.txt | |
| # Or increase timeout | |
| timeout 60 python detector.py src/ | |
| ``` | |
| ### Issue: "Model download failed" | |
| **Cause**: First run needs to download model (slow) | |
| **Solution**: | |
| ```bash | |
| # Pre-download model | |
| python -c "from transformers import AutoModel; AutoModel.from_pretrained('microsoft/codebert-base')" | |
| # Then run detector (cached) | |
| python detector.py src/ | |
| ``` | |
| ### Issue: "High false-positive rate" | |
| **Cause**: ML model marking legitimate code as vulnerable | |
| **Solution**: | |
| ```bash | |
| # Review reported vulnerabilities manually | |
| # Update config to set higher threshold | |
| python detector.py --severity-threshold HIGH src/ | |
| # Or exclude specific patterns | |
| python detector.py --exclude-pattern "test_*" src/ | |
| ``` | |
| ## Example Scenarios | |
| ### Scenario 1: Widget Discovery Security Review | |
| **Block**: 5 (QASpecialist) | |
| **Task**: Verify widget discovery from Git/Hugging Face is secure | |
| ```bash | |
| # Scan widget discovery service | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --security-only \ | |
| src/services/widget_discovery.py | |
| # Expected findings: | |
| # - Input validation for URLs | |
| # - Git URL injection protection | |
| # - Model download safety | |
| # - Sandboxing of untrusted code | |
| ``` | |
| ### Scenario 2: MCP Communication Validation | |
| **Block**: 2 (CloudArch) | |
| **Task**: Find vulnerabilities in MCP widget trigger mechanism | |
| ```bash | |
| # Scan MCP integration code | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --security-only \ | |
| src/integrations/mcp_framework.py | |
| # Expected findings: | |
| # - Message injection protection | |
| # - Parameter validation | |
| # - Cross-widget access control | |
| # - State synchronization safety | |
| ``` | |
| ### Scenario 3: Widget Registry Data Safety | |
| **Block**: 4 (DatabaseMaster) | |
| **Task**: Verify widget registry prevents data corruption | |
| ```bash | |
| # Scan database operations | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \ | |
| --api-validation \ | |
| src/models/widget_registry.py | |
| # Expected findings: | |
| # - SQL injection prevention | |
| # - Transaction integrity | |
| # - Concurrent access safety | |
| # - Backup mechanism validation | |
| ``` | |
| ## Integration with Cascade | |
| Hugging Face detector runs: | |
| - On code changes for security-critical paths | |
| - Results logged to: `.claude/logs/security-scan.log` | |
| - CRITICAL findings block deployment | |
| - HIGH findings trigger review | |
| - Results included in daily standup | |
| ## Performance Optimization Tips | |
| ### Tip 1: Use Caching | |
| ```bash | |
| # Results are cached by default | |
| # Second run on same code is instant | |
| # Clear cache if needed | |
| rm -rf .cache/huggingface-detector/ | |
| ``` | |
| ### Tip 2: Parallel Scanning | |
| ```bash | |
| # Install GNU Parallel | |
| pip install parallel | |
| # Scan 3 directories in parallel | |
| parallel -j 3 "timeout 30 python detector.py {}" ::: src/agents src/services src/routers | |
| ``` | |
| ### Tip 3: Incremental Scanning | |
| ```bash | |
| # Only scan files changed since last commit | |
| git diff --name-only | xargs -I {} timeout 30 python detector.py {} | |
| # Or use git hooks for automatic scanning | |
| # (setup in pre-commit hooks) | |
| ``` | |
| ## Success Metrics | |
| **Good HF detection results**: | |
| - β All CRITICAL issues identified | |
| - β Security vulnerabilities clearly reported | |
| - β Actionable fix recommendations | |
| - β Reasonable false-positive rate (<20%) | |
| - β Scans complete within 30 seconds | |
| **Poor results** (need investigation): | |
| - β Scan timeout every run | |
| - β Memory errors or crashes | |
| - β High false-positive rate (>50%) | |
| - β Missing obvious vulnerabilities | |
| - β Cannot download models | |
| ## Troubleshooting Checklist | |
| - [ ] Have 30+ second timeout available? | |
| - [ ] Scanning module, not entire project? | |
| - [ ] Models cached? (first run is slow) | |
| - [ ] Have GPU memory available? (or use CPU) | |
| - [ ] Internet connection for first model download? | |
| - [ ] Sufficient disk space for model cache? | |
| ## Next Steps | |
| 1. **Start small**: Scan one Python file first | |
| 2. **Review findings**: Read security warnings carefully | |
| 3. **Fix issues**: Address CRITICAL findings first | |
| 4. **Rescan**: Confirm fixes worked | |
| 5. **Automate**: Add to CI/CD pipeline | |
| ## Questions? | |
| Escalation path: | |
| 1. Check this README | |
| 2. Try lower severity threshold to exclude false positives | |
| 3. File issue in daily standup with: | |
| - Command you ran | |
| - Error message (if any) | |
| - Your block number | |
| - Timeout issues or false positives encountered | |
| --- | |
| **Ready to use?** | |
| ```bash | |
| cd C:/Users/claus/Projects/WidgetTDC | |
| # Start with one file | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/services/email_service.py | |
| # Then expand to module | |
| timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/services/ | |
| # Review findings and fix | |
| ``` | |
| Security scanning enabled. π | |