# Hugging Face Error Detection
**Purpose**: ML-powered vulnerability scanning and error pattern recognition
**Status**: 🟡 Working (performance tuning recommended)
**Version**: 1.0.0+
**Maintainer**: Security Team (Block 6)
## What It Does
Hugging Face error detection uses ML models to:
- Identify security vulnerabilities (SQL injection, XSS, etc.)
- Find API misuse patterns
- Detect performance anti-patterns
- Recognize code smells and potential bugs
- Analyze error propagation paths
## Installation
```bash
# Already configured in project
# To verify:
python -c "from transformers import pipeline; print('HF installed')"
# To update:
pip install --upgrade transformers torch
# Verify models are cached
ls ~/.cache/huggingface/hub/
```
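The hub listing above shows everything in the cache. To check only the two models named in the configuration file, a sketch like the following can help. It assumes the default `huggingface_hub` cache layout, where each model lives in a folder named `models--<org>--<name>`, with `HF_HUB_CACHE` or `HF_HOME` overriding the location when set; the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: report whether each model is present in the local Hugging Face hub cache.
# Assumes the default huggingface_hub layout, <cache>/models--<org>--<name>;
# HF_HUB_CACHE or HF_HOME override the cache location when set.

hf_cache_root() {
  if [ -n "${HF_HUB_CACHE:-}" ]; then
    echo "$HF_HUB_CACHE"
  else
    echo "${HF_HOME:-$HOME/.cache/huggingface}/hub"
  fi
}

# Turn "org/name" into the cache folder name "models--org--name"
hf_cache_dir_name() {
  echo "models--${1//\//--}"
}

# Print cached/missing per model; return 1 if any model is missing
check_models_cached() {
  local root model missing=0
  root="$(hf_cache_root)"
  for model in "$@"; do
    if [ -d "$root/$(hf_cache_dir_name "$model")" ]; then
      echo "cached:  $model"
    else
      echo "missing: $model"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_models_cached microsoft/codebert-base huggingface/CodeBERTa-small-v1` prints one line per model and returns non-zero if either one still needs downloading.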
## Performance Workaround
### Problem
Scanning large codebases (>100K lines) takes too long or times out
### Solution
**Scan by module, not entire project**:
```bash
# DON'T do this (likely to time out)
python tools/error-libraries/huggingface-error-detection/detector.py src/
# DO this instead - scan module by module
cd C:/Users/claus/Projects/WidgetTDC
python tools/error-libraries/huggingface-error-detection/detector.py src/agents/
python tools/error-libraries/huggingface-error-detection/detector.py src/services/
python tools/error-libraries/huggingface-error-detection/detector.py src/routers/
# Or run up to 3 modules at a time with GNU Parallel
parallel -j 3 "timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py {}" ::: src/*/
```
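When scanning module by module, it helps to know which modules actually finished. A minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when it has to stop the command) and that it runs from the project root. The variable names and the `scan-<module>.txt` report naming are hypothetical, not options the detector reads.

```bash
#!/usr/bin/env bash
# Sketch: scan modules one at a time and flag the ones that hit the time limit.
# GNU `timeout` exits with status 124 when it had to stop the command.
# PYTHON, DETECTOR, SCAN_TIMEOUT and REPORT_DIR are illustrative variables for
# this script, not options the detector itself reads.

PYTHON="${PYTHON:-python}"
DETECTOR="${DETECTOR:-tools/error-libraries/huggingface-error-detection/detector.py}"
SCAN_TIMEOUT="${SCAN_TIMEOUT:-30}"
REPORT_DIR="${REPORT_DIR:-.}"

scan_modules() {
  local dir name rc timed_out=0
  for dir in "$@"; do
    name="$(basename "$dir")"
    timeout "$SCAN_TIMEOUT" "$PYTHON" "$DETECTOR" "$dir" > "$REPORT_DIR/scan-$name.txt" 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
      echo "TIMEOUT: $dir (split it into smaller paths)"
      timed_out=1
    else
      echo "DONE (exit $rc): $dir"
    fi
  done
  # Non-zero if at least one module did not finish in time
  return "$timed_out"
}
```

Running `scan_modules src/agents/ src/services/ src/routers/` leaves one report per module and lists any module that needs to be split further.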
## Quick Start
### 1. Scan Single Module
```bash
# Scan one service
cd C:/Users/claus/Projects/WidgetTDC
python tools/error-libraries/huggingface-error-detection/detector.py src/services/email_service.py
# Expected output:
# [CRITICAL] SQL injection vulnerability in query: "SELECT * FROM users WHERE id=" + user_input
# [HIGH] Missing error handler for network timeout
# [MEDIUM] Unvalidated user input in API response
```
### 2. Scan Entire Directory (with timeout)
```bash
# With enforced 30-second timeout
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/agents/
# Multiple directories in sequence
for dir in src/agents src/services src/routers; do
echo "Scanning $dir..."
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py "$dir"
done
```
### 3. Generate Report
```bash
# Scan and save results
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > hf-scan-report.txt 2>&1
# Review report
cat hf-scan-report.txt
# Count vulnerabilities by severity
grep "\[CRITICAL\]" hf-scan-report.txt | wc -l
grep "\[HIGH\]" hf-scan-report.txt | wc -l
grep "\[MEDIUM\]" hf-scan-report.txt | wc -l
```
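The three `grep | wc -l` calls above can be folded into a single pass over the report. This sketch assumes each finding carries a bracketed tag such as `[CRITICAL]`, as in the sample output, and counts one finding per line, like the `grep` version does.

```bash
#!/usr/bin/env bash
# Sketch: count findings per severity in one pass over a saved report.
# Assumes findings are tagged "[CRITICAL]", "[HIGH]", "[MEDIUM]" or "[LOW]";
# only the first tag on a line is counted, untagged lines are ignored.

summarize_report() {
  awk '
    match($0, /\[(CRITICAL|HIGH|MEDIUM|LOW)\]/) {
      # Strip the surrounding brackets from the matched tag
      count[substr($0, RSTART + 1, RLENGTH - 2)]++
    }
    END {
      n = split("CRITICAL HIGH MEDIUM LOW", order, " ")
      for (i = 1; i <= n; i++) printf "%-8s %d\n", order[i], count[order[i]] + 0
    }
  ' "$1"
}
```

`summarize_report hf-scan-report.txt` prints four lines, one count per severity, with zero for levels that do not appear.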
## Severity Levels
### 🔴 CRITICAL
- SQL injection vulnerabilities
- Authentication bypass
- Credential exposure
- Remote code execution risks
- Data exfiltration paths
**Action**: Fix immediately before any deployment
### 🟠 HIGH
- Missing input validation
- Improper error handling
- Weak cryptography usage
- Privilege escalation paths
- Performance DoS patterns
**Action**: Fix within current sprint
### 🟡 MEDIUM
- Code smells and anti-patterns
- Potential logic errors
- Non-optimal algorithms
- Deprecated API usage
- Type safety issues
**Action**: Schedule for next sprint
### 🟢 LOW
- Style improvements
- Documentation suggestions
- Refactoring recommendations
- Performance micro-optimizations
- Best practice violations
**Action**: Consider for future improvement
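The four levels form an ordered scale, which is what a threshold such as `severity_threshold` in the configuration relies on. As a rough post-processing sketch, assuming the bracketed tags shown in the sample output, a saved report can be narrowed to one level and above. This is not the detector's own filtering logic.

```bash
#!/usr/bin/env bash
# Sketch: keep only report lines at or above a severity threshold.
# Assumes bracketed tags as in the sample output; untagged lines are dropped.
# Usage: filter_by_threshold HIGH < hf-scan-report.txt

filter_by_threshold() {
  awk -v threshold="$1" '
    BEGIN {
      rank["LOW"] = 1; rank["MEDIUM"] = 2; rank["HIGH"] = 3; rank["CRITICAL"] = 4
      if (!(threshold in rank)) {
        print "unknown threshold: " threshold > "/dev/stderr"
        exit 2
      }
    }
    match($0, /\[(CRITICAL|HIGH|MEDIUM|LOW)\]/) {
      if (rank[substr($0, RSTART + 1, RLENGTH - 2)] >= rank[threshold]) print
    }
  '
}
```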
## Usage Patterns
### Pattern 1: Security-Focused Scan
```bash
# Scan for security issues only
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--security-only \
src/
# Output will only show CRITICAL and HIGH severity
```
### Pattern 2: Performance Analysis
```bash
# Scan for performance anti-patterns
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--performance-check \
src/services/
# Identifies O(n²) algorithms, memory leaks, etc.
```
### Pattern 3: API Misuse Detection
```bash
# Scan for common API misuse patterns
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--api-validation \
src/integrations/
# Finds incorrect library usage, wrong parameters, etc.
```
### Pattern 4: Compare Before/After
```bash
# Baseline scan
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > baseline.txt
# Make changes...
# ... edit code ...
# Rescan
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/ > after.txt
# Compare
diff baseline.txt after.txt
```
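A raw `diff` also reports lines that merely moved or changed formatting. This sketch compares only the tagged finding lines, treated as sets, assuming the `[SEVERITY]` prefix from the sample output. If findings include line numbers, an edit that shifts code will show the same finding as both resolved and new.

```bash
#!/usr/bin/env bash
# Sketch: list findings that disappeared and findings that appeared between two reports.
# Only lines with a [CRITICAL|HIGH|MEDIUM|LOW] tag are compared; order is ignored.

compare_findings() {
  local tmp
  tmp="$(mktemp -d)" || return 1
  grep -E '\[(CRITICAL|HIGH|MEDIUM|LOW)\]' "$1" | LC_ALL=C sort -u > "$tmp/before"
  grep -E '\[(CRITICAL|HIGH|MEDIUM|LOW)\]' "$2" | LC_ALL=C sort -u > "$tmp/after"
  echo "Resolved findings:"
  LC_ALL=C comm -23 "$tmp/before" "$tmp/after"
  echo "New findings:"
  LC_ALL=C comm -13 "$tmp/before" "$tmp/after"
  rm -rf "$tmp"
}
```

Usage: `compare_findings baseline.txt after.txt`.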
## Configuration
**File**: `config/error-detection/huggingface-params.json`
```json
{
"timeout_seconds": 30,
"severity_threshold": "MEDIUM",
"models": [
"microsoft/codebert-base",
"huggingface/CodeBERTa-small-v1"
],
"parallel_workers": 3,
"cache_results": true,
"exclude_patterns": [
"node_modules/",
"dist/",
"*.test.py"
]
}
```
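This README does not say whether the detector enforces `timeout_seconds` itself, so the outer `timeout 30` wrapper used throughout can drift from the configured value. One way to keep them aligned, sketched under the assumption that a Python interpreter is available to parse the file; `config_timeout` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: read timeout_seconds from the params file so the outer `timeout`
# wrapper matches the configured value. Falls back to 30 seconds when the
# file, the key, or the Python interpreter is unavailable.

PYTHON="${PYTHON:-python}"

config_timeout() {
  local config="${1:-config/error-detection/huggingface-params.json}"
  "$PYTHON" - "$config" 2>/dev/null <<'EOF' || echo 30
import json
import sys

with open(sys.argv[1]) as f:
    print(int(json.load(f).get("timeout_seconds", 30)))
EOF
}
```

Usage: `timeout "$(config_timeout)" python tools/error-libraries/huggingface-error-detection/detector.py src/agents/`.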
### Custom Configuration
```bash
# Use custom params file
python tools/error-libraries/huggingface-error-detection/detector.py \
--config custom-hf-params.json \
src/
```
## Common Issues and Solutions
### Issue: "CUDA out of memory"
**Cause**: GPU memory exhausted by ML model
**Solution**:
```bash
# Run on CPU instead
CUDA_VISIBLE_DEVICES="" python detector.py src/
# Or limit batch size
python detector.py --batch-size 16 src/
```
### Issue: "Timeout after 30 seconds"
**Cause**: Large directory or slow model
**Solution**:
```bash
# Scan smaller subsets
timeout 30 python detector.py src/agents/ > agents.txt
timeout 30 python detector.py src/services/ > services.txt
# Or increase timeout
timeout 60 python detector.py src/
```
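When a scan occasionally runs just past the limit, retrying with a longer limit saves a manual rerun. This sketch doubles the limit after each timeout (status 124 from GNU `timeout`) and passes any other exit status through unchanged; `run_with_backoff` is an illustrative helper, not part of the detector.

```bash
#!/usr/bin/env bash
# Sketch: run a command under `timeout`, doubling the limit after each timeout.
# Arguments: initial limit in seconds, maximum attempts, then the command.
# Returns the command's own exit status, or 124 if every attempt timed out.

run_with_backoff() {
  local limit="$1" attempts="$2" rc i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    timeout "$limit" "$@"
    rc=$?
    [ "$rc" -ne 124 ] && return "$rc"
    echo "attempt $i timed out after ${limit}s" >&2
    limit=$((limit * 2))
  done
  return 124
}
```

For example, `run_with_backoff 30 3 python tools/error-libraries/huggingface-error-detection/detector.py src/` tries limits of 30, 60, and then 120 seconds.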
### Issue: "Model download failed"
**Cause**: First run needs to download model (slow)
**Solution**:
```bash
# Pre-download model
python -c "from transformers import AutoModel; AutoModel.from_pretrained('microsoft/codebert-base')"
# Then run detector (cached)
python detector.py src/
```
### Issue: "High false-positive rate"
**Cause**: ML model marking legitimate code as vulnerable
**Solution**:
```bash
# Review reported vulnerabilities manually
# Update config to set higher threshold
python detector.py --severity-threshold HIGH src/
# Or exclude specific patterns
python detector.py --exclude-pattern "test_*" src/
```
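Findings that have already been reviewed and judged false positives can be filtered out of a saved report before anyone reads it. The suppression file here is a hypothetical convention for this sketch (one fixed string per line, matched anywhere in a finding), not a detector feature.

```bash
#!/usr/bin/env bash
# Sketch: drop report lines that contain any string from a suppression file.
# The suppression file format is an assumption for this example.

apply_suppressions() {
  local report="$1" suppressions="$2"
  if [ -s "$suppressions" ]; then
    # Ignore blank lines: an empty pattern would match, and drop, every finding
    grep -v -F -f <(grep -v '^[[:space:]]*$' "$suppressions") "$report"
  else
    cat "$report"
  fi
}
```

Usage: `apply_suppressions hf-scan-report.txt reviewed-false-positives.txt`.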
## Example Scenarios
### Scenario 1: Widget Discovery Security Review
**Block**: 5 (QASpecialist)
**Task**: Verify widget discovery from Git/Hugging Face is secure
```bash
# Scan widget discovery service
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--security-only \
src/services/widget_discovery.py
# Expected findings:
# - Input validation for URLs
# - Git URL injection protection
# - Model download safety
# - Sandboxing of untrusted code
```
### Scenario 2: MCP Communication Validation
**Block**: 2 (CloudArch)
**Task**: Find vulnerabilities in MCP widget trigger mechanism
```bash
# Scan MCP integration code
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--security-only \
src/integrations/mcp_framework.py
# Expected findings:
# - Message injection protection
# - Parameter validation
# - Cross-widget access control
# - State synchronization safety
```
### Scenario 3: Widget Registry Data Safety
**Block**: 4 (DatabaseMaster)
**Task**: Verify widget registry prevents data corruption
```bash
# Scan database operations
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py \
--api-validation \
src/models/widget_registry.py
# Expected findings:
# - SQL injection prevention
# - Transaction integrity
# - Concurrent access safety
# - Backup mechanism validation
```
## Integration with Cascade
The Hugging Face detector runs on code changes to security-critical paths:
- Results are logged to `.claude/logs/security-scan.log`
- CRITICAL findings block deployment
- HIGH findings trigger review
- Results are included in the daily standup
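The blocking rule above can be expressed as a small gate over a saved report. This is a sketch under assumptions: the exit codes (1 to block, 2 for a missing report) and the bracketed severity tags are chosen for illustration and do not describe how Cascade actually integrates the detector.

```bash
#!/usr/bin/env bash
# Sketch: block on CRITICAL findings, flag HIGH findings for review.
# Exit codes are illustrative: 0 = pass, 1 = blocked, 2 = report missing.

security_gate() {
  local report="$1" critical high
  if [ ! -f "$report" ]; then
    echo "report not found: $report" >&2
    return 2
  fi
  # grep -c prints 0 (and exits 1) when nothing matches, so the counts are always numbers
  critical=$(grep -c -F '[CRITICAL]' "$report")
  high=$(grep -c -F '[HIGH]' "$report")
  if [ "$critical" -gt 0 ]; then
    echo "BLOCKED: $critical CRITICAL finding(s) in $report" >&2
    return 1
  fi
  if [ "$high" -gt 0 ]; then
    echo "REVIEW: $high HIGH finding(s) in $report" >&2
  fi
  return 0
}
```

In a pipeline step, `security_gate hf-scan-report.txt` fails the step when any CRITICAL line is present.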
## Performance Optimization Tips
### Tip 1: Use Caching
```bash
# Results are cached by default
# A second run on unchanged code reuses the cached results
# Clear cache if needed
rm -rf .cache/huggingface-detector/
```
### Tip 2: Parallel Scanning
```bash
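# Install GNU Parallel with your OS package manager (it is not a pip package),
# e.g. sudo apt install parallel   or   brew install parallel
# Scan 3 directories in parallel
parallel -j 3 "timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py {}" ::: src/agents src/services src/routers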
```
### Tip 3: Incremental Scanning
```bash
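# Only scan Python files changed since the last commit (staged or not), skipping deleted files
git diff --name-only --diff-filter=d HEAD -- '*.py' | xargs -I {} timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py {}
# Or use git hooks for automatic scanning
# (set up in pre-commit hooks)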
```
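For the pre-commit hook approach mentioned above, a sketch that scans only staged Python files. The hook location `.git/hooks/pre-commit` is standard git; the 30-second limit and the choice to block the commit on any `[CRITICAL]` line are assumptions carried over from this README.

```bash
#!/usr/bin/env bash
# Sketch of a pre-commit hook body that scans staged Python files only.
# Blocking on "[CRITICAL]" output is an assumption, not detector behavior.

DETECTOR="tools/error-libraries/huggingface-error-detection/detector.py"

staged_python_files() {
  # Added, copied, or modified files in the index; deleted files are skipped
  git diff --cached --name-only --diff-filter=ACM -- '*.py'
}

scan_staged() {
  local file out blocked=0
  while IFS= read -r file; do
    out="$(timeout 30 python "$DETECTOR" "$file" 2>&1)"
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q -F '[CRITICAL]'; then
      blocked=1
    fi
  done < <(staged_python_files)
  # Non-zero if any staged file produced a CRITICAL finding
  return "$blocked"
}
```

Saved as `.git/hooks/pre-commit`, made executable, and ending with `scan_staged || exit 1`, the hook aborts the commit when a CRITICAL finding appears.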
## Success Metrics
**Good HF detection results**:
- ✅ All CRITICAL issues identified
- ✅ Security vulnerabilities clearly reported
- ✅ Actionable fix recommendations
- ✅ Reasonable false-positive rate (<20%)
- ✅ Scans complete within 30 seconds
**Poor results** (need investigation):
- ❌ Scans time out on every run
- ❌ Memory errors or crashes
- ❌ High false-positive rate (>50%)
- ❌ Missing obvious vulnerabilities
- ❌ Cannot download models
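The false-positive rate in these targets is the share of reported findings that turn out not to be real issues on manual review. For example, if 3 of 20 findings are dismissed, the rate is 3 / 20 = 15%, inside the <20% target. A small helper for that arithmetic follows; the "in between" label for rates from 20% to 50% is an assumption, since the lists above define only the two ends.

```bash
#!/usr/bin/env bash
# Sketch: false-positive rate = dismissed findings / reviewed findings * 100,
# labeled against the <20% (good) and >50% (poor) targets above.

fp_rate() {
  # $1 = findings judged false positives, $2 = total findings reviewed
  awk -v fp="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    rate = 100 * fp / total
    verdict = (rate < 20) ? "good" : ((rate > 50) ? "poor" : "in between")
    printf "%.1f%% (%s)\n", rate, verdict
  }'
}
```

`fp_rate 3 20` prints `15.0% (good)`.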
## Troubleshooting Checklist
- [ ] Have 30+ second timeout available?
- [ ] Scanning module, not entire project?
- [ ] Models cached? (first run is slow)
- [ ] Have GPU memory available? (or use CPU)
- [ ] Internet connection for first model download?
- [ ] Sufficient disk space for model cache?
## Next Steps
1. **Start small**: Scan one Python file first
2. **Review findings**: Read security warnings carefully
3. **Fix issues**: Address CRITICAL findings first
4. **Rescan**: Confirm fixes worked
5. **Automate**: Add to CI/CD pipeline
## Questions?
Escalation path:
1. Check this README
2. Try a higher severity threshold to filter out false positives
3. Raise the issue in the daily standup with:
- Command you ran
- Error message (if any)
- Your block number
- Timeout issues or false positives encountered
---
**Ready to use?**
```bash
cd C:/Users/claus/Projects/WidgetTDC
# Start with one file
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/services/email_service.py
# Then expand to module
timeout 30 python tools/error-libraries/huggingface-error-detection/detector.py src/services/
# Review findings and fix
```
Security scanning enabled.