# Simplified Data Sanitization Documentation
## Overview
The simplified data sanitization module provides focused input validation and sanitization for the Recipe Recommendation Bot API. It is designed specifically for a recipe chatbot and covers only the security protections that context needs.
## Features
### 🛡️ **Essential Security Protection**
- **XSS Prevention**: HTML encoding and basic script removal
- **Input Validation**: Length limits and content validation
- **Whitespace Normalization**: Clean formatting
### 🔧 **Simple Configuration**
- **Maximum Message Length**: 1000 characters
- **Minimum Message Length**: 1 character
- **Single Method**: One sanitization method for all inputs
## Usage
### Basic Sanitization
```python
from utils.sanitization import sanitize_user_input
# Sanitize any user input (chat messages, demo prompts)
clean_input = sanitize_user_input("What are some chicken recipes?")
```
### Advanced Usage
```python
from utils.sanitization import DataSanitizer
# Direct class usage
sanitizer = DataSanitizer()
clean_text = sanitizer.sanitize_input("User input")
```
## Security Patterns Handled
### Basic XSS Protection
- `<script>` tags → Removed
- `javascript:` URLs → Cleaned
- Event handlers (`onclick`, `onload`) → Removed
- HTML entities → Properly encoded
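The module's actual patterns are not shown in this document, so the following is only a minimal sketch of how the four steps above could be combined. The names `SCRIPT_TAG`, `JS_URL`, `EVENT_HANDLER`, and `strip_xss` are hypothetical, and the regular expressions are illustrative assumptions rather than the module's real ones.

```python
import html
import re

# Hypothetical patterns for illustration; not taken from utils.sanitization.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
JS_URL = re.compile(r"javascript\s*:", re.IGNORECASE)
# Note: this also matches ordinary text such as "onion = tasty".
EVENT_HANDLER = re.compile(
    r"\bon\w+\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE
)


def strip_xss(text: str) -> str:
    """Remove script blocks, javascript: schemes and inline handlers, then HTML-encode."""
    text = SCRIPT_TAG.sub("", text)      # drop <script>...</script> including its body
    text = JS_URL.sub("", text)          # drop the javascript: scheme prefix
    text = EVENT_HANDLER.sub("", text)   # drop onclick=..., onload=..., etc.
    return html.escape(text, quote=True)  # encode &, <, >, and quotes
```

The final `html.escape` call is what makes any leftover markup inert; the regex passes mainly keep obviously hostile fragments out of the text passed downstream. The broad `on\w+=` pattern trades some false positives for simplicity.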
### Input Validation
- Length limits (1-1000 characters)
- Empty input detection
- Whitespace normalization
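As a rough sketch of these checks, assuming whitespace is normalized before the length is measured (the document does not state the order) and that internal runs of whitespace are collapsed as well as trimmed. The function name `normalize_and_validate` and the constants are hypothetical; the limits and error messages mirror those listed in this document.

```python
MIN_LENGTH = 1     # minimum message length stated in this document
MAX_LENGTH = 1000  # maximum message length stated in this document


def normalize_and_validate(text: str) -> str:
    """Collapse whitespace, then enforce the 1-1000 character limits."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalizes the text.
    cleaned = " ".join(text.split())
    if len(cleaned) < MIN_LENGTH:
        raise ValueError("Input cannot be empty")
    if len(cleaned) > MAX_LENGTH:
        raise ValueError(f"Input too long (maximum {MAX_LENGTH} characters)")
    return cleaned
```

Normalizing first means a message made only of spaces is rejected as empty instead of passing the minimum-length check.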
## Integration
The sanitization is automatically applied in FastAPI endpoints:
### Chat Endpoint
```python
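from pydantic import BaseModel, Field, validator
from utils.sanitization import sanitize_user_input

class ChatMessage(BaseModel):
    message: str = Field(..., min_length=1, max_length=1000)

    @validator('message')
    def sanitize_message_field(cls, v):
        # Replace the raw message with its sanitized form
        return sanitize_user_input(v)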
```
### Demo Endpoint
```python
@app.get("/demo")
def demo(prompt: str = "What recipes do you have?"):
    sanitized_prompt = sanitize_user_input(prompt)
    # ... rest of the logic
```
## Error Handling
`sanitize_user_input` raises `ValueError` for invalid input, such as an empty or over-length message:
```python
# Inside an endpoint or other function
try:
    clean_input = sanitize_user_input(user_input)
except ValueError as e:
    return {"error": f"Invalid input: {e}"}
```
## Testing
Run the sanitization tests:
```bash
python3 test_sanitization.py
```
The test suite covers:
- Normal recipe-related messages
- Basic harmful content (scripts, JavaScript)
- Length validation
- Whitespace normalization
- Edge cases
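The contents of `test_sanitization.py` are not shown here. As an illustration of how such coverage might be organized, the sketch below uses a table of cases drawn from the examples in this document and a checker that accepts any sanitizer callable. `CASES` and `check_sanitizer` are hypothetical names, not part of the actual test suite.

```python
from typing import Callable, List, Optional, Tuple

# (raw input, expected output); None means a ValueError is expected.
CASES: List[Tuple[str, Optional[str]]] = [
    ("What are chicken recipes?", "What are chicken recipes?"),  # normal message
    ("  How to cook rice?  ", "How to cook rice?"),              # whitespace
    ("", None),                                                   # empty input
    ("a" * 1001, None),                                           # over the limit
]


def check_sanitizer(sanitize: Callable[[str], str]) -> List[str]:
    """Run every case; return readable failure messages (empty list means all passed)."""
    failures = []
    for raw, expected in CASES:
        try:
            actual = sanitize(raw)
        except ValueError:
            if expected is not None:
                failures.append(f"unexpected ValueError for {raw[:20]!r}")
            continue
        if expected is None:
            failures.append(f"expected ValueError for {raw[:20]!r}")
        elif actual != expected:
            failures.append(f"{raw[:20]!r}: got {actual!r}, expected {expected!r}")
    return failures
```

Calling `check_sanitizer(sanitize_user_input)` would then report any case where the real function's behavior differs from the documented examples.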
## What's Simplified
### Removed Overly Complex Features:
- ❌ SQL injection patterns (not relevant for LLM chatbot)
- ❌ Command injection patterns (not applicable)
- ❌ Separate strict/relaxed modes (unnecessary complexity)
- ❌ Multiple sanitization methods (unified approach)
### Kept Essential Features:
- ✅ Basic XSS protection
- ✅ Input length validation
- ✅ HTML encoding
- ✅ Whitespace normalization
- ✅ Clear error messages
## Performance
- **Lightweight**: Minimal regex patterns
- **Fast**: Simple operations only
- **Memory Efficient**: No complex state
- **Recipe-Focused**: Context-appropriate validation
## Examples
### Valid Inputs (Cleaned):
```python
"What are chicken recipes?" → "What are chicken recipes?"
"<script>alert('xss')</script>Tell me about pasta" → "Tell me about pasta"
" How to cook rice? " → "How to cook rice?"
"What about desserts & sweets?" → "What about desserts &amp; sweets?"
```
### Invalid Inputs (Rejected):
```python
"" → ValueError: Input cannot be empty
"a" * 1001 → ValueError: Input too long (maximum 1000 characters)
```
## Best Practices
1. **Keep It Simple**: Focus on the actual threats a recipe chatbot faces
2. **Context Appropriate**: Don't over-engineer for non-existent threats
3. **User Friendly**: Allow normal recipe-related punctuation
4. **Clear Errors**: Provide helpful error messages
5. **Test Regularly**: Verify with real recipe queries
This simplified approach provides adequate protection while keeping a recipe recommendation chatbot easy to use.