Simplified Data Sanitization Documentation
Overview
The simplified data sanitization module provides focused input validation and sanitization for the Recipe Recommendation Bot API. It is designed specifically for a recipe chatbot and covers only the essential security protections.
Features
🛡️ Essential Security Protection
- XSS Prevention: HTML encoding and basic script removal
- Input Validation: Length limits and content validation
- Whitespace Normalization: Clean formatting
🔧 Simple Configuration
- Maximum Message Length: 1000 characters
- Minimum Message Length: 1 character
- Single Method: One sanitization method for all inputs
Usage
Basic Sanitization
from utils.sanitization import sanitize_user_input
# Sanitize any user input (chat messages, demo prompts)
clean_input = sanitize_user_input("What are some chicken recipes?")
Advanced Usage
from utils.sanitization import DataSanitizer
# Direct class usage
sanitizer = DataSanitizer()
clean_text = sanitizer.sanitize_input("User input")
Security Patterns Handled
Basic XSS Protection
- <script> tags → Removed
- javascript: URLs → Cleaned
- Event handlers (onclick, onload) → Removed
- HTML entities → Properly encoded
Input Validation
- Length limits (1-1000 characters)
- Empty input detection
- Whitespace normalization
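The patterns listed above could be implemented roughly as follows. This is a minimal sketch, not the module's actual code: the function name sanitize_sketch, the regular expressions, and the order of the steps are all assumptions made for illustration. Only the 1-1000 character limits and the error messages come from this document.

```python
import html
import re

MAX_LENGTH = 1000  # documented maximum message length
MIN_LENGTH = 1     # documented minimum message length

# Hypothetical patterns; the real module's regexes may differ.
_SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
_JS_URL = re.compile(r"javascript\s*:", re.IGNORECASE)
_EVENT_HANDLER = re.compile(
    r"""\bon\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)
_WHITESPACE = re.compile(r"\s+")


def sanitize_sketch(text: str) -> str:
    """Illustrative stand-in for sanitize_user_input, not the module's code."""
    if not isinstance(text, str):
        raise ValueError("Input must be a string")
    # Reject oversized input before doing any regex work.
    if len(text) > MAX_LENGTH:
        raise ValueError(f"Input too long (maximum {MAX_LENGTH} characters)")
    text = _SCRIPT_BLOCK.sub("", text)    # drop <script>...</script> blocks
    text = _JS_URL.sub("", text)          # drop javascript: URL schemes
    text = _EVENT_HANDLER.sub("", text)   # drop onclick=..., onload=..., etc.
    text = _WHITESPACE.sub(" ", text).strip()
    if len(text) < MIN_LENGTH:
        raise ValueError("Input cannot be empty")
    # Encode &, <, > and quotes so any leftover markup is rendered inert.
    return html.escape(text)
```

Regex-based stripping is a heuristic. The event-handler pattern in this sketch would also remove harmless text shaped like on...=value, so the final HTML-encoding step is what the sketch relies on for any markup the patterns miss.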
Integration
Sanitization is applied in the FastAPI endpoints, either through a Pydantic validator or by calling sanitize_user_input directly:
Chat Endpoint
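from pydantic import BaseModel, Field, validator
from utils.sanitization import sanitize_user_input

class ChatMessage(BaseModel):
    message: str = Field(..., min_length=1, max_length=1000)

    # Pydantic v1-style validator (Pydantic v2 uses field_validator instead)
    @validator('message')
    def sanitize_message_field(cls, v):
        return sanitize_user_input(v)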
Demo Endpoint
@app.get("/demo")
def demo(prompt: str = "What recipes do you have?"):
    sanitized_prompt = sanitize_user_input(prompt)
    # ... rest of the logic
Error Handling
sanitize_user_input raises ValueError for invalid input, so callers can catch it and return an error response:
try:
    clean_input = sanitize_user_input(user_input)
except ValueError as e:
    return {"error": f"Invalid input: {e}"}
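The try/except pattern above can be wrapped in a small helper so that every endpoint returns errors in the same shape. The helper name sanitize_or_error, the {"input": ...} success shape, and the demo_sanitize stand-in are hypothetical. The sanitizer is passed in as an argument so the sketch runs without the real module.

```python
from typing import Callable, Dict


def sanitize_or_error(raw: str, sanitize: Callable[[str], str]) -> Dict[str, str]:
    """Return {"input": cleaned} on success or {"error": message} on failure."""
    try:
        return {"input": sanitize(raw)}
    except ValueError as exc:
        # Mirror the error format shown in this section.
        return {"error": f"Invalid input: {exc}"}


def demo_sanitize(text: str) -> str:
    """Tiny stand-in sanitizer used only to exercise the helper."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("Input cannot be empty")
    return cleaned


print(sanitize_or_error("  pasta?  ", demo_sanitize))  # {'input': 'pasta?'}
print(sanitize_or_error("", demo_sanitize))  # {'error': 'Invalid input: Input cannot be empty'}
```

In the real application, sanitize_user_input would presumably be passed in place of demo_sanitize.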
Testing
Run the sanitization tests:
python3 test_sanitization.py
The test suite covers:
- Normal recipe-related messages
- Basic harmful content (scripts, JavaScript)
- Length validation
- Whitespace normalization
- Edge cases
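As a rough idea of what tests for these categories might look like, here is a table-driven sketch using unittest. It is not the contents of test_sanitization.py. The sanitize_user_input defined here is a simplified stand-in so the example runs on its own, and the expected values are taken from the Examples section of this document.

```python
import html
import re
import unittest


def sanitize_user_input(text: str) -> str:
    """Stand-in so the tests run on their own; swap in the real import."""
    if len(text) > 1000:
        raise ValueError("Input too long (maximum 1000 characters)")
    text = re.sub(r"<script\b.*?</script\s*>", "", text, flags=re.I | re.S)
    text = " ".join(text.split())  # collapse runs of whitespace and trim
    if not text:
        raise ValueError("Input cannot be empty")
    return html.escape(text)


class SanitizationTests(unittest.TestCase):
    def test_cleaned_inputs(self):
        cases = [
            ("What are chicken recipes?", "What are chicken recipes?"),
            ("<script>alert('xss')</script>Tell me about pasta", "Tell me about pasta"),
            ("  How to   cook rice?  ", "How to cook rice?"),
        ]
        for raw, expected in cases:
            with self.subTest(raw=raw):
                self.assertEqual(sanitize_user_input(raw), expected)

    def test_rejected_inputs(self):
        # Empty, whitespace-only, and over-length inputs must all be rejected.
        for raw in ("", "   ", "a" * 1001):
            with self.subTest(raw=raw[:20]):
                with self.assertRaises(ValueError):
                    sanitize_user_input(raw)

    def test_length_boundary(self):
        # Exactly 1000 characters is still accepted.
        self.assertEqual(len(sanitize_user_input("a" * 1000)), 1000)
```

Saved to a file, a sketch like this can be run with python3 -m unittest followed by the file name.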
What's Simplified
Removed Overly Complex Features:
- ❌ SQL injection patterns (not relevant for LLM chatbot)
- ❌ Command injection patterns (not applicable)
- ❌ Separate strict/relaxed modes (unnecessary complexity)
- ❌ Multiple sanitization methods (unified approach)
Kept Essential Features:
- ✅ Basic XSS protection
- ✅ Input length validation
- ✅ HTML encoding
- ✅ Whitespace normalization
- ✅ Clear error messages
Performance
- Lightweight: Minimal regex patterns
- Fast: Simple operations only
- Memory Efficient: No complex state
- Recipe-Focused: Context-appropriate validation
Examples
Valid Inputs (Cleaned):
"What are chicken recipes?" → "What are chicken recipes?"
"<script>alert('xss')</script>Tell me about pasta" → "Tell me about pasta"
" How to cook rice? " → "How to cook rice?"
"What about desserts & sweets?" → "What about desserts &amp; sweets?"
Invalid Inputs (Rejected):
"" → ValueError: Input cannot be empty
"a" * 1001 → ValueError: Input too long (maximum 1000 characters)
Best Practices
- Keep It Simple: Focus on the actual threats to a recipe chatbot
- Context Appropriate: Don't over-engineer for non-existent threats
- User Friendly: Allow normal recipe-related punctuation
- Clear Errors: Provide helpful error messages
- Test Regularly: Verify with real recipe queries
This simplified approach provides adequate protection while keeping a recipe recommendation chatbot easy to use.