---
title: AI Guardrails Chat Interface
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: cpu-basic
suggested_storage: small
---

# 🛡️ AI Guardrails Chat Interface

A comprehensive AI safety system that provides real-time protection against prompt injection attacks and automatically anonymizes personally identifiable information (PII) in outputs.

## 🌟 Features

### 🔒 Input Protection

- **AI-Powered Detection**: Uses a fine-tuned ModernBERT model (`zazaman/fmb`) to detect prompt injection attacks
- **Multilingual Support**: Automatically translates non-English text to English using Qwen3-0.6B-GGUF before classification
- **Real-time Analysis**: Sub-second security analysis of user inputs
- **Attack Classification**: Identifies different types of prompt injection attempts

### 📄 Attachment Security

- **Multi-format Support**: Analyzes text files (.txt, .md), PDFs, and Word documents
- **Content Scanning**: Chunks large files and analyzes each section for malicious content
- **Safety Verification**: Files must pass security checks before being processed

### 🔐 Output Protection

- **PII Detection**: Automatically identifies and anonymizes personal information
- **Smart Redaction**: Replaces sensitive data while preserving context
- **Privacy-First**: Ensures no sensitive information leaks in responses

### 📊 Real-time Monitoring

- **Live Dashboard**: Shows connection status, response times, and security metrics
- **Detailed Analysis**: Expandable views show confidence scores, model decisions, and processing details
- **Performance Tracking**: Monitors system performance and security effectiveness

## 🚀 How It Works

1. **Language Detection**: Non-English text is automatically detected
2. **Translation**: If needed, the text is translated to English using Qwen3-0.6B-GGUF
3. **Input Analysis**: Every message is scanned by the fine-tuned security model
4. **LLM Processing**: Safe messages are processed by Google Gemini
5. **Output Filtering**: Responses are analyzed and PII is automatically anonymized
6. **Detailed Reporting**: All steps are logged with performance metrics

## 🛠️ Technical Stack

- **Frontend**: Modern web interface with real-time updates
- **Security Model**: Fine-tuned ModernBERT (`zazaman/fmb`) for prompt injection detection
- **Translation**: Qwen3-0.6B-GGUF (via llama-cpp-python) for multilingual text translation
- **LLM**: Google Gemini 2.5 Flash for response generation
- **Privacy**: Presidio for PII detection and anonymization
- **File Processing**: PyMuPDF for PDFs, python-docx for Word documents

## 💡 Use Cases

- **Customer Support**: Safe AI assistance with built-in security
- **Content Moderation**: Automated detection of malicious prompts
- **Privacy Compliance**: Automatic PII anonymization for data protection
- **Research**: Understanding AI security threats and mitigations

## 🔧 Configuration

The system supports several configuration options:

- **LLM Provider**: Switch between Gemini, Ollama, LM Studio, or manual mode
- **Security Thresholds**: Adjust confidence thresholds for detection
- **Output Guardrails**: Enable or disable specific privacy protection features
- **Performance Settings**: Optimize for CPU usage and memory consumption

## 🎯 Getting Started

1. The interface loads with a welcome message explaining the system
2. Type any message to see the guardrails in action
3. Upload files to test attachment security scanning
4. Click the dropdown arrows on responses to see detailed security analysis
5. Monitor the top-right dashboard for real-time system statistics

## 🔐 Security Features Demonstrated

- **Prompt Injection Detection**: Try variations of "ignore previous instructions"
- **PII Protection**: Include names, emails, or phone numbers in messages
- **File Scanning**: Upload documents with varying content safety levels
- **Real-time Monitoring**: Watch security metrics update with each interaction

## 📈 Performance Optimizations

- **Shared Model Architecture**: A single model instance serves all components
- **Memory Efficiency**: ~75% reduction in memory usage through model sharing
- **CPU Optimization**: Tuned for efficient CPU-only inference
- **Fast Startup**: 3-4x faster initialization through optimized loading
- **Lazy Loading**: The translation model loads only when non-English text is detected
- **GGUF Quantization**: Pre-quantized models (~250MB) for efficient CPU inference

## 🌍 Multilingual Support

The system automatically handles multilingual inputs:

- **Language Detection**: ASCII-based detection flags non-English text
- **Automatic Translation**: Uses Qwen3-0.6B-GGUF (IQ4_XS quantized, ~250MB) for translation
- **Seamless Integration**: Translated text is automatically classified by ModernBERT
- **No Performance Impact for English Text**: The translation model loads lazily, only when needed

## 🚀 Deployment on Hugging Face Spaces

This application is ready to deploy on Hugging Face Spaces:

1. **Create a Space**: Go to [Hugging Face Spaces](https://huggingface.co/spaces) and create a new Space
2. **Select SDK**: Choose "Docker" as the SDK
3. **Push Repository**: Push this repository to your Space
4. **Set Environment Variables** (in Space Settings → Repository secrets):
   - `GEMINI_API_KEY`: Your Google Gemini API key
   - `SECRET_KEY`: Flask secret key (optional, for production security)
5. **Hardware**: CPU Basic is sufficient (models load lazily)
6. **Storage**: Small storage is enough (models download on first use)

The Dockerfile is configured for HF Spaces with all necessary dependencies, including the build tools required by `llama-cpp-python`.

---

**Note**: This demo uses a personal fine-tuned model for educational purposes. The system is designed to be modular and can integrate with various AI providers and security models.
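The language-detection-then-lazy-translation gate described under "How It Works" and "Multilingual Support" can be sketched as follows. This is a minimal illustration, not the app's actual code: `looks_non_english`, `translate_if_needed`, and `load_qwen_translator` are hypothetical names, and the 20% non-ASCII threshold is an assumption.

```python
def looks_non_english(text: str, threshold: float = 0.2) -> bool:
    """ASCII-based heuristic: flag text as non-English when the share of
    non-ASCII characters exceeds the threshold (0.2 is an assumed value)."""
    if not text:
        return False
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii / len(text) > threshold


def load_qwen_translator():
    """Placeholder: the real app would load Qwen3-0.6B-GGUF here via
    llama-cpp-python; omitted so this sketch stays self-contained."""
    raise NotImplementedError("wire up llama-cpp-python in the real app")


_translator = None  # module-level cache so English-only sessions never load it


def translate_if_needed(text: str) -> str:
    """Translate to English only when the heuristic fires (lazy model load)."""
    global _translator
    if not looks_non_english(text):
        return text
    if _translator is None:
        _translator = load_qwen_translator()
    return _translator(text)
```

A simple byte-ratio heuristic like this keeps the English fast path free of any model load, which is what makes the "lazy loading" optimization above possible.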
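The chunk-and-scan approach described under "Attachment Security" might look like the sketch below. The helper names and chunk sizes are illustrative assumptions; `classify_chunk` stands in for the ModernBERT classifier call.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64):
    """Split extracted file text into overlapping windows so an injected
    instruction cannot hide across a chunk boundary."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def scan_file_text(text: str, classify_chunk) -> bool:
    """A file passes the safety check only if every chunk is classified safe."""
    return all(classify_chunk(chunk) for chunk in chunk_text(text))
```

In the real pipeline the text would first be extracted with PyMuPDF or python-docx, and `classify_chunk` would return the security model's verdict for each window.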
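The output guardrail's replace-with-typed-placeholder idea ("Smart Redaction") can be illustrated with a tiny regex stand-in. The production system uses Presidio for this; the two patterns below are deliberately simplistic and only demonstrate how redaction preserves sentence context.

```python
import re

# Toy stand-in for Presidio's analyzer/anonymizer: detect a PII span,
# replace it with a typed placeholder so the surrounding text stays readable.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def anonymize(text: str) -> str:
    """Replace detected PII with typed placeholders, preserving context."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Presidio generalizes this same pattern with NER-backed recognizers for names, locations, credit cards, and more, plus configurable anonymization operators.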
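The options listed under "Configuration" could be grouped into a settings object along these lines. Field names, defaults, and the threshold semantics are assumptions for illustration, not the app's actual settings.

```python
from dataclasses import dataclass


@dataclass
class GuardrailConfig:
    """Illustrative configuration shape; names/defaults are assumed."""
    llm_provider: str = "gemini"       # "gemini" | "ollama" | "lmstudio" | "manual"
    injection_threshold: float = 0.5   # min classifier confidence to block input
    enable_pii_filter: bool = True     # toggle output anonymization
    n_threads: int = 4                 # CPU threads for GGUF inference

    def should_block(self, injection_score: float) -> bool:
        """Block the message when the classifier's confidence meets the threshold."""
        return injection_score >= self.injection_threshold
```

Raising `injection_threshold` trades recall for precision: fewer benign messages are blocked, but borderline attacks are more likely to slip through.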