---
title: AI Guardrails Chat Interface
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: cpu-basic
suggested_storage: small
---
# 🛡️ AI Guardrails Chat Interface
A comprehensive AI safety system that provides real-time protection against prompt injection attacks and automatically anonymizes personally identifiable information (PII) in outputs.
## 🚀 Features

### 🔒 Input Protection

- AI-Powered Detection: Uses a fine-tuned ModernBERT model (zazaman/fmb) to detect prompt injection attacks
- Multilingual Support: Automatically translates non-English text to English using Qwen3-0.6B-GGUF before classification
- Real-time Analysis: Sub-second security analysis of user inputs
- Attack Classification: Identifies different types of prompt injection attempts
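The blocking decision around the classifier can be sketched as below. The model id (`zazaman/fmb`) comes from this README, but the label names and the threshold value are illustrative assumptions, not the project's actual settings:

```python
from dataclasses import dataclass

@dataclass
class ScanResult:
    label: str    # e.g. "INJECTION" or "SAFE" (assumed label set)
    score: float  # classifier confidence in [0, 1]

def is_blocked(result: ScanResult, threshold: float = 0.8) -> bool:
    """Block a message only when the model is confident it is an attack."""
    return result.label == "INJECTION" and result.score >= threshold

# In a real deployment, ScanResult would be built from something like:
#   transformers.pipeline("text-classification", model="zazaman/fmb")(user_text)
```

Keeping the threshold outside the model call makes it easy to tune sensitivity without retraining (see Configuration below).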
### 📎 Attachment Security
- Multi-format Support: Analyzes text files (.txt, .md), PDFs, and Word documents
- Content Scanning: Chunks large files and analyzes each section for malicious content
- Safety Verification: Files must pass security checks before being processed
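The chunk-and-scan approach can be sketched as follows; the chunk size and overlap are illustrative choices, and overlapping chunks avoid missing an attack string that straddles a boundary:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100):
    """Split a document into overlapping chunks for per-section scanning."""
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, max(len(text), 1), step)]

def file_is_safe(text: str, scan_chunk) -> bool:
    """A file passes only if every chunk passes the injected scanner."""
    return all(scan_chunk(chunk) for chunk in chunk_text(text))
```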
### 🔐 Output Protection
- PII Detection: Automatically identifies and anonymizes personal information
- Smart Redaction: Replaces sensitive data while preserving context
- Privacy-First: Ensures no sensitive information leaks in responses
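The redact-while-preserving-context idea can be illustrated with a stdlib stand-in. The real system uses Presidio; these two regexes are deliberately simplistic and miss many PII forms:

```python
import re

# Illustrative stand-in for the PII-anonymization step (NOT Presidio itself).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Replace matches with typed placeholders so the response stays readable."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders like `<EMAIL>` keep the sentence intelligible, which is what "preserving context" means in practice.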
### 📊 Real-time Monitoring
- Live Dashboard: Shows connection status, response times, and security metrics
- Detailed Analysis: Expandable views show confidence scores, model decisions, and processing details
- Performance Tracking: Monitors system performance and security effectiveness
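A minimal sketch of the dashboard counters; the rolling-window size is an illustrative choice, not the project's actual value:

```python
from collections import deque

class LiveMetrics:
    """Rolling counters behind a live dashboard: latency, block rate, volume."""
    def __init__(self, window: int = 100):
        self.latencies = deque(maxlen=window)  # keep only recent samples
        self.blocked = 0
        self.total = 0

    def record(self, latency_ms: float, was_blocked: bool) -> None:
        self.latencies.append(latency_ms)
        self.total += 1
        self.blocked += was_blocked

    @property
    def avg_latency_ms(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
```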
## 🔄 How It Works

1. Language Detection: Non-English text is automatically detected
2. Translation: Non-English text is translated to English using Qwen3-0.6B-GGUF (if needed)
3. Input Analysis: Every message is scanned by the fine-tuned security model
4. LLM Processing: Safe messages are processed by Google Gemini
5. Output Filtering: Responses are analyzed and PII is automatically anonymized
6. Detailed Reporting: All steps are logged with performance metrics
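The steps above can be sketched as one pipeline function. Every stage is injected as a callable so the mapping to the list is one-to-one; all names are illustrative, and the logging/metrics step is omitted for brevity:

```python
def guarded_chat(message: str, *, detect_lang, translate, scan, llm, anonymize):
    """Illustrative guardrail pipeline; stages are injected functions."""
    if detect_lang(message) != "en":   # 1. language detection
        message = translate(message)   # 2. translation
    if not scan(message):              # 3. input analysis
        return "Request blocked by input guardrail."
    reply = llm(message)               # 4. LLM processing
    return anonymize(reply)            # 5. output filtering
```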
## 🛠️ Technical Stack

- Frontend: Modern web interface with real-time updates
- Security Model: Fine-tuned ModernBERT (zazaman/fmb) for prompt injection detection
- Translation: Qwen3-0.6B-GGUF (via llama-cpp-python) for multilingual text translation
- LLM: Google Gemini 2.5 Flash for response generation
- Privacy: Presidio for PII detection and anonymization
- File Processing: PyMuPDF for PDFs, python-docx for Word documents
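The file-processing layer can be sketched as a dispatch on extension. The `.txt`/`.md` branch is pure stdlib; the PDF and DOCX branches assume PyMuPDF (`fitz`) and python-docx are installed, as listed above:

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Extract plain text from a supported attachment for scanning."""
    suffix = Path(path).suffix.lower()
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        import docx  # python-docx
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    raise ValueError(f"Unsupported attachment type: {suffix}")
```

Rejecting unknown extensions up front keeps unscannable binary formats out of the pipeline entirely.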
## 💡 Use Cases
- Customer Support: Safe AI assistance with built-in security
- Content Moderation: Automated detection of malicious prompts
- Privacy Compliance: Automatic PII anonymization for data protection
- Research: Understanding AI security threats and mitigation
## 🔧 Configuration
The system supports various configuration options:
- LLM Provider: Switch between Gemini, Ollama, LM Studio, or manual mode
- Security Thresholds: Adjust confidence thresholds for detection
- Output Guardrails: Enable/disable specific privacy protection features
- Performance Settings: Optimize for CPU usage and memory consumption
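One way to bundle these options is a validated config object. Field names, defaults, and the provider list below mirror the options above, but the exact shape is an illustrative assumption, not the project's actual settings file:

```python
from dataclasses import dataclass

@dataclass
class GuardrailConfig:
    llm_provider: str = "gemini"      # "gemini" | "ollama" | "lmstudio" | "manual"
    injection_threshold: float = 0.8  # block above this classifier confidence
    enable_pii_filter: bool = True    # output-guardrail toggle
    max_workers: int = 2              # CPU/memory tuning knob

    def __post_init__(self):
        allowed = {"gemini", "ollama", "lmstudio", "manual"}
        if self.llm_provider not in allowed:
            raise ValueError(f"unknown provider: {self.llm_provider}")
```

Validating the provider at construction time surfaces typos at startup rather than on the first chat request.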
## 🎯 Getting Started
- The interface loads with a welcome message explaining the system
- Type any message to see the guardrails in action
- Upload files to test attachment security scanning
- Click the dropdown arrows on responses to see detailed security analysis
- Monitor the top-right dashboard for real-time system statistics
## 🔍 Security Features Demonstrated
- Prompt Injection Detection: Try variations of "ignore previous instructions"
- PII Protection: Include names, emails, or phone numbers in messages
- File Scanning: Upload documents with varying content safety levels
- Real-time Monitoring: Watch security metrics update with each interaction
## 📈 Performance Optimizations
- Shared Model Architecture: Single model instance serves all components
- Memory Efficiency: ~75% reduction in memory usage through model sharing
- CPU Optimization: Tuned for efficient CPU-only inference
- Fast Startup: 3-4x faster initialization through optimized loading
- Lazy Loading: Translation model loads only when non-English text is detected
- GGUF Quantization: Pre-quantized models (~250MB) for efficient CPU inference
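The lazy-loading pattern can be sketched as a thin wrapper: the expensive constructor runs only on the first non-English input. The loader is injected here for illustration; in the real app it would build the Qwen3-0.6B-GGUF model via llama-cpp-python:

```python
class LazyTranslator:
    """Defer loading the translation model until it is actually needed."""
    def __init__(self, loader):
        self._loader = loader  # callable that builds the heavy model
        self._model = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def translate(self, text: str) -> str:
        if self._model is None:  # first non-English input triggers the load
            self._model = self._loader()
        return self._model(text)
```

English-only sessions therefore never pay the model's memory or startup cost.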
## 🌍 Multilingual Support
The system automatically handles multilingual inputs:
- Language Detection: ASCII-based detection for non-English text
- Automatic Translation: Uses Qwen3-0.6B-GGUF (IQ4_XS quantized, ~250MB) for translation
- Seamless Integration: Translated text is automatically classified by ModernBERT
- No Performance Impact: Translation model loads lazily only when needed
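An ASCII-ratio heuristic like the one described above can be sketched as follows. The 20% threshold is an illustrative assumption; note that a heuristic this simple will, by design, miss ASCII-only non-English text (e.g. "bonjour"):

```python
def looks_non_english(text: str, max_non_ascii_ratio: float = 0.2) -> bool:
    """Flag text as non-English when enough characters fall outside ASCII."""
    if not text:
        return False
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii / len(text) > max_non_ascii_ratio
```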
## 🚀 Deployment on Hugging Face Spaces
This application is ready to deploy on Hugging Face Spaces:
1. Create a Space: Go to Hugging Face Spaces and create a new Space
2. Select SDK: Choose "Docker" as the SDK
3. Push Repository: Push this repository to your Space
4. Set Environment Variables (in Space Settings → Repository secrets):
   - GEMINI_API_KEY: Your Google Gemini API key
   - SECRET_KEY: Flask secret key (optional, for production security)
5. Hardware: CPU Basic is sufficient (models load lazily)
6. Storage: Small storage is enough (models download on first use)
The Dockerfile is configured for HF Spaces with all necessary dependencies including build tools for llama-cpp-python.
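At startup the app can read the repository secrets listed above like this; the function shape and the dev-only fallback for SECRET_KEY are illustrative, not the project's actual code:

```python
import os

def load_secrets(env=os.environ):
    """Read deployment secrets; `env` is injectable for testing."""
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        # Fail fast at boot instead of erroring on the first chat request.
        raise RuntimeError("GEMINI_API_KEY is not set; add it in Space settings.")
    return api_key, env.get("SECRET_KEY", "dev-only-change-me")
```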
Note: This demo uses a personal fine-tuned model for educational purposes. The system is designed to be modular and can integrate with various AI providers and security models.