---
title: AI Guardrails Chat Interface
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: cpu-basic
suggested_storage: small
---
# 🛡️ AI Guardrails Chat Interface
A comprehensive AI safety system that provides real-time protection against prompt injection attacks and automatically anonymizes personally identifiable information (PII) in outputs.
## 🚀 Features

### 🔒 Input Protection

- AI-Powered Detection: Uses a fine-tuned ModernBERT model (zazaman/fmb) to detect prompt injection attacks
- Multilingual Support: Automatically translates non-English text to English using Qwen3-0.6B-GGUF before classification
- Real-time Analysis: Sub-second security analysis of user inputs
- Attack Classification: Identifies different types of prompt injection attempts
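The blocking decision around the classifier can be sketched as below. The model id (`zazaman/fmb`) comes from this README, but the label names and the threshold value are illustrative assumptions, not the project's actual settings:

```python
from dataclasses import dataclass

@dataclass
class ScanResult:
    label: str    # e.g. "INJECTION" or "SAFE" (assumed label set)
    score: float  # classifier confidence in [0, 1]

def is_blocked(result: ScanResult, threshold: float = 0.8) -> bool:
    """Block a message only when the model is confident it is an attack."""
    return result.label == "INJECTION" and result.score >= threshold

# In a real deployment, ScanResult would be built from something like:
#   transformers.pipeline("text-classification", model="zazaman/fmb")(user_text)
```

Keeping the threshold outside the model call makes it easy to tune sensitivity without retraining (see Configuration below).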
### 📎 Attachment Security
- Multi-format Support: Analyzes text files (.txt, .md), PDFs, and Word documents
- Content Scanning: Chunks large files and analyzes each section for malicious content
- Safety Verification: Files must pass security checks before being processed
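The chunk-and-scan approach can be sketched as follows; the chunk size and overlap are illustrative choices, and overlapping chunks avoid missing an attack string that straddles a boundary:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100):
    """Split a document into overlapping chunks for per-section scanning."""
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, max(len(text), 1), step)]

def file_is_safe(text: str, scan_chunk) -> bool:
    """A file passes only if every chunk passes the injected scanner."""
    return all(scan_chunk(chunk) for chunk in chunk_text(text))
```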
### 🔐 Output Protection
- PII Detection: Automatically identifies and anonymizes personal information
- Smart Redaction: Replaces sensitive data while preserving context
- Privacy-First: Ensures no sensitive information leaks in responses
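The redact-while-preserving-context idea can be illustrated with a stdlib stand-in. The real system uses Presidio; these two regexes are deliberately simplistic and miss many PII forms:

```python
import re

# Illustrative stand-in for the PII-anonymization step (NOT Presidio itself).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Replace matches with typed placeholders so the response stays readable."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders like `<EMAIL>` keep the sentence intelligible, which is what "preserving context" means in practice.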
### 📊 Real-time Monitoring
- Live Dashboard: Shows connection status, response times, and security metrics
- Detailed Analysis: Expandable views show confidence scores, model decisions, and processing details
- Performance Tracking: Monitors system performance and security effectiveness
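A minimal sketch of the dashboard counters; the rolling-window size is an illustrative choice, not the project's actual value:

```python
from collections import deque

class LiveMetrics:
    """Rolling counters behind a live dashboard: latency, block rate, volume."""
    def __init__(self, window: int = 100):
        self.latencies = deque(maxlen=window)  # keep only recent samples
        self.blocked = 0
        self.total = 0

    def record(self, latency_ms: float, was_blocked: bool) -> None:
        self.latencies.append(latency_ms)
        self.total += 1
        self.blocked += was_blocked

    @property
    def avg_latency_ms(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
```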
## 🔄 How It Works

1. Language Detection: Non-English text is automatically detected
2. Translation: Non-English text is translated to English using Qwen3-0.6B-GGUF (if needed)
3. Input Analysis: Every message is scanned by the fine-tuned security model
4. LLM Processing: Safe messages are processed by Google Gemini
5. Output Filtering: Responses are analyzed and PII is automatically anonymized
6. Detailed Reporting: All steps are logged with performance metrics
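The steps above can be sketched as one pipeline function. Every stage is injected as a callable so the mapping to the list is one-to-one; all names are illustrative, and the logging/metrics step is omitted for brevity:

```python
def guarded_chat(message: str, *, detect_lang, translate, scan, llm, anonymize):
    """Illustrative guardrail pipeline; stages are injected functions."""
    if detect_lang(message) != "en":   # 1. language detection
        message = translate(message)   # 2. translation
    if not scan(message):              # 3. input analysis
        return "Request blocked by input guardrail."
    reply = llm(message)               # 4. LLM processing
    return anonymize(reply)            # 5. output filtering
```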
## 🛠️ Technical Stack

- Frontend: Modern web interface with real-time updates
- Security Model: Fine-tuned ModernBERT (zazaman/fmb) for prompt injection detection
- Translation: Qwen3-0.6B-GGUF (via llama-cpp-python) for multilingual text translation
- LLM: Google Gemini 2.5 Flash for response generation
- Privacy: Presidio for PII detection and anonymization
- File Processing: PyMuPDF for PDFs, python-docx for Word documents
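The file-processing layer can be sketched as a dispatch on extension. The `.txt`/`.md` branch is pure stdlib; the PDF and DOCX branches assume PyMuPDF (`fitz`) and python-docx are installed, as listed above:

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Extract plain text from a supported attachment for scanning."""
    suffix = Path(path).suffix.lower()
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        import docx  # python-docx
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    raise ValueError(f"Unsupported attachment type: {suffix}")
```

Rejecting unknown extensions up front keeps unscannable binary formats out of the pipeline entirely.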
## 💡 Use Cases
- Customer Support: Safe AI assistance with built-in security
- Content Moderation: Automated detection of malicious prompts
- Privacy Compliance: Automatic PII anonymization for data protection
- Research: Understanding AI security threats and mitigation
## 🔧 Configuration
The system supports various configuration options:
- LLM Provider: Switch between Gemini, Ollama, LM Studio, or manual mode
- Security Thresholds: Adjust confidence thresholds for detection
- Output Guardrails: Enable/disable specific privacy protection features
- Performance Settings: Optimize for CPU usage and memory consumption
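One way to bundle these options is a validated config object. Field names, defaults, and the provider list below mirror the options above, but the exact shape is an illustrative assumption, not the project's actual settings file:

```python
from dataclasses import dataclass

@dataclass
class GuardrailConfig:
    llm_provider: str = "gemini"      # "gemini" | "ollama" | "lmstudio" | "manual"
    injection_threshold: float = 0.8  # block above this classifier confidence
    enable_pii_filter: bool = True    # output-guardrail toggle
    max_workers: int = 2              # CPU/memory tuning knob

    def __post_init__(self):
        allowed = {"gemini", "ollama", "lmstudio", "manual"}
        if self.llm_provider not in allowed:
            raise ValueError(f"unknown provider: {self.llm_provider}")
```

Validating the provider at construction time surfaces typos at startup rather than on the first chat request.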
## 🎯 Getting Started
- The interface loads with a welcome message explaining the system
- Type any message to see the guardrails in action
- Upload files to test attachment security scanning
- Click the dropdown arrows on responses to see detailed security analysis
- Monitor the top-right dashboard for real-time system statistics
## 🔍 Security Features Demonstrated
- Prompt Injection Detection: Try variations of "ignore previous instructions"
- PII Protection: Include names, emails, or phone numbers in messages
- File Scanning: Upload documents with varying content safety levels
- Real-time Monitoring: Watch security metrics update with each interaction
## 📈 Performance Optimizations
- Shared Model Architecture: Single model instance serves all components
- Memory Efficiency: ~75% reduction in memory usage through model sharing
- CPU Optimization: Tuned for efficient CPU-only inference
- Fast Startup: 3-4x faster initialization through optimized loading
- Lazy Loading: Translation model loads only when non-English text is detected
- GGUF Quantization: Pre-quantized models (~250MB) for efficient CPU inference
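The lazy-loading pattern can be sketched as a thin wrapper: the expensive constructor runs only on the first non-English input. The loader is injected here for illustration; in the real app it would build the Qwen3-0.6B-GGUF model via llama-cpp-python:

```python
class LazyTranslator:
    """Defer loading the translation model until it is actually needed."""
    def __init__(self, loader):
        self._loader = loader  # callable that builds the heavy model
        self._model = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def translate(self, text: str) -> str:
        if self._model is None:  # first non-English input triggers the load
            self._model = self._loader()
        return self._model(text)
```

English-only sessions therefore never pay the model's memory or startup cost.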
## 🌍 Multilingual Support
The system automatically handles multilingual inputs:
- Language Detection: ASCII-based detection for non-English text
- Automatic Translation: Uses Qwen3-0.6B-GGUF (IQ4_XS quantized, ~250MB) for translation
- Seamless Integration: Translated text is automatically classified by ModernBERT
- No Performance Impact: Translation model loads lazily only when needed
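An ASCII-ratio heuristic like the one described above can be sketched as follows. The 20% threshold is an illustrative assumption; note that a heuristic this simple will, by design, miss ASCII-only non-English text (e.g. "bonjour"):

```python
def looks_non_english(text: str, max_non_ascii_ratio: float = 0.2) -> bool:
    """Flag text as non-English when enough characters fall outside ASCII."""
    if not text:
        return False
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii / len(text) > max_non_ascii_ratio
```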
## 🚀 Deployment on Hugging Face Spaces
This application is ready to deploy on Hugging Face Spaces:
1. Create a Space: Go to Hugging Face Spaces and create a new Space
2. Select SDK: Choose "Docker" as the SDK
3. Push Repository: Push this repository to your Space
4. Set Environment Variables (in Space Settings → Repository secrets):
   - GEMINI_API_KEY: Your Google Gemini API key
   - SECRET_KEY: Flask secret key (optional, for production security)
5. Hardware: CPU Basic is sufficient (models load lazily)
6. Storage: Small storage is enough (models download on first use)
The Dockerfile is configured for HF Spaces with all necessary dependencies including build tools for llama-cpp-python.
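At startup the app can read the repository secrets listed above like this; the function shape and the dev-only fallback for SECRET_KEY are illustrative, not the project's actual code:

```python
import os

def load_secrets(env=os.environ):
    """Read deployment secrets; `env` is injectable for testing."""
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        # Fail fast at boot instead of erroring on the first chat request.
        raise RuntimeError("GEMINI_API_KEY is not set; add it in Space settings.")
    return api_key, env.get("SECRET_KEY", "dev-only-change-me")
```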
Note: This demo uses a personal fine-tuned model for educational purposes. The system is designed to be modular and can integrate with various AI providers and security models.