Spaces:
Sleeping
Sleeping
| # Production Features Implementation Guide | |
| This document explains what has been implemented for the Hickey Lab AI Assistant and how to configure and use each feature. | |
| --- | |
| ## π¦ What Has Been Implemented | |
| All the following features from the production roadmap have been implemented: | |
| ### β Phase 1: Foundation - Cost & Security Controls (High Priority π΄) | |
| #### 1. **Cost Management Module** (`utils/cost_tracker.py`) | |
| Tracks API token usage and costs to prevent budget overruns. | |
| **What it does:** | |
| - Extracts token counts from every Gemini API response | |
| - Calculates costs based on Gemini 2.5 Flash pricing ($0.075 per 1M input tokens, $0.30 per 1M output tokens) | |
| - Logs all usage to `logs/usage.jsonl` with timestamps | |
| - Tracks daily and monthly usage statistics | |
| - Enforces budget caps (blocks service when exceeded) | |
| - Generates usage reports | |
| **How to use it:** | |
| 1. Set budget limits in `config.py`: | |
| - `DAILY_QUERY_LIMIT`: Maximum queries per day (default: 200) | |
| - `MONTHLY_BUDGET_USD`: Monthly budget cap (default: $50) | |
| - `DAILY_BUDGET_WARNING`: Warning threshold (default: $5) | |
| 2. View usage stats in the sidebar by checking "π Show Usage Stats" | |
| 3. Generate reports manually: | |
| ```python | |
| from utils.cost_tracker import CostTracker | |
| tracker = CostTracker() | |
| print(tracker.generate_daily_report()) | |
| print(tracker.generate_monthly_report(2024, 12)) | |
| ``` | |
| #### 2. **Rate Limiting System** (`utils/rate_limiter.py`) | |
| Prevents abuse through configurable rate limits. | |
| **What it does:** | |
| - Tracks queries per session using sliding time windows | |
| - Enforces hourly limits (default: 20 queries per hour) | |
| - Enforces daily limits (default: 200 queries per 24 hours) | |
| - Shows warnings when approaching limits (at 80% by default) | |
| - Blocks queries when limits exceeded with friendly messages | |
| - Logs rate limit violations | |
| **How to use it:** | |
| 1. Configure limits in `config.py`: | |
| - `RATE_LIMIT_PER_HOUR`: Queries per hour (default: 20) | |
| - `RATE_LIMIT_PER_DAY`: Queries per day (default: 200) | |
| - `RATE_LIMIT_WARNING_THRESHOLD`: When to warn (default: 0.8 = 80%) | |
| 2. Users will automatically see warnings like: | |
| - "β οΈ You have 4 questions remaining this hour" | |
| - "π Rate limit reached! Please wait 15 minutes..." | |
| #### 3. **Security Module** (`utils/security.py`) | |
| Validates and sanitizes user input to prevent attacks. | |
| **What it does:** | |
| - Checks input length (1-2000 characters by default) | |
| - Detects prompt injection attempts ("ignore previous instructions", etc.) | |
| - Blocks suspicious patterns (script tags, template injection, etc.) | |
| - Detects excessive special characters | |
| - Logs all security violations for review | |
| **How to use it:** | |
| 1. Configure limits in `config.py`: | |
| - `MAX_INPUT_LENGTH`: Maximum characters (default: 2000) | |
| - `MIN_INPUT_LENGTH`: Minimum characters (default: 1) | |
| 2. Security is automatic - invalid inputs are rejected with user-friendly messages | |
| 3. Review security logs in `logs/security.jsonl` to monitor threats | |
| #### 4. **Alert System** (`utils/alerts.py`) | |
| Sends push notifications for critical events using ntfy.sh. | |
| **What it does:** | |
| - Sends push notifications to your phone/browser via ntfy.sh (free, no signup) | |
| - Alerts for rate limit violations | |
| - Alerts for cost threshold breaches | |
| - Alerts for suspicious activity | |
| - Alerts for error spikes | |
| - Supports priority levels (min, low, default, high, urgent) | |
| **How to set it up:** | |
| 1. **Subscribe to notifications:** | |
| - Option A (Browser): Go to `https://ntfy.sh/YOUR-TOPIC-NAME` and click "Subscribe" | |
| - Option B (Mobile App): | |
| - Install ntfy app (iOS/Android) | |
| - Add subscription with your topic name | |
| 2. **Choose a SECURE topic name:** | |
| - β οΈ IMPORTANT: Use a random, hard-to-guess name for security! | |
| - β Good: `hickeylab-alerts-x9k2m7a4` | |
| - β Bad: `hickeylab-alerts` (anyone can subscribe) | |
| 3. **Configure the topic:** | |
| - Set in `config.py`: `NTFY_TOPIC = "your-topic-name"` | |
| - Or set environment variable: `NTFY_TOPIC=your-topic-name` | |
| 4. **Test it:** | |
| ```bash | |
| python -c "from utils.alerts import AlertSystem; AlertSystem().test_alert()" | |
| ``` | |
| Or: | |
| ```bash | |
| curl -d "Test alert" ntfy.sh/your-topic-name | |
| ``` | |
| **What you'll be notified about:** | |
| - β οΈ User hits rate limit | |
| - π° Daily/monthly cost thresholds (80%, 100%) | |
| - π Suspicious activity detected | |
| - π¨ Service paused due to budget limits | |
| --- | |
| ### β Phase 2: Monitoring & Quality (Medium Priority π‘) | |
| #### 5. **Enhanced Logging** | |
| All queries are logged with metadata for analysis. | |
| **What's logged:** | |
| - Timestamp | |
| - Session ID (truncated for privacy) | |
| - Question length | |
| - Token counts (prompt, response, total) | |
| - Estimated cost | |
| - Response time | |
| - Success/failure status | |
| - Error messages (if any) | |
| **Log files:** | |
| - `logs/usage.jsonl` - All API usage | |
| - `logs/rate_limits.jsonl` - Rate limit violations | |
| - `logs/security.jsonl` - Security violations | |
| #### 6. **Conversation Context** | |
| Maintains context across multiple messages for better responses. | |
| **What it does:** | |
| - Includes last 5 exchanges in each query (configurable) | |
| - Allows follow-up questions to reference previous messages | |
| - Example: | |
| - User: "What is CODEX?" | |
| - Assistant: [explains CODEX] | |
| - User: "How does it compare to IBEX?" | |
| - Assistant: [compares CODEX (from context) to IBEX] | |
| **How to configure:** | |
| - Adjust `CONVERSATION_HISTORY_LENGTH` in `config.py` (default: 5) | |
| #### 7. **Enhanced System Prompt** | |
| Improved instructions for better response quality. | |
| **What's improved:** | |
| - Conversation context awareness | |
| - Response structure guidelines (2-4 paragraphs for complex topics) | |
| - Specific citation instructions | |
| - Technical term explanation requirements | |
| - Grounding in knowledge base (no hallucinations) | |
| --- | |
| ### β Phase 3: User Experience (Low Priority π’) | |
| #### 8. **Suggested Questions** | |
| Shows starter questions when chat is empty. | |
| **What it does:** | |
| - Displays 4 suggested questions as clickable buttons | |
| - Questions are configured in `config.py` | |
| - Helps new users get started | |
| **How to customize:** | |
| - Edit `SUGGESTED_QUESTIONS` in `config.py` | |
| #### 9. **Privacy Notice** | |
| Displays privacy and usage information. | |
| **What it shows:** | |
| - Data processing information | |
| - Usage limits | |
| - Privacy policy | |
| **How to customize:** | |
| - Edit `PRIVACY_NOTICE` in `config.py` | |
| #### 10. **Usage Statistics Dashboard** | |
| Shows real-time usage stats in sidebar. | |
| **What it shows:** | |
| - Today's query count and cost | |
| - This month's query count and cost | |
| - Optional display (checkbox in sidebar) | |
| #### 11. **Mobile Responsive Design** | |
| Improved CSS for mobile devices. | |
| **What's improved:** | |
| - Touch-friendly button sizes (44px minimum) | |
| - Appropriate font sizes | |
| - No iOS zoom on input focus | |
| - Responsive layout | |
| --- | |
| ## π Deployment Instructions | |
| ### For HuggingFace Spaces: | |
| 1. **Set up secrets:** | |
| - Go to Space Settings β Variables and secrets | |
| - Add `GEMINI_API_KEY` as a Secret | |
| - (Optional) Add `NTFY_TOPIC` for notifications | |
| 2. **Upload files:** | |
| - Upload the entire `outreach/pipelines/gemini_file_search/` directory | |
| - Ensure all files are included: | |
| - `app.py` | |
| - `config.py` | |
| - `requirements.txt` | |
| - `utils/` directory with all modules | |
| 3. **The app will automatically:** | |
| - Install dependencies from `requirements.txt` | |
| - Start the Streamlit app | |
| - Create `logs/` directory when first query is made | |
| ### Environment Variables: | |
| | Variable | Required | Description | | |
| |----------|----------|-------------| | |
| | `GEMINI_API_KEY` | β Yes | Your Google Gemini API key | | |
| | `NTFY_TOPIC` | β Optional | Your ntfy.sh topic for push notifications | | |
| ### First-Time Setup: | |
| 1. **Test the app** with a few queries | |
| 2. **Subscribe to notifications** if you set up ntfy.sh | |
| 3. **Check logs** in `logs/` directory (if accessible) | |
| 4. **Adjust limits** in `config.py` if needed | |
| --- | |
| ## π Monitoring & Maintenance | |
| ### Daily Tasks: | |
| - Check usage stats in the sidebar | |
| - Watch for notification alerts on your phone/browser | |
| ### Weekly Tasks: | |
| - Review `logs/usage.jsonl` for usage patterns | |
| - Check `logs/security.jsonl` for any threats | |
| - Adjust rate limits if needed | |
| ### Monthly Tasks: | |
| - Generate monthly cost report | |
| - Review budget and adjust if needed | |
| - Update system prompt based on user feedback | |
| ### Generating Reports: | |
| ```python | |
| from utils.cost_tracker import CostTracker | |
| tracker = CostTracker() | |
| # Daily report | |
| print(tracker.generate_daily_report()) | |
| # Monthly report | |
| print(tracker.generate_monthly_report(2024, 12)) | |
| # Custom date | |
| from datetime import datetime | |
| print(tracker.generate_daily_report(datetime(2024, 12, 15))) | |
| ``` | |
| --- | |
| ## βοΈ Configuration Reference | |
| All configuration is in `config.py`. Key settings: | |
| ### Cost Management: | |
| ```python | |
| DAILY_QUERY_LIMIT = 200 # Max queries per day | |
| MONTHLY_BUDGET_USD = 50.0 # Hard budget cap | |
| DAILY_BUDGET_WARNING = 5.0 # Alert threshold | |
| ``` | |
| ### Rate Limiting: | |
| ```python | |
| RATE_LIMIT_PER_HOUR = 20 # Queries per hour | |
| RATE_LIMIT_PER_DAY = 200 # Queries per 24 hours | |
| RATE_LIMIT_WARNING_THRESHOLD = 0.8 # Warn at 80% | |
| ``` | |
| ### Security: | |
| ```python | |
| MAX_INPUT_LENGTH = 2000 # Max characters | |
| MIN_INPUT_LENGTH = 1 # Min characters | |
| ``` | |
| ### Alerts: | |
| ```python | |
| NTFY_TOPIC = "" # Your ntfy.sh topic | |
| ALERTS_ENABLED = True # Enable/disable | |
| ``` | |
| ### Response Quality: | |
| ```python | |
| CONVERSATION_HISTORY_LENGTH = 5 # Messages of context | |
| ENHANCED_SYSTEM_PROMPT = "..." # Full prompt in file | |
| ``` | |
| ### UI/UX: | |
| ```python | |
| SUGGESTED_QUESTIONS = [...] # Starter questions | |
| PRIVACY_NOTICE = "..." # Privacy text | |
| ``` | |
| --- | |
| ## π§ Troubleshooting | |
| ### Logs not being created: | |
| - Check file permissions | |
| - Ensure `logs/` directory is not in `.gitignore` for deployment | |
| - HuggingFace Spaces may not persist logs across restarts | |
| ### Notifications not working: | |
| - Verify `NTFY_TOPIC` is set correctly | |
| - Test with: `curl -d "test" ntfy.sh/your-topic` | |
| - Check you're subscribed to the right topic | |
| - Ensure `ALERTS_ENABLED = True` in config | |
| ### Rate limits too strict/lenient: | |
| - Adjust `RATE_LIMIT_PER_HOUR` and `RATE_LIMIT_PER_DAY` in `config.py` | |
| - Changes take effect on app restart | |
| ### Budget exceeded too quickly: | |
| - Review `logs/usage.jsonl` for unusual activity | |
| - Check if there's an attack (many rapid queries) | |
| - Adjust `MONTHLY_BUDGET_USD` if legitimate traffic | |
| ### Conversation context not working: | |
| - Verify `CONVERSATION_HISTORY_LENGTH > 0` | |
| - Check that messages are being stored in `st.session_state.messages` | |
| --- | |
| ## π Additional Resources | |
| - **Gemini API Pricing**: https://ai.google.dev/pricing | |
| - **ntfy.sh Documentation**: https://ntfy.sh | |
| - **HuggingFace Spaces**: https://huggingface.co/docs/hub/spaces | |
| - **Streamlit Documentation**: https://docs.streamlit.io | |
| --- | |
| ## π― What You Need to Do | |
| ### Required: | |
| 1. β Deploy the updated code to HuggingFace Spaces | |
| 2. β Set `GEMINI_API_KEY` secret in HuggingFace | |
| 3. β Test with a few queries to verify it works | |
| ### Optional but Recommended: | |
| 1. π± Set up ntfy.sh notifications: | |
| - Pick a random topic name | |
| - Subscribe on your phone/browser | |
| - Set `NTFY_TOPIC` in HuggingFace secrets | |
| - Test it works | |
| 2. βοΈ Adjust configuration in `config.py`: | |
| - Set appropriate rate limits | |
| - Set monthly budget | |
| - Customize suggested questions | |
| 3. π Monitor usage: | |
| - Check sidebar stats regularly | |
| - Watch for notification alerts | |
| - Review logs if accessible | |
| --- | |
| ## π Support | |
| If you encounter any issues: | |
| 1. Check the troubleshooting section above | |
| 2. Review the logs (if accessible) | |
| 3. Check HuggingFace Spaces logs for errors | |
| 4. Verify environment variables are set correctly | |
| --- | |
| **That's it!** All the production-ready features from the roadmap have been implemented. The system is now protected against cost overruns, abuse, and security threats, with monitoring and alerting in place. | |