HickeyLabSocialMedia / IMPLEMENTATION_GUIDE.md
bobbyni819's picture
Upload 15 files
abb96d7 verified

A newer version of the Streamlit SDK is available: 1.53.0

Upgrade

Production Features Implementation Guide

This document explains what has been implemented for the Hickey Lab AI Assistant and how to configure and use each feature.


πŸ“¦ What Has Been Implemented

All the following features from the production roadmap have been implemented:

βœ… Phase 1: Foundation - Cost & Security Controls (High Priority πŸ”΄)

1. Cost Management Module (utils/cost_tracker.py)

Tracks API token usage and costs to prevent budget overruns.

What it does:

  • Extracts token counts from every Gemini API response
  • Calculates costs based on Gemini 2.5 Flash pricing ($0.075 per 1M input tokens, $0.30 per 1M output tokens)
  • Logs all usage to logs/usage.jsonl with timestamps
  • Tracks daily and monthly usage statistics
  • Enforces budget caps (blocks service when exceeded)
  • Generates usage reports

How to use it:

  1. Set budget limits in config.py:

    • DAILY_QUERY_LIMIT: Maximum queries per day (default: 200)
    • MONTHLY_BUDGET_USD: Monthly budget cap (default: $50)
    • DAILY_BUDGET_WARNING: Warning threshold (default: $5)
  2. View usage stats in the sidebar by checking "πŸ“Š Show Usage Stats"

  3. Generate reports manually:

    from utils.cost_tracker import CostTracker
    tracker = CostTracker()
    print(tracker.generate_daily_report())
    print(tracker.generate_monthly_report(2024, 12))
    

2. Rate Limiting System (utils/rate_limiter.py)

Prevents abuse through configurable rate limits.

What it does:

  • Tracks queries per session using sliding time windows
  • Enforces hourly limits (default: 20 queries per hour)
  • Enforces daily limits (default: 200 queries per 24 hours)
  • Shows warnings when approaching limits (at 80% by default)
  • Blocks queries when limits exceeded with friendly messages
  • Logs rate limit violations

How to use it:

  1. Configure limits in config.py:

    • RATE_LIMIT_PER_HOUR: Queries per hour (default: 20)
    • RATE_LIMIT_PER_DAY: Queries per day (default: 200)
    • RATE_LIMIT_WARNING_THRESHOLD: When to warn (default: 0.8 = 80%)
  2. Users will automatically see warnings like:

    • "⚠️ You have 4 questions remaining this hour"
    • "πŸ• Rate limit reached! Please wait 15 minutes..."

3. Security Module (utils/security.py)

Validates and sanitizes user input to prevent attacks.

What it does:

  • Checks input length (1-2000 characters by default)
  • Detects prompt injection attempts ("ignore previous instructions", etc.)
  • Blocks suspicious patterns (script tags, template injection, etc.)
  • Detects excessive special characters
  • Logs all security violations for review

How to use it:

  1. Configure limits in config.py:

    • MAX_INPUT_LENGTH: Maximum characters (default: 2000)
    • MIN_INPUT_LENGTH: Minimum characters (default: 1)
  2. Security is automatic - invalid inputs are rejected with user-friendly messages

  3. Review security logs in logs/security.jsonl to monitor threats

4. Alert System (utils/alerts.py)

Sends push notifications for critical events using ntfy.sh.

What it does:

  • Sends push notifications to your phone/browser via ntfy.sh (free, no signup)
  • Alerts for rate limit violations
  • Alerts for cost threshold breaches
  • Alerts for suspicious activity
  • Alerts for error spikes
  • Supports priority levels (min, low, default, high, urgent)

How to set it up:

  1. Subscribe to notifications:

    • Option A (Browser): Go to https://ntfy.sh/YOUR-TOPIC-NAME and click "Subscribe"
    • Option B (Mobile App):
      • Install ntfy app (iOS/Android)
      • Add subscription with your topic name
  2. Choose a SECURE topic name:

    • ⚠️ IMPORTANT: Use a random, hard-to-guess name for security!
    • βœ… Good: hickeylab-alerts-x9k2m7a4
    • ❌ Bad: hickeylab-alerts (anyone can subscribe)
  3. Configure the topic:

    • Set in config.py: NTFY_TOPIC = "your-topic-name"
    • Or set environment variable: NTFY_TOPIC=your-topic-name
  4. Test it:

    python -c "from utils.alerts import AlertSystem; AlertSystem().test_alert()"
    

    Or:

    curl -d "Test alert" ntfy.sh/your-topic-name
    

What you'll be notified about:

  • ⚠️ User hits rate limit
  • πŸ’° Daily/monthly cost thresholds (80%, 100%)
  • πŸ” Suspicious activity detected
  • 🚨 Service paused due to budget limits

βœ… Phase 2: Monitoring & Quality (Medium Priority 🟑)

5. Enhanced Logging

All queries are logged with metadata for analysis.

What's logged:

  • Timestamp
  • Session ID (truncated for privacy)
  • Question length
  • Token counts (prompt, response, total)
  • Estimated cost
  • Response time
  • Success/failure status
  • Error messages (if any)

Log files:

  • logs/usage.jsonl - All API usage
  • logs/rate_limits.jsonl - Rate limit violations
  • logs/security.jsonl - Security violations

6. Conversation Context

Maintains context across multiple messages for better responses.

What it does:

  • Includes last 5 exchanges in each query (configurable)
  • Allows follow-up questions to reference previous messages
  • Example:
    • User: "What is CODEX?"
    • Assistant: [explains CODEX]
    • User: "How does it compare to IBEX?"
    • Assistant: [compares CODEX (from context) to IBEX]

How to configure:

  • Adjust CONVERSATION_HISTORY_LENGTH in config.py (default: 5)

7. Enhanced System Prompt

Improved instructions for better response quality.

What's improved:

  • Conversation context awareness
  • Response structure guidelines (2-4 paragraphs for complex topics)
  • Specific citation instructions
  • Technical term explanation requirements
  • Grounding in knowledge base (no hallucinations)

βœ… Phase 3: User Experience (Low Priority 🟒)

8. Suggested Questions

Shows starter questions when chat is empty.

What it does:

  • Displays 4 suggested questions as clickable buttons
  • Questions are configured in config.py
  • Helps new users get started

How to customize:

  • Edit SUGGESTED_QUESTIONS in config.py

9. Privacy Notice

Displays privacy and usage information.

What it shows:

  • Data processing information
  • Usage limits
  • Privacy policy

How to customize:

  • Edit PRIVACY_NOTICE in config.py

10. Usage Statistics Dashboard

Shows real-time usage stats in sidebar.

What it shows:

  • Today's query count and cost
  • This month's query count and cost
  • Optional display (checkbox in sidebar)

11. Mobile Responsive Design

Improved CSS for mobile devices.

What's improved:

  • Touch-friendly button sizes (44px minimum)
  • Appropriate font sizes
  • No iOS zoom on input focus
  • Responsive layout

πŸš€ Deployment Instructions

For HuggingFace Spaces:

  1. Set up secrets:

    • Go to Space Settings β†’ Variables and secrets
    • Add GEMINI_API_KEY as a Secret
    • (Optional) Add NTFY_TOPIC for notifications
  2. Upload files:

    • Upload the entire outreach/pipelines/gemini_file_search/ directory
    • Ensure all files are included:
      • app.py
      • config.py
      • requirements.txt
      • utils/ directory with all modules
  3. The app will automatically:

    • Install dependencies from requirements.txt
    • Start the Streamlit app
    • Create logs/ directory when first query is made

Environment Variables:

Variable Required Description
GEMINI_API_KEY βœ… Yes Your Google Gemini API key
NTFY_TOPIC ❌ Optional Your ntfy.sh topic for push notifications

First-Time Setup:

  1. Test the app with a few queries
  2. Subscribe to notifications if you set up ntfy.sh
  3. Check logs in logs/ directory (if accessible)
  4. Adjust limits in config.py if needed

πŸ“Š Monitoring & Maintenance

Daily Tasks:

  • Check usage stats in the sidebar
  • Watch for notification alerts on your phone/browser

Weekly Tasks:

  • Review logs/usage.jsonl for usage patterns
  • Check logs/security.jsonl for any threats
  • Adjust rate limits if needed

Monthly Tasks:

  • Generate monthly cost report
  • Review budget and adjust if needed
  • Update system prompt based on user feedback

Generating Reports:

from utils.cost_tracker import CostTracker

tracker = CostTracker()

# Daily report
print(tracker.generate_daily_report())

# Monthly report
print(tracker.generate_monthly_report(2024, 12))

# Custom date
from datetime import datetime
print(tracker.generate_daily_report(datetime(2024, 12, 15)))

βš™οΈ Configuration Reference

All configuration is in config.py. Key settings:

Cost Management:

DAILY_QUERY_LIMIT = 200           # Max queries per day
MONTHLY_BUDGET_USD = 50.0         # Hard budget cap
DAILY_BUDGET_WARNING = 5.0        # Alert threshold

Rate Limiting:

RATE_LIMIT_PER_HOUR = 20          # Queries per hour
RATE_LIMIT_PER_DAY = 200          # Queries per 24 hours
RATE_LIMIT_WARNING_THRESHOLD = 0.8  # Warn at 80%

Security:

MAX_INPUT_LENGTH = 2000           # Max characters
MIN_INPUT_LENGTH = 1              # Min characters

Alerts:

NTFY_TOPIC = ""                   # Your ntfy.sh topic
ALERTS_ENABLED = True             # Enable/disable

Response Quality:

CONVERSATION_HISTORY_LENGTH = 5   # Messages of context
ENHANCED_SYSTEM_PROMPT = "..."   # Full prompt in file

UI/UX:

SUGGESTED_QUESTIONS = [...]       # Starter questions
PRIVACY_NOTICE = "..."           # Privacy text

πŸ”§ Troubleshooting

Logs not being created:

  • Check file permissions
  • Ensure logs/ directory is not in .gitignore for deployment
  • HuggingFace Spaces may not persist logs across restarts

Notifications not working:

  • Verify NTFY_TOPIC is set correctly
  • Test with: curl -d "test" ntfy.sh/your-topic
  • Check you're subscribed to the right topic
  • Ensure ALERTS_ENABLED = True in config

Rate limits too strict/lenient:

  • Adjust RATE_LIMIT_PER_HOUR and RATE_LIMIT_PER_DAY in config.py
  • Changes take effect on app restart

Budget exceeded too quickly:

  • Review logs/usage.jsonl for unusual activity
  • Check if there's an attack (many rapid queries)
  • Adjust MONTHLY_BUDGET_USD if legitimate traffic

Conversation context not working:

  • Verify CONVERSATION_HISTORY_LENGTH > 0
  • Check that messages are being stored in st.session_state.messages

πŸ“š Additional Resources


🎯 What You Need to Do

Required:

  1. βœ… Deploy the updated code to HuggingFace Spaces
  2. βœ… Set GEMINI_API_KEY secret in HuggingFace
  3. βœ… Test with a few queries to verify it works

Optional but Recommended:

  1. πŸ“± Set up ntfy.sh notifications:

    • Pick a random topic name
    • Subscribe on your phone/browser
    • Set NTFY_TOPIC in HuggingFace secrets
    • Test it works
  2. βš™οΈ Adjust configuration in config.py:

    • Set appropriate rate limits
    • Set monthly budget
    • Customize suggested questions
  3. πŸ“Š Monitor usage:

    • Check sidebar stats regularly
    • Watch for notification alerts
    • Review logs if accessible

πŸ“ž Support

If you encounter any issues:

  1. Check the troubleshooting section above
  2. Review the logs (if accessible)
  3. Check HuggingFace Spaces logs for errors
  4. Verify environment variables are set correctly

That's it! All the production-ready features from the roadmap have been implemented. The system is now protected against cost overruns, abuse, and security threats, with monitoring and alerting in place.