Spaces:
Sleeping
A newer version of the Streamlit SDK is available:
1.53.0
Production Features Implementation Guide
This document explains what has been implemented for the Hickey Lab AI Assistant and how to configure and use each feature.
π¦ What Has Been Implemented
All the following features from the production roadmap have been implemented:
β Phase 1: Foundation - Cost & Security Controls (High Priority π΄)
1. Cost Management Module (utils/cost_tracker.py)
Tracks API token usage and costs to prevent budget overruns.
What it does:
- Extracts token counts from every Gemini API response
- Calculates costs based on Gemini 2.5 Flash pricing ($0.075 per 1M input tokens, $0.30 per 1M output tokens)
- Logs all usage to
logs/usage.jsonlwith timestamps - Tracks daily and monthly usage statistics
- Enforces budget caps (blocks service when exceeded)
- Generates usage reports
How to use it:
Set budget limits in
config.py:DAILY_QUERY_LIMIT: Maximum queries per day (default: 200)MONTHLY_BUDGET_USD: Monthly budget cap (default: $50)DAILY_BUDGET_WARNING: Warning threshold (default: $5)
View usage stats in the sidebar by checking "π Show Usage Stats"
Generate reports manually:
from utils.cost_tracker import CostTracker tracker = CostTracker() print(tracker.generate_daily_report()) print(tracker.generate_monthly_report(2024, 12))
2. Rate Limiting System (utils/rate_limiter.py)
Prevents abuse through configurable rate limits.
What it does:
- Tracks queries per session using sliding time windows
- Enforces hourly limits (default: 20 queries per hour)
- Enforces daily limits (default: 200 queries per 24 hours)
- Shows warnings when approaching limits (at 80% by default)
- Blocks queries when limits exceeded with friendly messages
- Logs rate limit violations
How to use it:
Configure limits in
config.py:RATE_LIMIT_PER_HOUR: Queries per hour (default: 20)RATE_LIMIT_PER_DAY: Queries per day (default: 200)RATE_LIMIT_WARNING_THRESHOLD: When to warn (default: 0.8 = 80%)
Users will automatically see warnings like:
- "β οΈ You have 4 questions remaining this hour"
- "π Rate limit reached! Please wait 15 minutes..."
3. Security Module (utils/security.py)
Validates and sanitizes user input to prevent attacks.
What it does:
- Checks input length (1-2000 characters by default)
- Detects prompt injection attempts ("ignore previous instructions", etc.)
- Blocks suspicious patterns (script tags, template injection, etc.)
- Detects excessive special characters
- Logs all security violations for review
How to use it:
Configure limits in
config.py:MAX_INPUT_LENGTH: Maximum characters (default: 2000)MIN_INPUT_LENGTH: Minimum characters (default: 1)
Security is automatic - invalid inputs are rejected with user-friendly messages
Review security logs in
logs/security.jsonlto monitor threats
4. Alert System (utils/alerts.py)
Sends push notifications for critical events using ntfy.sh.
What it does:
- Sends push notifications to your phone/browser via ntfy.sh (free, no signup)
- Alerts for rate limit violations
- Alerts for cost threshold breaches
- Alerts for suspicious activity
- Alerts for error spikes
- Supports priority levels (min, low, default, high, urgent)
How to set it up:
Subscribe to notifications:
- Option A (Browser): Go to
https://ntfy.sh/YOUR-TOPIC-NAMEand click "Subscribe" - Option B (Mobile App):
- Install ntfy app (iOS/Android)
- Add subscription with your topic name
- Option A (Browser): Go to
Choose a SECURE topic name:
- β οΈ IMPORTANT: Use a random, hard-to-guess name for security!
- β
Good:
hickeylab-alerts-x9k2m7a4 - β Bad:
hickeylab-alerts(anyone can subscribe)
Configure the topic:
- Set in
config.py:NTFY_TOPIC = "your-topic-name" - Or set environment variable:
NTFY_TOPIC=your-topic-name
- Set in
Test it:
python -c "from utils.alerts import AlertSystem; AlertSystem().test_alert()"Or:
curl -d "Test alert" ntfy.sh/your-topic-name
What you'll be notified about:
- β οΈ User hits rate limit
- π° Daily/monthly cost thresholds (80%, 100%)
- π Suspicious activity detected
- π¨ Service paused due to budget limits
β Phase 2: Monitoring & Quality (Medium Priority π‘)
5. Enhanced Logging
All queries are logged with metadata for analysis.
What's logged:
- Timestamp
- Session ID (truncated for privacy)
- Question length
- Token counts (prompt, response, total)
- Estimated cost
- Response time
- Success/failure status
- Error messages (if any)
Log files:
logs/usage.jsonl- All API usagelogs/rate_limits.jsonl- Rate limit violationslogs/security.jsonl- Security violations
6. Conversation Context
Maintains context across multiple messages for better responses.
What it does:
- Includes last 5 exchanges in each query (configurable)
- Allows follow-up questions to reference previous messages
- Example:
- User: "What is CODEX?"
- Assistant: [explains CODEX]
- User: "How does it compare to IBEX?"
- Assistant: [compares CODEX (from context) to IBEX]
How to configure:
- Adjust
CONVERSATION_HISTORY_LENGTHinconfig.py(default: 5)
7. Enhanced System Prompt
Improved instructions for better response quality.
What's improved:
- Conversation context awareness
- Response structure guidelines (2-4 paragraphs for complex topics)
- Specific citation instructions
- Technical term explanation requirements
- Grounding in knowledge base (no hallucinations)
β Phase 3: User Experience (Low Priority π’)
8. Suggested Questions
Shows starter questions when chat is empty.
What it does:
- Displays 4 suggested questions as clickable buttons
- Questions are configured in
config.py - Helps new users get started
How to customize:
- Edit
SUGGESTED_QUESTIONSinconfig.py
9. Privacy Notice
Displays privacy and usage information.
What it shows:
- Data processing information
- Usage limits
- Privacy policy
How to customize:
- Edit
PRIVACY_NOTICEinconfig.py
10. Usage Statistics Dashboard
Shows real-time usage stats in sidebar.
What it shows:
- Today's query count and cost
- This month's query count and cost
- Optional display (checkbox in sidebar)
11. Mobile Responsive Design
Improved CSS for mobile devices.
What's improved:
- Touch-friendly button sizes (44px minimum)
- Appropriate font sizes
- No iOS zoom on input focus
- Responsive layout
π Deployment Instructions
For HuggingFace Spaces:
Set up secrets:
- Go to Space Settings β Variables and secrets
- Add
GEMINI_API_KEYas a Secret - (Optional) Add
NTFY_TOPICfor notifications
Upload files:
- Upload the entire
outreach/pipelines/gemini_file_search/directory - Ensure all files are included:
app.pyconfig.pyrequirements.txtutils/directory with all modules
- Upload the entire
The app will automatically:
- Install dependencies from
requirements.txt - Start the Streamlit app
- Create
logs/directory when first query is made
- Install dependencies from
Environment Variables:
| Variable | Required | Description |
|---|---|---|
GEMINI_API_KEY |
β Yes | Your Google Gemini API key |
NTFY_TOPIC |
β Optional | Your ntfy.sh topic for push notifications |
First-Time Setup:
- Test the app with a few queries
- Subscribe to notifications if you set up ntfy.sh
- Check logs in
logs/directory (if accessible) - Adjust limits in
config.pyif needed
π Monitoring & Maintenance
Daily Tasks:
- Check usage stats in the sidebar
- Watch for notification alerts on your phone/browser
Weekly Tasks:
- Review
logs/usage.jsonlfor usage patterns - Check
logs/security.jsonlfor any threats - Adjust rate limits if needed
Monthly Tasks:
- Generate monthly cost report
- Review budget and adjust if needed
- Update system prompt based on user feedback
Generating Reports:
from utils.cost_tracker import CostTracker
tracker = CostTracker()
# Daily report
print(tracker.generate_daily_report())
# Monthly report
print(tracker.generate_monthly_report(2024, 12))
# Custom date
from datetime import datetime
print(tracker.generate_daily_report(datetime(2024, 12, 15)))
βοΈ Configuration Reference
All configuration is in config.py. Key settings:
Cost Management:
DAILY_QUERY_LIMIT = 200 # Max queries per day
MONTHLY_BUDGET_USD = 50.0 # Hard budget cap
DAILY_BUDGET_WARNING = 5.0 # Alert threshold
Rate Limiting:
RATE_LIMIT_PER_HOUR = 20 # Queries per hour
RATE_LIMIT_PER_DAY = 200 # Queries per 24 hours
RATE_LIMIT_WARNING_THRESHOLD = 0.8 # Warn at 80%
Security:
MAX_INPUT_LENGTH = 2000 # Max characters
MIN_INPUT_LENGTH = 1 # Min characters
Alerts:
NTFY_TOPIC = "" # Your ntfy.sh topic
ALERTS_ENABLED = True # Enable/disable
Response Quality:
CONVERSATION_HISTORY_LENGTH = 5 # Messages of context
ENHANCED_SYSTEM_PROMPT = "..." # Full prompt in file
UI/UX:
SUGGESTED_QUESTIONS = [...] # Starter questions
PRIVACY_NOTICE = "..." # Privacy text
π§ Troubleshooting
Logs not being created:
- Check file permissions
- Ensure
logs/directory is not in.gitignorefor deployment - HuggingFace Spaces may not persist logs across restarts
Notifications not working:
- Verify
NTFY_TOPICis set correctly - Test with:
curl -d "test" ntfy.sh/your-topic - Check you're subscribed to the right topic
- Ensure
ALERTS_ENABLED = Truein config
Rate limits too strict/lenient:
- Adjust
RATE_LIMIT_PER_HOURandRATE_LIMIT_PER_DAYinconfig.py - Changes take effect on app restart
Budget exceeded too quickly:
- Review
logs/usage.jsonlfor unusual activity - Check if there's an attack (many rapid queries)
- Adjust
MONTHLY_BUDGET_USDif legitimate traffic
Conversation context not working:
- Verify
CONVERSATION_HISTORY_LENGTH > 0 - Check that messages are being stored in
st.session_state.messages
π Additional Resources
- Gemini API Pricing: https://ai.google.dev/pricing
- ntfy.sh Documentation: https://ntfy.sh
- HuggingFace Spaces: https://huggingface.co/docs/hub/spaces
- Streamlit Documentation: https://docs.streamlit.io
π― What You Need to Do
Required:
- β Deploy the updated code to HuggingFace Spaces
- β
Set
GEMINI_API_KEYsecret in HuggingFace - β Test with a few queries to verify it works
Optional but Recommended:
π± Set up ntfy.sh notifications:
- Pick a random topic name
- Subscribe on your phone/browser
- Set
NTFY_TOPICin HuggingFace secrets - Test it works
βοΈ Adjust configuration in
config.py:- Set appropriate rate limits
- Set monthly budget
- Customize suggested questions
π Monitor usage:
- Check sidebar stats regularly
- Watch for notification alerts
- Review logs if accessible
π Support
If you encounter any issues:
- Check the troubleshooting section above
- Review the logs (if accessible)
- Check HuggingFace Spaces logs for errors
- Verify environment variables are set correctly
That's it! All the production-ready features from the roadmap have been implemented. The system is now protected against cost overruns, abuse, and security threats, with monitoring and alerting in place.