# Production Features Implementation Guide

This document explains what has been implemented for the Hickey Lab AI Assistant and how to configure and use each feature.

---

## 📦 What Has Been Implemented

All the following features from the production roadmap have been implemented:

### ✅ Phase 1: Foundation - Cost & Security Controls (High Priority 🔴)

#### 1. **Cost Management Module** (`utils/cost_tracker.py`)
Tracks API token usage and costs to prevent budget overruns.

**What it does:**
- Extracts token counts from every Gemini API response
- Calculates costs based on Gemini 2.5 Flash pricing ($0.075 per 1M input tokens, $0.30 per 1M output tokens)
- Logs all usage to `logs/usage.jsonl` with timestamps
- Tracks daily and monthly usage statistics
- Enforces budget caps (blocks service when exceeded)
- Generates usage reports

**How to use it:**
1. Set budget limits in `config.py`:
   - `DAILY_QUERY_LIMIT`: Maximum queries per day (default: 200)
   - `MONTHLY_BUDGET_USD`: Monthly budget cap (default: $50)
   - `DAILY_BUDGET_WARNING`: Warning threshold (default: $5)

2. View usage stats in the sidebar by checking "📊 Show Usage Stats"

3. Generate reports manually:
   ```python
   from utils.cost_tracker import CostTracker
   tracker = CostTracker()
   print(tracker.generate_daily_report())
   print(tracker.generate_monthly_report(2024, 12))
   ```

#### 2. **Rate Limiting System** (`utils/rate_limiter.py`)
Prevents abuse through configurable rate limits.

**What it does:**
- Tracks queries per session using sliding time windows
- Enforces hourly limits (default: 20 queries per hour)
- Enforces daily limits (default: 200 queries per 24 hours)
- Shows warnings when approaching limits (at 80% by default)
- Blocks queries when limits exceeded with friendly messages
- Logs rate limit violations

**How to use it:**
1. Configure limits in `config.py`:
   - `RATE_LIMIT_PER_HOUR`: Queries per hour (default: 20)
   - `RATE_LIMIT_PER_DAY`: Queries per day (default: 200)
   - `RATE_LIMIT_WARNING_THRESHOLD`: When to warn (default: 0.8 = 80%)

2. Users will automatically see warnings like:
   - "⚠️ You have 4 questions remaining this hour"
   - "🕐 Rate limit reached! Please wait 15 minutes..."

#### 3. **Security Module** (`utils/security.py`)
Validates and sanitizes user input to prevent attacks.

**What it does:**
- Checks input length (1-2000 characters by default)
- Detects prompt injection attempts ("ignore previous instructions", etc.)
- Blocks suspicious patterns (script tags, template injection, etc.)
- Detects excessive special characters
- Logs all security violations for review

**How to use it:**
1. Configure limits in `config.py`:
   - `MAX_INPUT_LENGTH`: Maximum characters (default: 2000)
   - `MIN_INPUT_LENGTH`: Minimum characters (default: 1)

2. Security is automatic - invalid inputs are rejected with user-friendly messages

3. Review security logs in `logs/security.jsonl` to monitor threats

#### 4. **Alert System** (`utils/alerts.py`)
Sends push notifications for critical events using ntfy.sh.

**What it does:**
- Sends push notifications to your phone/browser via ntfy.sh (free, no signup)
- Alerts for rate limit violations
- Alerts for cost threshold breaches
- Alerts for suspicious activity
- Alerts for error spikes
- Supports priority levels (min, low, default, high, urgent)

**How to set it up:**

1. **Subscribe to notifications:**
   - Option A (Browser): Go to `https://ntfy.sh/YOUR-TOPIC-NAME` and click "Subscribe"
   - Option B (Mobile App):
     - Install ntfy app (iOS/Android)
     - Add subscription with your topic name

2. **Choose a SECURE topic name:**
   - ⚠️ IMPORTANT: Use a random, hard-to-guess name for security!
   - ✅ Good: `hickeylab-alerts-x9k2m7a4`
   - ❌ Bad: `hickeylab-alerts` (anyone can subscribe)

3. **Configure the topic:**
   - Set in `config.py`: `NTFY_TOPIC = "your-topic-name"`
   - Or set environment variable: `NTFY_TOPIC=your-topic-name`

4. **Test it:**
   ```bash
   python -c "from utils.alerts import AlertSystem; AlertSystem().test_alert()"
   ```
   Or:
   ```bash
   curl -d "Test alert" ntfy.sh/your-topic-name
   ```

**What you'll be notified about:**
- ⚠️ User hits rate limit
- 💰 Daily/monthly cost thresholds (80%, 100%)
- 🔍 Suspicious activity detected
- 🚨 Service paused due to budget limits

---

### ✅ Phase 2: Monitoring & Quality (Medium Priority 🟡)

#### 5. **Enhanced Logging**
All queries are logged with metadata for analysis.

**What's logged:**
- Timestamp
- Session ID (truncated for privacy)
- Question length
- Token counts (prompt, response, total)
- Estimated cost
- Response time
- Success/failure status
- Error messages (if any)

**Log files:**
- `logs/usage.jsonl` - All API usage
- `logs/rate_limits.jsonl` - Rate limit violations
- `logs/security.jsonl` - Security violations

#### 6. **Conversation Context**
Maintains context across multiple messages for better responses.

**What it does:**
- Includes last 5 exchanges in each query (configurable)
- Allows follow-up questions to reference previous messages
- Example:
  - User: "What is CODEX?"
  - Assistant: [explains CODEX]
  - User: "How does it compare to IBEX?"
  - Assistant: [compares CODEX (from context) to IBEX]

**How to configure:**
- Adjust `CONVERSATION_HISTORY_LENGTH` in `config.py` (default: 5)

#### 7. **Enhanced System Prompt**
Improved instructions for better response quality.

**What's improved:**
- Conversation context awareness
- Response structure guidelines (2-4 paragraphs for complex topics)
- Specific citation instructions
- Technical term explanation requirements
- Grounding in knowledge base (no hallucinations)

---

### ✅ Phase 3: User Experience (Low Priority 🟢)

#### 8. **Suggested Questions**
Shows starter questions when chat is empty.

**What it does:**
- Displays 4 suggested questions as clickable buttons
- Questions are configured in `config.py`
- Helps new users get started

**How to customize:**
- Edit `SUGGESTED_QUESTIONS` in `config.py`

#### 9. **Privacy Notice**
Displays privacy and usage information.

**What it shows:**
- Data processing information
- Usage limits
- Privacy policy

**How to customize:**
- Edit `PRIVACY_NOTICE` in `config.py`

#### 10. **Usage Statistics Dashboard**
Shows real-time usage stats in sidebar.

**What it shows:**
- Today's query count and cost
- This month's query count and cost
- Optional display (checkbox in sidebar)

#### 11. **Mobile Responsive Design**
Improved CSS for mobile devices.

**What's improved:**
- Touch-friendly button sizes (44px minimum)
- Appropriate font sizes
- No iOS zoom on input focus
- Responsive layout

---

## 🚀 Deployment Instructions

### For HuggingFace Spaces:

1. **Set up secrets:**
   - Go to Space Settings → Variables and secrets
   - Add `GEMINI_API_KEY` as a Secret
   - (Optional) Add `NTFY_TOPIC` for notifications

2. **Upload files:**
   - Upload the entire `outreach/pipelines/gemini_file_search/` directory
   - Ensure all files are included:
     - `app.py`
     - `config.py`
     - `requirements.txt`
     - `utils/` directory with all modules

3. **The app will automatically:**
   - Install dependencies from `requirements.txt`
   - Start the Streamlit app
   - Create `logs/` directory when first query is made

### Environment Variables:

| Variable | Required | Description |
|----------|----------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | Your Google Gemini API key |
| `NTFY_TOPIC` | ❌ Optional | Your ntfy.sh topic for push notifications |

### First-Time Setup:

1. **Test the app** with a few queries
2. **Subscribe to notifications** if you set up ntfy.sh
3. **Check logs** in `logs/` directory (if accessible)
4. **Adjust limits** in `config.py` if needed

---

## 📊 Monitoring & Maintenance

### Daily Tasks:
- Check usage stats in the sidebar
- Watch for notification alerts on your phone/browser

### Weekly Tasks:
- Review `logs/usage.jsonl` for usage patterns
- Check `logs/security.jsonl` for any threats
- Adjust rate limits if needed

### Monthly Tasks:
- Generate monthly cost report
- Review budget and adjust if needed
- Update system prompt based on user feedback

### Generating Reports:

```python
from utils.cost_tracker import CostTracker

tracker = CostTracker()

# Daily report
print(tracker.generate_daily_report())

# Monthly report
print(tracker.generate_monthly_report(2024, 12))

# Custom date
from datetime import datetime
print(tracker.generate_daily_report(datetime(2024, 12, 15)))
```

---

## ⚙️ Configuration Reference

All configuration is in `config.py`. Key settings:

### Cost Management:
```python
DAILY_QUERY_LIMIT = 200           # Max queries per day
MONTHLY_BUDGET_USD = 50.0         # Hard budget cap
DAILY_BUDGET_WARNING = 5.0        # Alert threshold
```

### Rate Limiting:
```python
RATE_LIMIT_PER_HOUR = 20          # Queries per hour
RATE_LIMIT_PER_DAY = 200          # Queries per 24 hours
RATE_LIMIT_WARNING_THRESHOLD = 0.8  # Warn at 80%
```

### Security:
```python
MAX_INPUT_LENGTH = 2000           # Max characters
MIN_INPUT_LENGTH = 1              # Min characters
```

### Alerts:
```python
NTFY_TOPIC = ""                   # Your ntfy.sh topic
ALERTS_ENABLED = True             # Enable/disable
```

### Response Quality:
```python
CONVERSATION_HISTORY_LENGTH = 5   # Messages of context
ENHANCED_SYSTEM_PROMPT = "..."   # Full prompt in file
```

### UI/UX:
```python
SUGGESTED_QUESTIONS = [...]       # Starter questions
PRIVACY_NOTICE = "..."           # Privacy text
```

---

## 🔧 Troubleshooting

### Logs not being created:
- Check file permissions
- Ensure `logs/` directory is not in `.gitignore` for deployment
- HuggingFace Spaces may not persist logs across restarts

### Notifications not working:
- Verify `NTFY_TOPIC` is set correctly
- Test with: `curl -d "test" ntfy.sh/your-topic`
- Check you're subscribed to the right topic
- Ensure `ALERTS_ENABLED = True` in config

### Rate limits too strict/lenient:
- Adjust `RATE_LIMIT_PER_HOUR` and `RATE_LIMIT_PER_DAY` in `config.py`
- Changes take effect on app restart

### Budget exceeded too quickly:
- Review `logs/usage.jsonl` for unusual activity
- Check if there's an attack (many rapid queries)
- Adjust `MONTHLY_BUDGET_USD` if legitimate traffic

### Conversation context not working:
- Verify `CONVERSATION_HISTORY_LENGTH > 0`
- Check that messages are being stored in `st.session_state.messages`

---

## 📚 Additional Resources

- **Gemini API Pricing**: https://ai.google.dev/pricing
- **ntfy.sh Documentation**: https://ntfy.sh
- **HuggingFace Spaces**: https://huggingface.co/docs/hub/spaces
- **Streamlit Documentation**: https://docs.streamlit.io

---

## 🎯 What You Need to Do

### Required:
1. ✅ Deploy the updated code to HuggingFace Spaces
2. ✅ Set `GEMINI_API_KEY` secret in HuggingFace
3. ✅ Test with a few queries to verify it works

### Optional but Recommended:
1. 📱 Set up ntfy.sh notifications:
   - Pick a random topic name
   - Subscribe on your phone/browser
   - Set `NTFY_TOPIC` in HuggingFace secrets
   - Test it works

2. ⚙️ Adjust configuration in `config.py`:
   - Set appropriate rate limits
   - Set monthly budget
   - Customize suggested questions

3. 📊 Monitor usage:
   - Check sidebar stats regularly
   - Watch for notification alerts
   - Review logs if accessible

---

## 📞 Support

If you encounter any issues:
1. Check the troubleshooting section above
2. Review the logs (if accessible)
3. Check HuggingFace Spaces logs for errors
4. Verify environment variables are set correctly

---

**That's it!** All the production-ready features from the roadmap have been implemented. The system is now protected against cost overruns, abuse, and security threats, with monitoring and alerting in place.