# Production Features Implementation Guide

This document explains what has been implemented for the Hickey Lab AI Assistant and how to configure and use each feature.

---

## 📦 What Has Been Implemented

All the following features from the production roadmap have been implemented:

### ✅ Phase 1: Foundation - Cost & Security Controls (High Priority 🔴)

#### 1. **Cost Management Module** (`utils/cost_tracker.py`)

Tracks API token usage and costs to prevent budget overruns.



**What it does:**

- Extracts token counts from every Gemini API response

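- Calculates costs using the rates this guide assumes for Gemini 2.5 Flash ($0.075 per 1M input tokens, $0.30 per 1M output tokens); verify them against the current Gemini API pricing page, as published rates change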

- Logs all usage to `logs/usage.jsonl` with timestamps

- Tracks daily and monthly usage statistics

- Enforces budget caps (blocks service when exceeded)

- Generates usage reports
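
The cost calculation and JSONL logging described above can be sketched as follows. This is a minimal illustration, not the code in `utils/cost_tracker.py`: the function names `estimate_cost` and `log_usage` and the record field names are assumptions, and the rates are the ones quoted in this guide.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Rates quoted in this guide, in USD per 1M tokens (assumed, check current pricing).
INPUT_RATE_PER_M = 0.075
OUTPUT_RATE_PER_M = 0.30


def estimate_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (prompt_tokens * INPUT_RATE_PER_M
            + response_tokens * OUTPUT_RATE_PER_M) / 1_000_000


def log_usage(path: Path, prompt_tokens: int, response_tokens: int) -> dict:
    """Append one usage record as a JSON line and return it (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_tokens": prompt_tokens,
        "response_tokens": response_tokens,
        "total_tokens": prompt_tokens + response_tokens,
        "estimated_cost_usd": estimate_cost(prompt_tokens, response_tokens),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

At these rates, a request with 1,200 prompt tokens and 400 response tokens costs about $0.00021.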



**How to use it:**

1. Set budget limits in `config.py`:

   - `DAILY_QUERY_LIMIT`: Maximum queries per day (default: 200)

   - `MONTHLY_BUDGET_USD`: Monthly budget cap (default: $50)

   - `DAILY_BUDGET_WARNING`: Warning threshold (default: $5)



2. View usage stats in the sidebar by checking "📊 Show Usage Stats"



3. Generate reports manually:

   ```python
   from utils.cost_tracker import CostTracker
   tracker = CostTracker()
   print(tracker.generate_daily_report())
   print(tracker.generate_monthly_report(2024, 12))
   ```



#### 2. **Rate Limiting System** (`utils/rate_limiter.py`)

Prevents abuse through configurable rate limits.



**What it does:**

- Tracks queries per session using sliding time windows

- Enforces hourly limits (default: 20 queries per hour)

- Enforces daily limits (default: 200 queries per 24 hours)

- Shows warnings when approaching limits (at 80% by default)

- Blocks queries once a limit is exceeded and shows a friendly message

- Logs rate limit violations
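
A sliding-window limiter of the kind described above might look like the sketch below. The class name `SlidingWindowLimiter` and its methods are hypothetical and are not taken from `utils/rate_limiter.py`; only the default values (20 per hour, warning at 80%) come from this guide.

```python
import time
from collections import deque
from typing import Optional, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` queries in any `window_seconds` span."""

    def __init__(self, limit: int = 20, window_seconds: float = 3600.0,
                 warning_threshold: float = 0.8):
        self.limit = limit
        self.window_seconds = window_seconds
        self.warning_threshold = warning_threshold
        self._timestamps = deque()  # times of accepted queries, oldest first

    def _prune(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def check(self, now: Optional[float] = None) -> Tuple[bool, int]:
        """Record a query if allowed; return (allowed, remaining)."""
        now = time.time() if now is None else now
        self._prune(now)
        if len(self._timestamps) >= self.limit:
            return False, 0
        self._timestamps.append(now)
        return True, self.limit - len(self._timestamps)

    def should_warn(self) -> bool:
        """True once recorded usage reaches the warning threshold."""
        return len(self._timestamps) >= self.limit * self.warning_threshold
```

With the defaults, the warning starts after the 16th query in an hour (20 × 0.8), which leaves 4 questions remaining.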



**How to use it:**

1. Configure limits in `config.py`:

   - `RATE_LIMIT_PER_HOUR`: Queries per hour (default: 20)

   - `RATE_LIMIT_PER_DAY`: Queries per day (default: 200)

   - `RATE_LIMIT_WARNING_THRESHOLD`: When to warn (default: 0.8 = 80%)



2. Users will automatically see warnings like:

   - "⚠️ You have 4 questions remaining this hour"

   - "🕐 Rate limit reached! Please wait 15 minutes..."



#### 3. **Security Module** (`utils/security.py`)

Validates and sanitizes user input to prevent attacks.



**What it does:**

- Checks input length (1-2000 characters by default)

- Detects prompt injection attempts ("ignore previous instructions", etc.)

- Blocks suspicious patterns (script tags, template injection, etc.)

- Detects excessive special characters

- Logs all security violations for review
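
The checks listed above could be combined roughly as follows. This is an illustrative sketch rather than the contents of `utils/security.py`: the function `validate_input`, the pattern list, and the `MAX_SPECIAL_CHAR_RATIO` threshold are assumptions chosen for the example.

```python
import re
from typing import Tuple

MAX_INPUT_LENGTH = 2000
MIN_INPUT_LENGTH = 1
MAX_SPECIAL_CHAR_RATIO = 0.3  # hypothetical threshold for "excessive" symbols

# Illustrative patterns only; a real deployment would maintain a longer list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"<\s*script\b", re.IGNORECASE),
    re.compile(r"\{\{.*?\}\}"),  # template-injection style braces
]


def validate_input(text: str) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a user question."""
    stripped = text.strip()
    if not MIN_INPUT_LENGTH <= len(stripped) <= MAX_INPUT_LENGTH:
        return False, "length"
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(stripped):
            return False, "suspicious_pattern"
    # Count characters that are neither alphanumeric nor whitespace.
    special = sum(1 for ch in stripped if not (ch.isalnum() or ch.isspace()))
    if special / len(stripped) > MAX_SPECIAL_CHAR_RATIO:
        return False, "special_characters"
    return True, "ok"
```

Pattern matching like this catches only known phrasings, so it complements the logging described above rather than replacing review of `logs/security.jsonl`.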



**How to use it:**

1. Configure limits in `config.py`:

   - `MAX_INPUT_LENGTH`: Maximum characters (default: 2000)

   - `MIN_INPUT_LENGTH`: Minimum characters (default: 1)



2. Validation runs automatically: invalid inputs are rejected with user-friendly messages



3. Review security logs in `logs/security.jsonl` to monitor threats



#### 4. **Alert System** (`utils/alerts.py`)

Sends push notifications for critical events using ntfy.sh.



**What it does:**

- Sends push notifications to your phone/browser via ntfy.sh (free, no signup)

- Alerts for rate limit violations

- Alerts for cost threshold breaches

- Alerts for suspicious activity

- Alerts for error spikes

- Supports priority levels (min, low, default, high, urgent)
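
Publishing to ntfy.sh is a plain HTTP POST to the topic URL, with the message as the body and optional `Title` and `Priority` headers. The sketch below shows one way to do that with the standard library; the helper names `build_alert_request` and `send_alert` are hypothetical and do not describe the `AlertSystem` class in `utils/alerts.py`.

```python
import urllib.request

NTFY_BASE_URL = "https://ntfy.sh"
VALID_PRIORITIES = {"min", "low", "default", "high", "urgent"}


def build_alert_request(topic: str, message: str,
                        title: str = "Hickey Lab Assistant",
                        priority: str = "default") -> urllib.request.Request:
    """Build (but do not send) the POST request for one notification."""
    if not topic:
        raise ValueError("NTFY_TOPIC is not set")
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    return urllib.request.Request(
        url=f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send_alert(topic: str, message: str, **kwargs) -> int:
    """Send the notification and return the HTTP status code."""
    request = build_alert_request(topic, message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `send_alert("your-topic-name", "Monthly budget at 80%", priority="high")` would push a high-priority message to every subscriber of that topic.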



**How to set it up:**



1. **Subscribe to notifications:**

   - Option A (Browser): Go to `https://ntfy.sh/YOUR-TOPIC-NAME` and click "Subscribe"

   - Option B (Mobile App):

     - Install ntfy app (iOS/Android)

     - Add subscription with your topic name



2. **Choose a SECURE topic name:**

   - ⚠️ IMPORTANT: Use a random, hard-to-guess name for security!

   - ✅ Good: `hickeylab-alerts-x9k2m7a4`

   - ❌ Bad: `hickeylab-alerts` (easy to guess, so anyone could subscribe to or publish on it)



3. **Configure the topic:**

   - Set in `config.py`: `NTFY_TOPIC = "your-topic-name"`

   - Or set environment variable: `NTFY_TOPIC=your-topic-name`



4. **Test it:**

   ```bash
   python -c "from utils.alerts import AlertSystem; AlertSystem().test_alert()"
   ```
   Or:
   ```bash
   curl -d "Test alert" ntfy.sh/your-topic-name
   ```
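
For step 2 above, a hard-to-guess topic name can be generated instead of invented by hand. The helper below is a small sketch using Python's `secrets` module; the function name `generate_topic` and the `hickeylab-alerts` prefix are only examples.

```python
import secrets


def generate_topic(prefix: str = "hickeylab-alerts", random_bytes: int = 8) -> str:
    """Return the prefix plus a random hex suffix (2 hex chars per byte)."""
    return f"{prefix}-{secrets.token_hex(random_bytes)}"


if __name__ == "__main__":
    # Prints a different name on every run; copy it into NTFY_TOPIC.
    print(generate_topic())
```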

**What you'll be notified about:**
- ⚠️ User hits rate limit
- 💰 Daily/monthly cost thresholds (80%, 100%)
- 🔍 Suspicious activity detected
- 🚨 Service paused due to budget limits

---

### ✅ Phase 2: Monitoring & Quality (Medium Priority 🟡)

#### 5. **Enhanced Logging**
All queries are logged with metadata for analysis.

**What's logged:**
- Timestamp
- Session ID (truncated for privacy)
- Question length
- Token counts (prompt, response, total)
- Estimated cost
- Response time
- Success/failure status
- Error messages (if any)

**Log files:**
- `logs/usage.jsonl` - All API usage
- `logs/rate_limits.jsonl` - Rate limit violations
- `logs/security.jsonl` - Security violations
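
Because each log is a JSON Lines file, it can be analyzed with a few lines of Python. The sketch below totals queries and cost per day from `logs/usage.jsonl`; the field names `timestamp` and `estimated_cost_usd` are assumptions about the record layout, so adjust them to match the actual log entries.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_usage(path) -> dict:
    """Return {"YYYY-MM-DD": {"queries": n, "cost_usd": total}} from a JSONL log."""
    totals = defaultdict(lambda: {"queries": 0, "cost_usd": 0.0})
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            day = record["timestamp"][:10]  # ISO 8601 date prefix
            totals[day]["queries"] += 1
            totals[day]["cost_usd"] += record.get("estimated_cost_usd", 0.0)
    return dict(totals)
```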

#### 6. **Conversation Context**
Maintains context across multiple messages for better responses.

**What it does:**
- Includes last 5 exchanges in each query (configurable)
- Allows follow-up questions to reference previous messages
- Example:
  - User: "What is CODEX?"
  - Assistant: [explains CODEX]
  - User: "How does it compare to IBEX?"
  - Assistant: [compares CODEX (from context) to IBEX]

**How to configure:**
- Adjust `CONVERSATION_HISTORY_LENGTH` in `config.py` (default: 5)
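
One way to include recent history in each query is to prepend the last few exchanges to the new question. The sketch below assumes messages are stored as dictionaries with `role` and `content` keys (the usual Streamlit chat pattern) and that one exchange is a user message plus the assistant reply; `build_context` is a hypothetical helper, not the function used in `app.py`.

```python
CONVERSATION_HISTORY_LENGTH = 5  # exchanges of context, as in config.py


def build_context(messages, new_question,
                  history_length=CONVERSATION_HISTORY_LENGTH):
    """Return a prompt string with recent history followed by the new question."""
    # One exchange = 2 messages. Guard against 0, since messages[-0:] is the full list.
    recent = messages[-2 * history_length:] if history_length > 0 else []
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in recent]
    lines.append(f"User: {new_question}")
    return "\n".join(lines)
```

Capping the history keeps prompt token counts, and therefore cost, bounded as a conversation grows.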

#### 7. **Enhanced System Prompt**
Improved instructions for better response quality.

**What's improved:**
- Conversation context awareness
- Response structure guidelines (2-4 paragraphs for complex topics)
- Specific citation instructions
- Technical term explanation requirements
- Grounding in knowledge base (no hallucinations)

---

### ✅ Phase 3: User Experience (Low Priority 🟢)

#### 8. **Suggested Questions**
Shows starter questions when chat is empty.

**What it does:**
- Displays 4 suggested questions as clickable buttons
- Questions are configured in `config.py`
- Helps new users get started

**How to customize:**
- Edit `SUGGESTED_QUESTIONS` in `config.py`

#### 9. **Privacy Notice**
Displays privacy and usage information.

**What it shows:**
- Data processing information
- Usage limits
- Privacy policy

**How to customize:**
- Edit `PRIVACY_NOTICE` in `config.py`

#### 10. **Usage Statistics Dashboard**
Shows real-time usage stats in sidebar.

**What it shows:**
- Today's query count and cost
- This month's query count and cost
- Optional display (checkbox in sidebar)

#### 11. **Mobile Responsive Design**
Improved CSS for mobile devices.

**What's improved:**
- Touch-friendly button sizes (44px minimum)
- Appropriate font sizes
- No iOS zoom on input focus
- Responsive layout

---

## 🚀 Deployment Instructions

### For HuggingFace Spaces:

1. **Set up secrets:**
   - Go to Space Settings → Variables and secrets
   - Add `GEMINI_API_KEY` as a Secret
   - (Optional) Add `NTFY_TOPIC` for notifications

2. **Upload files:**
   - Upload the entire `outreach/pipelines/gemini_file_search/` directory
   - Ensure all files are included:
     - `app.py`
     - `config.py`
     - `requirements.txt`
     - `utils/` directory with all modules

3. **The app will automatically:**
   - Install dependencies from `requirements.txt`
   - Start the Streamlit app
   - Create the `logs/` directory when the first query is made

### Environment Variables:

| Variable | Required | Description |
|----------|----------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | Your Google Gemini API key |
| `NTFY_TOPIC` | ❌ Optional | Your ntfy.sh topic for push notifications |
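
A startup check along these lines can fail fast when the required key is missing while treating the topic as optional. It is a sketch only: `load_settings` is a hypothetical helper, and the actual app may read these variables differently.

```python
import os


def load_settings(env=None) -> dict:
    """Read the variables from the table above; raise if the required one is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required but not set")
    return {
        "gemini_api_key": api_key,
        # Optional: None disables push notifications in this sketch.
        "ntfy_topic": env.get("NTFY_TOPIC", "").strip() or None,
    }
```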

### First-Time Setup:

1. **Test the app** with a few queries
2. **Subscribe to notifications** if you set up ntfy.sh
3. **Check logs** in `logs/` directory (if accessible)
4. **Adjust limits** in `config.py` if needed

---

## 📊 Monitoring & Maintenance

### Daily Tasks:
- Check usage stats in the sidebar
- Watch for notification alerts on your phone/browser

### Weekly Tasks:
- Review `logs/usage.jsonl` for usage patterns
- Check `logs/security.jsonl` for any threats
- Adjust rate limits if needed

### Monthly Tasks:
- Generate monthly cost report
- Review budget and adjust if needed
- Update system prompt based on user feedback

### Generating Reports:

```python
from datetime import datetime

from utils.cost_tracker import CostTracker

tracker = CostTracker()

# Daily report
print(tracker.generate_daily_report())

# Monthly report (year, month)
print(tracker.generate_monthly_report(2024, 12))

# Daily report for a specific date
print(tracker.generate_daily_report(datetime(2024, 12, 15)))
```

---

## ⚙️ Configuration Reference

All configuration is in `config.py`. Key settings:

### Cost Management:
```python
DAILY_QUERY_LIMIT = 200           # Max queries per day
MONTHLY_BUDGET_USD = 50.0         # Hard budget cap
DAILY_BUDGET_WARNING = 5.0        # Alert threshold
```
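
How these three settings interact can be illustrated with a small decision function. The function `budget_status` and its return labels are hypothetical; the sketch assumes the monthly budget and daily query limit block service, while the daily cost threshold only triggers a warning.

```python
DAILY_QUERY_LIMIT = 200
MONTHLY_BUDGET_USD = 50.0
DAILY_BUDGET_WARNING = 5.0


def budget_status(queries_today: int, cost_today_usd: float,
                  cost_month_usd: float) -> str:
    """Classify current usage against the configured caps."""
    if cost_month_usd >= MONTHLY_BUDGET_USD:
        return "blocked_monthly_budget"
    if queries_today >= DAILY_QUERY_LIMIT:
        return "blocked_daily_queries"
    if cost_today_usd >= DAILY_BUDGET_WARNING:
        return "warning_daily_cost"
    return "ok"
```

Checking the hard caps before the warning threshold means a blocked state is never reported as a mere warning.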

### Rate Limiting:
```python
RATE_LIMIT_PER_HOUR = 20          # Queries per hour
RATE_LIMIT_PER_DAY = 200          # Queries per 24 hours
RATE_LIMIT_WARNING_THRESHOLD = 0.8  # Warn at 80%
```

### Security:
```python
MAX_INPUT_LENGTH = 2000           # Max characters
MIN_INPUT_LENGTH = 1              # Min characters
```

### Alerts:
```python
NTFY_TOPIC = ""                   # Your ntfy.sh topic
ALERTS_ENABLED = True             # Enable/disable
```

### Response Quality:
```python
CONVERSATION_HISTORY_LENGTH = 5   # Messages of context
ENHANCED_SYSTEM_PROMPT = "..."   # Full prompt in file
```

### UI/UX:
```python
SUGGESTED_QUESTIONS = [...]       # Starter questions
PRIVACY_NOTICE = "..."           # Privacy text
```

---

## 🔧 Troubleshooting

### Logs not being created:
- Check file permissions
- Ensure `logs/` directory is not in `.gitignore` for deployment
- HuggingFace Spaces storage is ephemeral by default, so logs may be lost on restart unless persistent storage is attached

### Notifications not working:
- Verify `NTFY_TOPIC` is set correctly
- Test with: `curl -d "test" ntfy.sh/your-topic`
- Check you're subscribed to the right topic
- Ensure `ALERTS_ENABLED = True` in config

### Rate limits too strict/lenient:
- Adjust `RATE_LIMIT_PER_HOUR` and `RATE_LIMIT_PER_DAY` in `config.py`
- Changes take effect on app restart

### Budget exceeded too quickly:
- Review `logs/usage.jsonl` for unusual activity
- Check if there's an attack (many rapid queries)
- Adjust `MONTHLY_BUDGET_USD` if legitimate traffic

### Conversation context not working:
- Verify `CONVERSATION_HISTORY_LENGTH > 0`
- Check that messages are being stored in `st.session_state.messages`

---

## 📚 Additional Resources

- **Gemini API Pricing**: https://ai.google.dev/pricing
- **ntfy.sh Documentation**: https://ntfy.sh
- **HuggingFace Spaces**: https://huggingface.co/docs/hub/spaces
- **Streamlit Documentation**: https://docs.streamlit.io

---

## 🎯 What You Need to Do

### Required:
1. ✅ Deploy the updated code to HuggingFace Spaces
2. ✅ Set `GEMINI_API_KEY` secret in HuggingFace
3. ✅ Test with a few queries to verify it works

### Optional but Recommended:
1. 📱 Set up ntfy.sh notifications:
   - Pick a random topic name
   - Subscribe on your phone/browser
   - Set `NTFY_TOPIC` in HuggingFace secrets
   - Test it works

2. ⚙️ Adjust configuration in `config.py`:
   - Set appropriate rate limits
   - Set monthly budget
   - Customize suggested questions

3. 📊 Monitor usage:
   - Check sidebar stats regularly
   - Watch for notification alerts
   - Review logs if accessible

---

## 📞 Support

If you encounter any issues:
1. Check the troubleshooting section above
2. Review the logs (if accessible)
3. Check HuggingFace Spaces logs for errors
4. Verify environment variables are set correctly

---

**That's it!** All the production-ready features from the roadmap have been implemented. The system is now protected against cost overruns, abuse, and security threats, with monitoring and alerting in place.