Open-Nursing-Validator / PHASE2_ANALYTICS.md
NurseCitizenDeveloper's picture
Deploy Open Nursing Validator (Docker)
6d12932 verified
# Phase 2.2: Advanced Analytics Dashboard
## Overview
**Phase 2.2** transforms the application into an **analytics-driven platform** with comprehensive insights into system usage, clinical outcomes, compliance, and knowledge gaps.
**Prerequisite:** Phase 2.1 (PostgreSQL Database) must be completed
### Features
#### 1. **πŸ“Š Analytics Overview**
- Real-time key metrics
- Active users count
- Active sessions
- Total messages processed
- 24-hour event summary
#### 2. **πŸ“ˆ Usage Analytics**
- Daily active user trends
- Top users by message count
- Usage patterns over time
- Feature adoption metrics
#### 3. **πŸ“‹ Compliance Reporting**
- Login/logout audit trails
- Failed login tracking
- Data access audit logs
- HIPAA/GDPR compliance ready
- 90-day historical reporting
#### 4. **πŸ” Knowledge Gap Analysis**
- Question distribution by topic
- Unanswered query tracking
- Low-confidence answer detection
- Content recommendations
#### 5. **πŸ₯ Clinical Outcomes**
- Care plan duration metrics
- Goal achievement rates
- Patient satisfaction scores
- Clinical outcome trends
#### 6. **πŸ‘₯ User Activity Report**
- Per-user engagement metrics
- Last login tracking
- Message and session counts
- Role-based usage analysis
#### 7. **βš™οΈ System Health**
- Database status
- API response times
- Vector database readiness
- System uptime tracking
---
## Setup Instructions
### Prerequisites
1. **Phase 2.1 Completed**
- PostgreSQL database running
- Chat history in database
- Audit logs populated
2. **Python Packages**
```bash
pip install pandas plotly streamlit
```
(Already in requirements.txt)
### Configuration
```bash
# Update .env.production
export APP_ENV=production
export USE_DATABASE=true
export DB_HOST=localhost
export DB_PORT=5432
export DB_NAME=nursing_validator
export DB_USER=nursing_admin
export DB_PASSWORD=your_password
```
### Running Analytics
```bash
# Create analytics-only dashboard
streamlit run analytics_dashboard.py
# Or add analytics tab to main app
streamlit run app_phase2.py
# Then navigate to Analytics tab (admin only)
```
---
## Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Streamlit Analytics Frontend β”‚
β”‚ (analytics_dashboard.py) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Overviewβ”‚Usageβ”‚Complianceβ”‚Gaps... β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ AnalyticsDashboard Class β”‚
β”‚ - display_overview() β”‚
β”‚ - display_usage_dashboard() β”‚
β”‚ - display_compliance_report() β”‚
β”‚ - display_knowledge_gaps() β”‚
β”‚ - display_clinical_outcomes() β”‚
β”‚ - display_user_activity() β”‚
β”‚ - display_system_health() β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Data Processing Layer β”‚
β”‚ (pandas + plotly visualization) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Database Query Layer β”‚
β”‚ (database.py - via SQL) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ PostgreSQL Database β”‚
β”‚ β”œβ”€ chat_history β”‚
β”‚ β”œβ”€ audit_logs β”‚
β”‚ β”œβ”€ analytics_events β”‚
β”‚ β”œβ”€ users β”‚
β”‚ └─ sessions β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## API Reference
### AnalyticsDashboard Class
```python
from analytics_dashboard import AnalyticsDashboard
# Initialize
dashboard = AnalyticsDashboard()
# Display sections
dashboard.display_overview() # Key metrics
dashboard.display_usage_dashboard() # Usage trends
dashboard.display_compliance_report() # Audit logs
dashboard.display_knowledge_gaps() # Content analysis
dashboard.display_clinical_outcomes() # Patient metrics
dashboard.display_user_activity() # Per-user stats
dashboard.display_system_health() # System metrics
dashboard.display_export_options() # Data export
```
### Integration with Main App
```python
# In app_phase2.py, add analytics tab
with tab_analytics:
from analytics_dashboard import display_analytics_dashboard
# Only for admin users
if st.session_state.role == "admin":
display_analytics_dashboard()
else:
st.warning("Analytics available for admins only")
```
---
## Reports
### 1. Overview Report
**Metrics:**
- Active Users: Count of users with is_active = TRUE
- Active Sessions: Count where expires_at > NOW
- Total Messages: COUNT(*) from chat_history
- Events (24h): Audit logs in last 24 hours
```sql
SELECT COUNT(*) FROM users WHERE is_active = TRUE;
SELECT COUNT(*) FROM sessions WHERE expires_at > CURRENT_TIMESTAMP;
SELECT COUNT(*) FROM chat_history;
SELECT COUNT(*) FROM audit_logs
WHERE created_at > CURRENT_TIMESTAMP - INTERVAL '24 hours';
```
### 2. Usage Report
**Data:**
- Daily active users (line chart)
- Top 10 users by message count (bar chart)
- Time-range filtering (default: 30 days)
```sql
-- Daily active users
SELECT DATE(created_at), COUNT(DISTINCT user_id)
FROM chat_history
GROUP BY DATE(created_at);
-- Top users
SELECT u.username, COUNT(*)
FROM chat_history ch
JOIN users u ON ch.user_id = u.id
GROUP BY u.username
ORDER BY COUNT(*) DESC LIMIT 10;
```
### 3. Compliance Report
**Audit Events:**
- Login/logout tracking
- Failed login counts
- Data access by resource
- 90-day historical logs
- HIPAA compliance ready
```sql
-- Authentication events
SELECT action, COUNT(*), COUNT(DISTINCT user_id)
FROM audit_logs
WHERE action IN ('login', 'logout', 'failed_login')
GROUP BY action;
-- Data access
SELECT resource_type, COUNT(*), COUNT(DISTINCT user_id)
FROM audit_logs
WHERE resource_type IS NOT NULL
GROUP BY resource_type;
```
### 4. Knowledge Gap Analysis
**Analysis:**
- Question distribution by topic
- Topics: Care, Assessment, Interventions, Goals, Medications
- Pie chart of question types
- Identifies training needs
```sql
SELECT
CASE
WHEN content ILIKE '%care%' THEN 'Care Planning'
WHEN content ILIKE '%assessment%' THEN 'Assessment'
WHEN content ILIKE '%intervention%' THEN 'Interventions'
WHEN content ILIKE '%goal%' THEN 'Goals'
WHEN content ILIKE '%medication%' THEN 'Medications'
ELSE 'Other'
END as topic,
COUNT(*)
FROM chat_history
WHERE role = 'user'
GROUP BY topic;
```
### 5. User Activity Report
**Metrics:**
- Username, role, last login
- Message count per user
- Session count
- Active in last 7 days count
- Average messages per user
```sql
SELECT
u.username, u.role, u.last_login,
COUNT(DISTINCT ch.id) as messages,
COUNT(DISTINCT s.id) as sessions
FROM users u
LEFT JOIN chat_history ch ON u.id = ch.user_id
LEFT JOIN sessions s ON u.id = s.user_id
WHERE u.is_active = TRUE
GROUP BY u.id, u.username, u.role, u.last_login;
```
---
## Visualizations
### Line Chart: Daily Active Users
- **X-axis:** Date
- **Y-axis:** Number of unique users
- **Time Range:** Selectable (default: 30 days)
- **Type:** Line chart with markers
### Bar Chart: Top Users
- **X-axis:** Username
- **Y-axis:** Message count
- **Color:** Blue gradient (by message count)
- **Limit:** Top 10 users
### Pie Chart: Question Topics
- **Segments:** By topic (Care, Assessment, etc.)
- **Size:** Proportion of questions
- **Interactive:** Hover for details
### DataFrames: Audit Logs
- **Columns:** Timestamp, User, Action, Resource, IP
- **Sorting:** Reverse chronological
- **Limit:** 50 most recent
- **Filterable:** By date range
---
## Compliance Features
### HIPAA Compliance Ready
- βœ… Audit trails for all data access
- βœ… User authentication logging
- βœ… Encryption ready (TLS)
- βœ… Data retention policies configurable
- βœ… IP address logging for accountability
### GDPR Compliance Ready
- βœ… User activity tracking
- βœ… Data access audit logs
- βœ… Right to be forgotten support (can delete user)
- βœ… Data export capabilities
- βœ… Consent management ready
### Audit Trail
All user actions logged with:
- Timestamp (UTC)
- User ID
- Action type
- Resource type/ID
- IP address
- Changes (JSONB format)
---
## Data Export
### Export Formats (Ready for Phase 2.3)
#### CSV Export
```python
df.to_csv('analytics_report.csv', index=False)
```
#### PDF Export
```python
# Uses plotly for static image export
fig.write_image("report.pdf")
```
#### Excel Export
```python
df.to_excel('analytics_report.xlsx', sheet_name='Analytics')
```
### Sample Export Query
```python
import pandas as pd
from database import get_connection
# Export user activity
with get_connection() as conn:
df = pd.read_sql_query("""
SELECT u.username, u.role, COUNT(*) as messages
FROM chat_history ch
JOIN users u ON ch.user_id = u.id
GROUP BY u.id, u.username, u.role
""", conn)
df.to_excel('user_activity.xlsx')
df.to_csv('user_activity.csv')
```
---
## Troubleshooting
### "Database required for analytics"
```
Error: Database module not available
Solution: Install psycopg2-binary
$ pip install psycopg2-binary
```
### No data showing in charts
```
Possible causes:
1. No chat history yet (new database)
2. Time range filters with no data
3. Database connection issue
Debug:
SELECT COUNT(*) FROM chat_history;
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM audit_logs;
```
### Slow dashboard loading
```
Solutions:
1. Add indexes (already done in Phase 2.1)
2. Limit time range (default 30 days)
3. Increase database pool size
4. Cache results for 5 minutes
```
### Connection timeouts
```
Fix timeout:
DB_HOST=localhost
DB_PORT=5432
# Verify PostgreSQL running:
sudo service postgresql status
sudo service postgresql start
```
---
## Performance Optimization
### Database Indexes
Already created in Phase 2.1:
```sql
CREATE INDEX idx_chat_history_created_at ON chat_history(created_at);
CREATE INDEX idx_analytics_user_id ON analytics_events(user_id);
CREATE INDEX idx_audit_logs_user_id ON audit_logs(user_id);
```
### Query Caching
```python
@st.cache_data(ttl=300) # Cache for 5 minutes
def get_dashboard_data():
# Expensive query here
pass
```
### Pagination
```python
# Limit audit logs to 50 most recent
LIMIT 50 OFFSET (page - 1) * 50
```
---
## Next Steps
1. **Deploy Phase 2.2** to staging
2. **Test analytics queries** with sample data
3. **Validate compliance reports** meet requirements
4. **Proceed to Phase 2.3** - EHR/FHIR Integration
---
## Files Created/Modified
**New Files:**
- `analytics_dashboard.py` (400+ lines)
**Integration Points:**
- Update `app_phase2.py` to add analytics tab
- Add analytics logging to chat module
**Documentation:**
- `PHASE2_ANALYTICS.md` (this file)
---
*Phase 2.2 Implementation - November 29, 2025*