# SPARKNET Security Documentation
This document outlines security considerations, deployment options, and compliance
guidelines for the SPARKNET AI-Powered Technology Transfer Office Automation Platform.
## Overview
SPARKNET handles sensitive data including:
- Patent documents and IP information
- License agreements and financial terms
- Partner/stakeholder contact information
- Research data and findings
Proper security measures are essential for production deployments.
---
## Deployment Options
### 1. Fully Local Deployment (Maximum Privacy)
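**Recommended for:** Organizations with strict data sovereignty requirements, classified research, or GDPR obligations that are easier to meet when data never leaves the network, such as erasure (Article 17) and restrictions on cross-border transfers (Chapter V).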
```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│ Local Vector Store  │  │
│  │ (Streamlit) │  │    (LLM)    │  │     (ChromaDB)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                                                   │
│  ┌─────────────┐  ┌─────────────────────────────────────┐   │
│  │ PostgreSQL  │  │  Document Storage (NFS/S3-compat)   │   │
│  │ (metadata)  │  │                                     │   │
│  └─────────────┘  └─────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
**Configuration:**
- Leave all cloud API keys unset in `.env`
- With no cloud keys configured, SPARKNET uses Ollama for all inference
- All data stays within your network; no external API calls are made for LLM inference
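A startup check can make the "no cloud keys" condition explicit instead of relying on an empty `.env` by convention. The following is a minimal sketch, not SPARKNET's actual selection logic: `CLOUD_KEY_NAMES`, `configured_cloud_keys`, and `is_fully_local` are hypothetical names, and the key list only contains the two keys that appear elsewhere in this document.

```python
import os
from typing import List, Mapping, Optional

# Assumption: the cloud providers are enabled by these environment variables.
# Extend the tuple to match the keys your deployment actually reads.
CLOUD_KEY_NAMES = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def configured_cloud_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of cloud API keys that hold a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in CLOUD_KEY_NAMES if env.get(name, "").strip()]


def is_fully_local(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when no cloud key is set, so all inference should stay on Ollama."""
    return not configured_cloud_keys(env)
```

Calling `is_fully_local()` at startup and refusing to run when it returns `False` is one way to keep a local-only deployment from silently picking up a stray key.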
**Setup:**
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull required models
ollama pull llama3.2:latest
ollama pull nomic-embed-text
# Configure SPARKNET
cp .env.example .env
# Leave cloud API keys empty
# Run
streamlit run demo/app.py
```
### 2. Hybrid Deployment (Balanced)
**Recommended for:** Organizations that want cloud LLM capabilities for non-sensitive operations while keeping sensitive data local.
```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│  Document Storage   │  │
│  │ (Streamlit) │  │ (Sensitive) │  │     (Encrypted)     │  │
│  └──────┬──────┘  └─────────────┘  └─────────────────────┘  │
└─────────│───────────────────────────────────────────────────┘
          │ (Non-sensitive queries only)
┌─────────────────────────────────────────────────────────────┐
│                     Cloud LLM Providers                     │
│  ┌─────────┐  ┌─────────┐  ┌─────────────┐  ┌───────────┐   │
│  │  Groq   │  │ Gemini  │  │ OpenRouter  │  │  GitHub   │   │
│  └─────────┘  └─────────┘  └─────────────┘  └───────────┘   │
└─────────────────────────────────────────────────────────────┘
```
**Configuration:**
- Configure cloud API keys for general queries
- Use document sensitivity classification
- Route sensitive documents to local Ollama
- Implement data anonymization for cloud queries
### 3. Cloud Deployment (Streamlit Cloud)
**Recommended for:** Public demos, non-sensitive research, or when local infrastructure is not available.
**Configuration:**
```toml
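# .streamlit/secrets.toml
# Top-level keys must come before any [table] header; keys placed after
# [auth] would be parsed as members of the auth table.
GROQ_API_KEY = "your-key"
GOOGLE_API_KEY = "your-key"

[auth]
password = "your-secure-password"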
**Security Checklist:**
- [ ] Use secrets management (never commit API keys)
- [ ] Enable authentication
- [ ] Review provider data processing policies
- [ ] Consider data anonymization
- [ ] Implement session timeouts
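For the session timeout item, the core check is a comparison between the last recorded activity and the current time. This is a minimal sketch under stated assumptions: `SESSION_TIMEOUT_SECONDS` and `session_expired` are hypothetical names, and the 15-minute default is illustrative, not a SPARKNET setting.

```python
import time
from typing import Optional

# Assumption: 15 minutes of inactivity ends a session.
SESSION_TIMEOUT_SECONDS = 15 * 60


def session_expired(
    last_activity: float,
    now: Optional[float] = None,
    timeout: float = SESSION_TIMEOUT_SECONDS,
) -> bool:
    """Return True when more than `timeout` seconds passed since last activity."""
    now = time.time() if now is None else now
    return (now - last_activity) > timeout
```

In a Streamlit app the last-activity timestamp could be kept in `st.session_state` and refreshed on each rerun, with the user sent back to the login prompt when the check returns `True`.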
---
## GDPR Compliance
### Data Processing Principles
SPARKNET is designed to support GDPR compliance:
1. **Lawfulness, Fairness, Transparency**
   - Document all data processing activities
   - Establish a lawful basis (such as consent) for processing personal data
   - Provide clear privacy notices
2. **Purpose Limitation**
   - Use data only for stated TTO purposes
   - Do not repurpose data without consent
3. **Data Minimization**
   - Only process necessary data
   - Anonymize data when possible
   - Implement data retention policies
4. **Accuracy**
   - CriticAgent validation helps ensure accuracy
   - Human-in-the-loop for critical decisions
   - Source verification for claims
5. **Storage Limitation**
   - Configure `DATA_RETENTION_DAYS` in `.env`
   - Implement automatic data purging
   - Support data deletion requests
6. **Integrity and Confidentiality**
   - Encrypt data at rest
   - Use TLS for data in transit
   - Implement access controls
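The storage limitation items above can be sketched as a cutoff computation driven by `DATA_RETENTION_DAYS`. This is an illustration only: the function names, the 365-day fallback, and the `{record_id: created_at}` shape are assumptions, and the actual deletion step depends on where the records live.

```python
import os
from datetime import datetime, timedelta
from typing import Dict, List, Mapping, Optional


def retention_days_from_env(
    env: Optional[Mapping[str, str]] = None, default: int = 365
) -> int:
    """Read DATA_RETENTION_DAYS; the 365-day fallback is an assumed default."""
    env = os.environ if env is None else env
    return int(env.get("DATA_RETENTION_DAYS", default))


def expired_ids(
    records: Dict[str, datetime], now: datetime, retention_days: int
) -> List[str]:
    """Return ids of records created before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [record_id for record_id, created in records.items() if created < cutoff]
```

A scheduled job could call `expired_ids` once a day and pass the result to whatever delete routine the storage backend provides.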
### Data Subject Rights
Support for GDPR data subject rights:
| Right | Implementation |
|-------|----------------|
| Access | Export function for user data |
| Rectification | Edit capabilities in UI |
| Erasure | Delete user data on request |
| Portability | JSON/CSV export options |
| Objection | Opt-out from AI processing |
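The access and portability rows come down to serializing a user's records in a machine-readable form. A minimal sketch using only the standard library follows; `export_json` and `export_csv` are hypothetical helpers, and the flat list-of-dicts record shape is an assumption.

```python
import csv
import io
import json
from typing import Dict, List


def export_json(records: List[Dict[str, str]]) -> str:
    """Serialize records as pretty-printed JSON."""
    return json.dumps(records, indent=2, sort_keys=True)


def export_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records as CSV with a header row of all keys, sorted."""
    if not records:
        return ""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()
```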
### Cross-Border Data Transfers
When using cloud LLM providers:
1. **EU-US Data Transfers:**
   - Review the provider's Data Processing Agreement
   - Ensure Standard Contractual Clauses are in place
   - Consider EU-hosted alternatives
2. **Recommended Approach:**
   - Use Ollama for EU data residency
   - Anonymize data before cloud API calls
   - Implement geographic routing
---
## Security Best Practices
### API Key Management
```python
import os

import streamlit as st

# GOOD: Load from environment/secrets
api_key = os.environ.get("GROQ_API_KEY")
# or
api_key = st.secrets.get("GROQ_API_KEY")
# BAD: Hardcoded keys
api_key = "gsk_abc123..."  # NEVER DO THIS
```
### Authentication
Configure authentication in `.streamlit/secrets.toml`:
```toml
[auth]
# Single user
password = "strong-password-here"
# Multi-user
[auth.users]
admin = "admin-password"
analyst = "analyst-password"
viewer = "viewer-password"
```
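Plaintext passwords in a secrets file are simple to set up, but anyone who can read the file learns every password. One alternative is to store a salted hash and compare against it. The sketch below uses PBKDF2 from the standard library; it is not SPARKNET's authentication code, and `hash_password`, `verify_password`, and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: iteration count chosen for illustration; tune it for your hardware.
PBKDF2_ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected_digest)
```

The salt and digest could be stored hex-encoded in place of the plaintext value, so the secrets file no longer contains a usable password.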
### Audit Logging
Enable audit logging for compliance:
```env
AUDIT_LOG_ENABLED=true
AUDIT_LOG_PATH=./logs/audit.log
```
Audit log includes:
- User authentication events
- Document access
- AI query/response pairs
- Decision point approvals
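Writing one JSON object per line keeps the audit log easy to parse and search. This is a sketch of that layout, not SPARKNET's actual log format: the field names, the `sparknet.audit` logger name, and both helper functions are assumptions.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def format_audit_event(
    event_type: str,
    user: str,
    details: Optional[Dict[str, Any]] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON line describing an auditable event."""
    now = datetime.now(timezone.utc) if now is None else now
    record = {
        "timestamp": now.isoformat(),
        "event": event_type,
        "user": user,
        "details": details or {},
    }
    return json.dumps(record, sort_keys=True)


def get_audit_logger(path: str) -> logging.Logger:
    """Return a logger that appends raw messages to the file at `path`."""
    logger = logging.getLogger("sparknet.audit")
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_audit_logger(path).info(format_audit_event("document_access", "analyst", {"doc_id": "D-1"}))` would append a single line, with `path` taken from `AUDIT_LOG_PATH`.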
### Network Security
For production deployments:
1. **Firewall Rules:**
   - Restrict Ollama to internal network
   - Limit database access to app servers
   - Use VPN for remote access
2. **TLS/SSL:**
   - Enable HTTPS for Streamlit
   - Use encrypted database connections
   - Secure WebSocket connections
3. **Access Control:**
   - Implement role-based access
   - Use IP allowlisting where possible
   - Enable MFA for admin access
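IP allowlisting is usually enforced at the firewall or reverse proxy, but the membership test itself is small. A sketch with the standard `ipaddress` module follows; `is_allowed` is a hypothetical helper and the networks in the usage note are examples only.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True when client_ip falls inside any of the CIDR networks."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network) for network in allowed_networks
    )
```

For example, `is_allowed("10.1.2.3", ["10.0.0.0/8"])` accepts an internal client, while an address outside every listed network is rejected.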
---
## Sensitive Data Handling
### Document Classification
SPARKNET can classify documents by sensitivity:
| Level | Description | Handling |
|-------|-------------|----------|
| Public | Non-confidential | Cloud LLM allowed |
| Internal | Business confidential | Prefer local |
| Confidential | Sensitive business | Local only |
| Restricted | Highly sensitive | Local + encryption |
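The handling column can be expressed as a small routing policy. This is a sketch of the table, not SPARKNET's router: the `Sensitivity` enum, `choose_backend`, and `needs_encryption` are hypothetical names, and "Prefer local" is interpreted here as "local unless cloud use is explicitly allowed".

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels from the table, ordered from least to most sensitive."""

    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def choose_backend(
    level: Sensitivity, cloud_available: bool, allow_internal_on_cloud: bool = False
) -> str:
    """Return "cloud" or "local" for a document of the given sensitivity."""
    if not cloud_available:
        return "local"
    if level == Sensitivity.PUBLIC:
        return "cloud"
    if level == Sensitivity.INTERNAL and allow_internal_on_cloud:
        return "cloud"
    return "local"  # Confidential and Restricted never leave the network


def needs_encryption(level: Sensitivity) -> bool:
    """Restricted documents additionally require encryption."""
    return level >= Sensitivity.RESTRICTED
```

Defaulting every unclear case to `"local"` means a misclassified or unexpected level fails closed instead of reaching a cloud provider.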
### PII Detection
Enable PII detection:
```env
PII_DETECTION_ENABLED=true
```
Detected PII types:
- Names (persons)
- Email addresses
- Phone numbers
- Addresses
- ID numbers
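Of the types listed, email addresses and phone numbers are the ones that simple patterns can catch; names and postal addresses generally need a named-entity recognition model. The sketch below covers only the pattern-based part. The regular expressions are deliberately loose illustrations, not the detector SPARKNET uses, and `find_pii` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Loose illustrative patterns; expect false positives and misses on real data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return the email addresses and phone-like numbers found in text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }
```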
### Data Anonymization
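For cloud API calls, replace identifying values before the text leaves your network. Strictly speaking this is pseudonymization: under GDPR, pseudonymized data is still personal data as long as it can be re-linked to a person.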
```python
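# Pseudonymization example (sample values are illustrative)
text = "Jane Doe of Acme Labs filed the patent."
real_name, company_name = "Jane Doe", "Acme Labs"
text = text.replace(real_name, "[PERSON_1]")
text = text.replace(company_name, "[COMPANY_1]")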
```
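If the cloud response has to be shown with the real names restored, the placeholder mapping must be kept locally. One way to do that is sketched here; `Pseudonymizer` is a hypothetical helper, not part of SPARKNET, and it assumes the values to mask are already known (for example from PII detection).

```python
from typing import Dict


class Pseudonymizer:
    """Replace known values with numbered placeholders and restore them later."""

    def __init__(self) -> None:
        self._forward: Dict[str, str] = {}   # real value -> placeholder
        self._reverse: Dict[str, str] = {}   # placeholder -> real value
        self._counters: Dict[str, int] = {}  # per-kind running number

    def mask(self, text: str, value: str, kind: str) -> str:
        """Replace every occurrence of value with a placeholder like [PERSON_1]."""
        if not value:
            return text  # str.replace with an empty string would corrupt the text
        if value not in self._forward:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            placeholder = f"[{kind}_{self._counters[kind]}]"
            self._forward[value] = placeholder
            self._reverse[placeholder] = value
        return text.replace(value, self._forward[value])

    def restore(self, text: str) -> str:
        """Put the original values back into a text that contains placeholders."""
        for placeholder, value in self._reverse.items():
            text = text.replace(placeholder, value)
        return text
```

The mapping itself is sensitive, so it should stay in memory or in local encrypted storage and never be sent along with the masked text.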
---
## Incident Response
### Security Incident Procedure
1. **Detection:** Monitor audit logs and alerts
2. **Containment:** Isolate affected systems
3. **Investigation:** Determine scope and impact
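4. **Notification:** Inform stakeholders; under GDPR Article 33, report a personal data breach to the supervisory authority within 72 hours of becoming aware of it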
5. **Recovery:** Restore from clean backups
6. **Lessons Learned:** Update security measures
### Contact
To report a security issue:
- Report it privately to the project maintainers
- Allow the maintainers to review the issue before any public disclosure
- Follow responsible disclosure practices
---
## Compliance Checklist
### Pre-Deployment
- [ ] API keys stored in secrets management
- [ ] Authentication configured
- [ ] Audit logging enabled
- [ ] Data retention policy defined
- [ ] Backup strategy implemented
- [ ] Network security reviewed
### GDPR Compliance
- [ ] Data processing register updated
- [ ] Privacy notice published
- [ ] Data subject rights procedures in place
- [ ] Cross-border transfer safeguards
- [ ] Data Protection Impact Assessment (if required)
### Ongoing
- [ ] Regular security audits
- [ ] Log review and monitoring
- [ ] Access control review
- [ ] Incident response testing
- [ ] Staff security training
---
## Additional Resources
- [GDPR Text and Guidance (gdpr.eu, unofficial)](https://gdpr.eu/)
- [Ollama Documentation](https://ollama.com/)
- [Streamlit Security](https://docs.streamlit.io/deploy/streamlit-community-cloud/security)
- [OWASP Top 10](https://owasp.org/Top10/)
---
*SPARKNET - VISTA/Horizon EU Project*
*Last Updated: 2025*