# SPARKNET Security Documentation
This document outlines security considerations, deployment options, and compliance
guidelines for the SPARKNET AI-Powered Technology Transfer Office Automation Platform.
## Overview
SPARKNET handles sensitive data including:
- Patent documents and IP information
- License agreements and financial terms
- Partner/stakeholder contact information
- Research data and findings
Proper security measures are essential for production deployments.
---
## Deployment Options
### 1. Fully Local Deployment (Maximum Privacy)
**Recommended for:** Organizations with strict data sovereignty requirements, classified research, or GDPR Article 17 obligations.
```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│ Local Vector Store  │  │
│  │ (Streamlit) │  │    (LLM)    │  │     (ChromaDB)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                                                   │
│  ┌─────────────┐  ┌──────────────────────────────────────┐  │
│  │ PostgreSQL  │  │   Document Storage (NFS/S3-compat)   │  │
│  │ (metadata)  │  │                                      │  │
│  └─────────────┘  └──────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
**Configuration:**
- Leave all cloud API keys unset in `.env`
- With no cloud keys present, the system uses Ollama for all inference
- All data remains within your network
- No external API calls for LLM inference
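The fallback described above can be pictured with a minimal sketch. The function name `select_provider` and the labels `"cloud"` and `"ollama"` are hypothetical, and the key names are the two that appear elsewhere in this document; SPARKNET's actual selection logic may differ.

```python
import os

# Cloud key names used elsewhere in this document; the real list may be longer.
CLOUD_KEY_VARS = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def select_provider(env=None):
    """Return "ollama" unless a non-empty cloud API key is configured."""
    env = os.environ if env is None else env
    for var in CLOUD_KEY_VARS:
        if env.get(var, "").strip():
            return "cloud"
    return "ollama"
```

Treating empty or whitespace-only values as unset matters here, because a copied `.env.example` often leaves keys present but blank.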
**Setup:**
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull required models
ollama pull llama3.2:latest
ollama pull nomic-embed-text
# Configure SPARKNET
cp .env.example .env
# Leave cloud API keys empty
# Run
streamlit run demo/app.py
```
### 2. Hybrid Deployment (Balanced)
**Recommended for:** Organizations that want cloud LLM capabilities for non-sensitive operations while keeping sensitive data local.
```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│  Document Storage   │  │
│  │ (Streamlit) │  │ (Sensitive) │  │     (Encrypted)     │  │
│  └──────┬──────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
          │
          │ (Non-sensitive queries only)
          ▼
┌─────────────────────────────────────────────────────────────┐
│                     Cloud LLM Providers                     │
│  ┌─────────┐  ┌─────────┐  ┌─────────────┐  ┌───────────┐   │
│  │  Groq   │  │ Gemini  │  │ OpenRouter  │  │  GitHub   │   │
│  └─────────┘  └─────────┘  └─────────────┘  └───────────┘   │
└─────────────────────────────────────────────────────────────┘
```
**Configuration:**
- Configure cloud API keys for general queries
- Use document sensitivity classification
- Route sensitive documents to local Ollama
- Implement data anonymization for cloud queries
### 3. Cloud Deployment (Streamlit Cloud)
**Recommended for:** Public demos, non-sensitive research, or when local infrastructure is not available.
**Configuration:**
```toml
# .streamlit/secrets.toml
# Top-level keys must come before any [table] header,
# otherwise TOML assigns them to that table.
GROQ_API_KEY = "your-key"
GOOGLE_API_KEY = "your-key"

[auth]
password = "your-secure-password"
```
**Security Checklist:**
- [ ] Use secrets management (never commit API keys)
- [ ] Enable authentication
- [ ] Review provider data processing policies
- [ ] Consider data anonymization
- [ ] Implement session timeouts
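The session timeout item in the checklist could be approached along these lines. This is a sketch only: `session_expired` and the 30-minute default are assumptions, not SPARKNET settings.

```python
from datetime import datetime, timedelta, timezone

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed value, tune per policy


def session_expired(last_activity, now=None, timeout=SESSION_TIMEOUT):
    """Return True when the session has been idle for longer than `timeout`."""
    now = datetime.now(timezone.utc) if now is None else now
    return now - last_activity > timeout
```

In a Streamlit app, `last_activity` could be kept in `st.session_state` and refreshed on each rerun, with the user sent back to the login form once `session_expired` returns `True`.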
---
## GDPR Compliance
### Data Processing Principles
SPARKNET is designed to support GDPR compliance:
1. **Lawfulness, Fairness, Transparency**
- Document all data processing activities
- Obtain appropriate consent for personal data
- Provide clear privacy notices
2. **Purpose Limitation**
- Use data only for stated TTO purposes
- Do not repurpose data without consent
3. **Data Minimization**
- Only process necessary data
- Anonymize data when possible
- Implement data retention policies
4. **Accuracy**
- CriticAgent validation helps ensure accuracy
- Human-in-the-loop for critical decisions
- Source verification for claims
5. **Storage Limitation**
- Configure `DATA_RETENTION_DAYS` in `.env`
- Implement automatic data purging
- Support data deletion requests
6. **Integrity and Confidentiality**
- Encrypt data at rest
- Use TLS for data in transit
- Implement access controls
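Automatic purging under the storage limitation principle might be sketched as follows. `DATA_RETENTION_DAYS` is the setting named above; the helper names, the 365-day default, and the use of file modification time as the age signal are assumptions.

```python
import os
import time
from pathlib import Path


def expired_files(root, retention_days, now=None):
    """Yield files under `root` whose modification time is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            yield path


def purge(root, retention_days=None, dry_run=True):
    """Delete expired files; with dry_run=True only report them."""
    if retention_days is None:
        # 365 is an assumed default, not a documented SPARKNET value.
        retention_days = int(os.environ.get("DATA_RETENTION_DAYS", "365"))
    removed = []
    for path in expired_files(root, retention_days):
        if not dry_run:
            path.unlink()
        removed.append(path)
    return removed
```

Defaulting to `dry_run=True` lets an operator review the list before anything is deleted. Derived data, such as vector store entries and database rows, would need its own purge step.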
### Data Subject Rights
Support for GDPR data subject rights:
| Right | Implementation |
|-------|----------------|
| Access | Export function for user data |
| Rectification | Edit capabilities in UI |
| Erasure | Delete user data on request |
| Portability | JSON/CSV export options |
| Objection | Opt-out from AI processing |
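The Access and Portability rows both come down to serializing a user's records. A minimal sketch, assuming the records are flat dictionaries (the function name and record shape are hypothetical):

```python
import csv
import io
import json


def export_user_data(records, fmt="json"):
    """Serialize a list of flat dict records to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(records, indent=2, ensure_ascii=False)
    if fmt == "csv":
        # Union of keys, in first-seen order, so sparse records still export.
        fields = list(dict.fromkeys(k for r in records for k in r))
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

The returned string could be passed to `st.download_button` so the user receives the export as a file.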
### Cross-Border Data Transfers
When using cloud LLM providers:
1. **EU-US Data Transfers:**
- Review provider's Data Processing Agreement
   - Ensure Standard Contractual Clauses are in place
- Consider EU-hosted alternatives
2. **Recommended Approach:**
- Use Ollama for EU data residency
- Anonymize data before cloud API calls
- Implement geographic routing
---
## Security Best Practices
### API Key Management
```python
import os

import streamlit as st

# GOOD: Load from environment/secrets
api_key = os.environ.get("GROQ_API_KEY")
# or
api_key = st.secrets.get("GROQ_API_KEY")
# BAD: Hardcoded keys
api_key = "gsk_abc123..." # NEVER DO THIS
```
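Two small helpers can complement the pattern above: one that fails at startup when a key is missing, and one that masks a key before it reaches a log line. Both names are hypothetical and are offered as a sketch, not as part of SPARKNET.

```python
import os


def require_key(name, env=None):
    """Return the named secret or raise, so a missing key fails at startup."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value


def mask(secret, visible=4):
    """Mask a secret for log output, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```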
### Authentication
Configure authentication in `.streamlit/secrets.toml`:
```toml
[auth]
# Single user
password = "strong-password-here"
# Multi-user
[auth.users]
admin = "admin-password"
analyst = "analyst-password"
viewer = "viewer-password"
```
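A login check against the multi-user table could look like the sketch below. `check_password` is a hypothetical helper, and `users` is assumed to be a mapping of user name to password as in the `[auth.users]` example.

```python
import hmac


def check_password(username, password, users):
    """Return True if `password` matches the stored entry for `username`."""
    expected = users.get(username)
    if expected is None:
        # Compare against a dummy value so unknown users take similar time.
        hmac.compare_digest(password.encode(), b"\x00")
        return False
    return hmac.compare_digest(password.encode(), expected.encode())
```

`hmac.compare_digest` compares in constant time, which avoids leaking how many leading characters of a guess were correct. Storing salted password hashes instead of plaintext values would be a further hardening step.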
### Audit Logging
Enable audit logging for compliance:
```env
AUDIT_LOG_ENABLED=true
AUDIT_LOG_PATH=./logs/audit.log
```
Audit log includes:
- User authentication events
- Document access
- AI query/response pairs
- Decision point approvals
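One way to record such events is one JSON object per line, which keeps the log easy to search and parse. The sketch below reads `AUDIT_LOG_PATH` as shown above; the logger name, helper names, and record fields are assumptions, and it does not check `AUDIT_LOG_ENABLED`.

```python
import json
import logging
import os
from datetime import datetime, timezone


def build_audit_logger(path=None):
    """Return a logger that appends one JSON object per line to the audit file."""
    path = path or os.environ.get("AUDIT_LOG_PATH", "./logs/audit.log")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    logger = logging.getLogger("sparknet.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the general app log
    if not logger.handlers:  # avoid duplicate handlers on Streamlit reruns
        logger.addHandler(logging.FileHandler(path, encoding="utf-8"))
    return logger


def audit_event(logger, user, action, **details):
    """Write a single audit record with a UTC timestamp and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "details": details,
    }
    logger.info(json.dumps(record, ensure_ascii=False))
    return record
```

Because AI query/response pairs may themselves contain personal data, the audit log should fall under the same retention and access rules as the documents it describes.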
### Network Security
For production deployments:
1. **Firewall Rules:**
- Restrict Ollama to internal network
- Limit database access to app servers
- Use VPN for remote access
2. **TLS/SSL:**
- Enable HTTPS for Streamlit
- Use encrypted database connections
- Secure WebSocket connections
3. **Access Control:**
- Implement role-based access
- Use IP allowlisting where possible
- Enable MFA for admin access
---
## Sensitive Data Handling
### Document Classification
SPARKNET can classify documents by sensitivity:
| Level | Description | Handling |
|-------|-------------|----------|
| Public | Non-confidential | Cloud LLM allowed |
| Internal | Business confidential | Prefer local |
| Confidential | Sensitive business | Local only |
| Restricted | Highly sensitive | Local + encryption |
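The handling column can be read as a routing policy. A minimal sketch, where the `Sensitivity` enum, the `route` function, and the `(backend, encrypt)` return shape are all hypothetical:

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


def route(level, local_available=True):
    """Return (backend, encrypt) for a document of the given sensitivity."""
    if level is Sensitivity.PUBLIC:
        return ("cloud", False)
    if level is Sensitivity.INTERNAL:
        # "Prefer local": fall back to cloud only when no local model is reachable.
        return ("local" if local_available else "cloud", False)
    if not local_available:
        raise RuntimeError(f"{level.value} documents require a local model")
    return ("local", level is Sensitivity.RESTRICTED)
```

Raising an error for Confidential and Restricted documents when no local model is reachable keeps a misconfiguration from silently sending them to a cloud provider.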
### PII Detection
Enable PII detection:
```env
PII_DETECTION_ENABLED=true
```
Detected PII types:
- Names (persons)
- Email addresses
- Phone numbers
- Addresses
- ID numbers
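For the pattern-like types in this list, a regular-expression pass gives a rough idea of how detection can work. This is an illustrative sketch, not SPARKNET's detector: the patterns are deliberately simple, and names, postal addresses, and ID numbers generally need a named-entity model or locale-specific rules.

```python
import re

# Deliberately simple patterns; they will miss some formats and over-match others.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text):
    """Return a list of (pii_type, matched_text) pairs found in `text`."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits
```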
### Data Anonymization
For cloud API calls, implement anonymization:
```python
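# Pseudonymization example (sample values are illustrative)
text = "Jane Doe of Acme Corp signed the license."
real_name, company_name = "Jane Doe", "Acme Corp"
text = text.replace(real_name, "[PERSON_1]")
text = text.replace(company_name, "[COMPANY_1]")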
```
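When the cloud response has to be mapped back to real names, the placeholder mapping must be kept locally. A sketch of that idea follows; the `Pseudonymizer` class is hypothetical, and it assumes the sensitive strings have already been identified, for example by PII detection.

```python
class Pseudonymizer:
    """Replace known sensitive strings with stable placeholders and restore them later."""

    def __init__(self):
        self._forward = {}   # original value -> placeholder
        self._counters = {}  # label -> last index used

    def _placeholder(self, value, label):
        if value not in self._forward:
            self._counters[label] = self._counters.get(label, 0) + 1
            self._forward[value] = f"[{label}_{self._counters[label]}]"
        return self._forward[value]

    def anonymize(self, text, entities):
        """`entities` is an iterable of (value, label) pairs, e.g. ("Jane Doe", "PERSON")."""
        # Replace longer values first so "Acme Corp" wins over "Acme".
        for value, label in sorted(entities, key=lambda e: -len(e[0])):
            text = text.replace(value, self._placeholder(value, label))
        return text

    def restore(self, text):
        """Map placeholders in a model response back to the original values."""
        for value, placeholder in self._forward.items():
            text = text.replace(placeholder, value)
        return text
```

The mapping never leaves the private network, so the cloud provider sees only placeholders. Under GDPR, pseudonymized data still counts as personal data, so this reduces exposure without removing the other obligations.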
---
## Incident Response
### Security Incident Procedure
1. **Detection:** Monitor audit logs and alerts
2. **Containment:** Isolate affected systems
3. **Investigation:** Determine scope and impact
4. **Notification:** Inform stakeholders (72h for GDPR)
5. **Recovery:** Restore from clean backups
6. **Lessons Learned:** Update security measures
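The 72-hour GDPR window in step 4 is simple arithmetic, but computing it explicitly during an incident avoids mistakes. A small sketch with hypothetical helper names, counting from the moment the breach was detected:

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(detected_at):
    """Return the latest time to notify, 72 hours after the breach was detected."""
    return detected_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(detected_at, now=None):
    """Hours left before the deadline; negative once it has passed."""
    now = datetime.now(timezone.utc) if now is None else now
    return (notification_deadline(detected_at) - now).total_seconds() / 3600
```

Using timezone-aware UTC timestamps keeps the deadline unambiguous when responders work across time zones.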
### Contact
For security issues:
- Review issue privately before public disclosure
- Report to project maintainers
- Follow responsible disclosure practices
---
## Compliance Checklist
### Pre-Deployment
- [ ] API keys stored in secrets management
- [ ] Authentication configured
- [ ] Audit logging enabled
- [ ] Data retention policy defined
- [ ] Backup strategy implemented
- [ ] Network security reviewed
### GDPR Compliance
- [ ] Data processing register updated
- [ ] Privacy notice published
- [ ] Data subject rights procedures in place
- [ ] Cross-border transfer safeguards
- [ ] Data Protection Impact Assessment (if required)
### Ongoing
- [ ] Regular security audits
- [ ] Log review and monitoring
- [ ] Access control review
- [ ] Incident response testing
- [ ] Staff security training
---
## Additional Resources
- [GDPR Official Text](https://gdpr.eu/)
- [Ollama Documentation](https://ollama.com/)
- [Streamlit Security](https://docs.streamlit.io/deploy/streamlit-community-cloud/security)
- [OWASP Top 10](https://owasp.org/Top10/)
---
*SPARKNET - VISTA/Horizon EU Project*
*Last Updated: 2025*