# SPARKNET Security Documentation

This document outlines security considerations, deployment options, and compliance
guidelines for the SPARKNET AI-Powered Technology Transfer Office Automation Platform.

## Overview

SPARKNET handles sensitive data including:

- Patent documents and IP information
- License agreements and financial terms
- Partner/stakeholder contact information
- Research data and findings

Proper security measures are essential for production deployments.

---

## Deployment Options

### 1. Fully Local Deployment (Maximum Privacy)

**Recommended for:** Organizations with strict data sovereignty requirements, classified research, or GDPR Article 17 (right to erasure) obligations.

```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│ Local Vector Store  │  │
│  │ (Streamlit) │  │    (LLM)    │  │     (ChromaDB)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                                                   │
│  ┌─────────────┐  ┌─────────────────────────────────────┐   │
│  │ PostgreSQL  │  │  Document Storage (NFS/S3-compat)   │   │
│  │ (metadata)  │  │                                     │   │
│  └─────────────┘  └─────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```

**Configuration:**

- Do not set any cloud API keys in `.env`
- With no cloud keys present, SPARKNET automatically uses Ollama for all inference
- All data remains within your network
- No external API calls are made for LLM inference

**Setup:**

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull required models
ollama pull llama3.2:latest
ollama pull nomic-embed-text

# Configure SPARKNET
cp .env.example .env
# Leave cloud API keys empty

# Run
streamlit run demo/app.py
```
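
Because local mode depends on no cloud keys being present, a startup check can make that assumption explicit. The following is a minimal sketch, not SPARKNET's actual provider selection logic: `GROQ_API_KEY` and `GOOGLE_API_KEY` appear elsewhere in this document, while `OPENROUTER_API_KEY` and both function names are assumptions made for illustration.

```python
import os

# GROQ_API_KEY and GOOGLE_API_KEY are named in this document;
# OPENROUTER_API_KEY is an assumed name. Match this list to your .env.example.
CLOUD_KEY_VARS = ("GROQ_API_KEY", "GOOGLE_API_KEY", "OPENROUTER_API_KEY")


def configured_cloud_keys(env=None):
    """Return the names of cloud key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in CLOUD_KEY_VARS if env.get(name, "").strip()]


def assert_local_only(env=None):
    """Fail fast if any cloud key is configured in a local-only deployment."""
    found = configured_cloud_keys(env)
    if found:
        raise RuntimeError(f"Cloud API keys set in local-only mode: {found}")
```

Calling `assert_local_only()` once at application startup would turn a stray key in `.env` into a visible error rather than silent cloud traffic.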

### 2. Hybrid Deployment (Balanced)

**Recommended for:** Organizations that want cloud LLM capabilities for non-sensitive operations while keeping sensitive data local.

```
┌─────────────────────────────────────────────────────────────┐
│                    Your Private Network                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  SPARKNET   │──│   Ollama    │──│  Document Storage   │  │
│  │ (Streamlit) │  │ (Sensitive) │  │     (Encrypted)     │  │
│  └──────┬──────┘  └─────────────┘  └─────────────────────┘  │
└─────────│───────────────────────────────────────────────────┘
          │
          │  (Non-sensitive queries only)
          ▼
┌─────────────────────────────────────────────────────────────┐
│                     Cloud LLM Providers                     │
│  ┌─────────┐  ┌─────────┐  ┌─────────────┐  ┌───────────┐   │
│  │  Groq   │  │ Gemini  │  │ OpenRouter  │  │  GitHub   │   │
│  └─────────┘  └─────────┘  └─────────────┘  └───────────┘   │
└─────────────────────────────────────────────────────────────┘
```

**Configuration:**

- Configure cloud API keys for general queries
- Use document sensitivity classification
- Route sensitive documents to local Ollama
- Implement data anonymization for cloud queries

### 3. Cloud Deployment (Streamlit Cloud)

**Recommended for:** Public demos, non-sensitive research, or when local infrastructure is not available.

**Configuration:**

```toml
# .streamlit/secrets.toml
# Top-level keys must come before any [table] header. Keys placed after
# [auth] would be read as auth.GROQ_API_KEY instead of GROQ_API_KEY.
GROQ_API_KEY = "your-key"
GOOGLE_API_KEY = "your-key"

[auth]
password = "your-secure-password"
```

**Security Checklist:**

- [ ] Use secrets management (never commit API keys)
- [ ] Enable authentication
- [ ] Review provider data processing policies
- [ ] Consider data anonymization
- [ ] Implement session timeouts
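
The session timeout item in the checklist can be sketched as an idle check that runs at the top of each page load. This is an illustration under assumptions: the 15-minute limit, the `last_activity` key, and the function name are all hypothetical, and `session` stands for any mutable mapping such as a plain dict or Streamlit's `st.session_state`.

```python
import time

SESSION_TIMEOUT_SECONDS = 15 * 60  # assumed idle limit; tune to your policy


def check_session(session, now=None, timeout=SESSION_TIMEOUT_SECONDS):
    """Return True if the session is still active and refresh its activity stamp.

    An expired session is cleared, so the caller should send the user
    back to the login step when this returns False.
    """
    now = time.time() if now is None else now
    last_activity = session.get("last_activity")
    if last_activity is not None and now - last_activity > timeout:
        session.clear()
        return False
    session["last_activity"] = now
    return True
```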

---

## GDPR Compliance

### Data Processing Principles

SPARKNET is designed to support GDPR compliance:

1. **Lawfulness, Fairness, Transparency**
   - Document all data processing activities
   - Establish a lawful basis (such as consent) for processing personal data
   - Provide clear privacy notices
2. **Purpose Limitation**
   - Use data only for stated TTO purposes
   - Do not repurpose data without consent
3. **Data Minimization**
   - Only process necessary data
   - Anonymize data when possible
   - Implement data retention policies
4. **Accuracy**
   - CriticAgent validation helps ensure accuracy
   - Human-in-the-loop for critical decisions
   - Source verification for claims
5. **Storage Limitation**
   - Configure `DATA_RETENTION_DAYS` in `.env`
   - Implement automatic data purging
   - Support data deletion requests
6. **Integrity and Confidentiality**
   - Encrypt data at rest
   - Use TLS for data in transit
   - Implement access controls
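
The storage limitation items above (a configured retention period plus automatic purging) can be sketched as a job run on a schedule. `DATA_RETENTION_DAYS` comes from this document; the function names, the default of 365 days, and the use of file modification time as the retention clock are assumptions rather than SPARKNET's documented behavior.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def retention_days_from_env(env=None, default=365):
    """Read DATA_RETENTION_DAYS; the default of 365 is an assumption."""
    env = os.environ if env is None else env
    return int(env.get("DATA_RETENTION_DAYS", default))


def purge_expired(directory, retention_days, now=None):
    """Delete files under `directory` last modified before the retention cutoff.

    Returns the deleted paths so the caller can record them in the audit log.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A purge like this only covers files on disk. Database rows, vector store entries, and backups need equivalent handling for a deletion to be complete.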

### Data Subject Rights

Support for GDPR data subject rights:

| Right | Implementation |
|-------|----------------|
| Access | Export function for user data |
| Rectification | Edit capabilities in UI |
| Erasure | Delete user data on request |
| Portability | JSON/CSV export options |
| Objection | Opt-out from AI processing |
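
The access and portability rows both rely on exporting a user's records in a machine-readable format. A minimal sketch of such an export follows, assuming the records are already available as a list of flat dictionaries; the function name and record shape are hypothetical.

```python
import csv
import io
import json


def export_user_records(records, fmt="json"):
    """Serialize a list of flat dicts as a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(records, indent=2, ensure_ascii=False)
    if fmt == "csv":
        # Use the union of all keys so records with missing fields still export.
        fieldnames = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"Unsupported export format: {fmt}")
```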

### Cross-Border Data Transfers

When using cloud LLM providers:

1. **EU-US Data Transfers:**
   - Review provider's Data Processing Agreement
   - Ensure Standard Contractual Clauses in place
   - Consider EU-hosted alternatives
2. **Recommended Approach:**
   - Use Ollama for EU data residency
   - Anonymize data before cloud API calls
   - Implement geographic routing

---

## Security Best Practices

### API Key Management

```python
import os

import streamlit as st

# GOOD: Load from environment/secrets
api_key = os.environ.get("GROQ_API_KEY")
# or
api_key = st.secrets.get("GROQ_API_KEY")

# BAD: Hardcoded keys
api_key = "gsk_abc123..."  # NEVER DO THIS
```

### Authentication

Configure authentication in `.streamlit/secrets.toml`:

```toml
[auth]
# Single user
password = "strong-password-here"

# Multi-user
[auth.users]
admin = "admin-password"
analyst = "analyst-password"
viewer = "viewer-password"
```
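
The configuration above keeps passwords in plaintext inside the secrets file. One possible hardening step, sketched here on the assumption that the application's login check can be adapted, is to store a salted hash and compare in constant time. The storage format, the function names, and the iteration count are illustrative choices, not part of SPARKNET.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Return 'iterations$salt_hex$digest_hex' using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

With this approach the value stored per user would be the output of `hash_password`, and the login form would call `verify_password` instead of comparing strings directly.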

### Audit Logging

Enable audit logging for compliance:

```env
AUDIT_LOG_ENABLED=true
AUDIT_LOG_PATH=./logs/audit.log
```

Audit log includes:

- User authentication events
- Document access
- AI query/response pairs
- Decision point approvals
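
To show what one audit entry could look like, the sketch below appends a JSON object per line to the configured path. `AUDIT_LOG_ENABLED` and `AUDIT_LOG_PATH` come from this document; the record fields, the function name, and the JSON Lines format are assumptions, not SPARKNET's actual log schema.

```python
import json
import os
from datetime import datetime, timezone


def audit_event(event_type, user, details=None, env=None):
    """Append one JSON line to the audit log; return the record, or None if disabled."""
    env = os.environ if env is None else env
    if env.get("AUDIT_LOG_ENABLED", "false").lower() != "true":
        return None
    path = env.get("AUDIT_LOG_PATH", "./logs/audit.log")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,  # e.g. "login", "document_access" (hypothetical names)
        "user": user,
        "details": details or {},
    }
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Because AI query/response pairs can contain personal data, the audit log itself should fall under the same retention and access rules as the documents it describes.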

### Network Security

For production deployments:

1. **Firewall Rules:**
   - Restrict Ollama to internal network
   - Limit database access to app servers
   - Use VPN for remote access
2. **TLS/SSL:**
   - Enable HTTPS for Streamlit
   - Use encrypted database connections
   - Secure WebSocket connections
3. **Access Control:**
   - Implement role-based access
   - Use IP allowlisting where possible
   - Enable MFA for admin access
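
IP allowlisting is usually enforced at the firewall or reverse proxy, but an application-level check can serve as a second layer. The following sketch uses Python's standard `ipaddress` module; the function name and the example networks are hypothetical.

```python
import ipaddress


def ip_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed CIDR networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


# Example allowlist (hypothetical): an internal range and a VPN subnet.
ALLOWED = ["10.0.0.0/8", "192.168.50.0/24"]
```

When the app runs behind a proxy, the client address must come from a header the proxy sets and the app trusts, otherwise the check sees only the proxy's own address.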

---

## Sensitive Data Handling

### Document Classification

SPARKNET can classify documents by sensitivity:

| Level | Description | Handling |
|-------|-------------|----------|
| Public | Non-confidential | Cloud LLM allowed |
| Internal | Business confidential | Prefer local |
| Confidential | Sensitive business | Local only |
| Restricted | Highly sensitive | Local + encryption |
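
The handling column can be read as a routing policy. The sketch below encodes the four levels and picks a backend for each; it is an illustration of the table, not SPARKNET's router. The names are hypothetical, and "prefer local" is treated here as always local, on the assumption that a local Ollama instance is available.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def choose_backend(level, cloud_available):
    """Return 'cloud' only for public documents when a cloud provider is configured."""
    if level == Sensitivity.PUBLIC and cloud_available:
        return "cloud"
    return "local"


def requires_encryption(level):
    """Restricted documents additionally require encryption."""
    return level >= Sensitivity.RESTRICTED
```

Defaulting to `"local"` for anything not explicitly public means an unclassified or misclassified document fails safe rather than leaking to a cloud provider.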

### PII Detection

Enable PII detection:

```env
PII_DETECTION_ENABLED=true
```

Detected PII types:

- Names (persons)
- Email addresses
- Phone numbers
- Addresses
- ID numbers
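
As an illustration of pattern-based detection, the sketch below finds email addresses and phone numbers with regular expressions. It is not SPARKNET's detector: the patterns are deliberately simple assumptions, and names, postal addresses, and ID numbers generally need named-entity recognition or locale-specific rules that are out of scope here.

```python
import re

# Simplified patterns for illustration; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii(text):
    """Return (label, matched_text, start_offset) tuples ordered by position."""
    findings = []
    for label, pattern in (("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)):
        for match in pattern.finditer(text):
            findings.append((label, match.group(), match.start()))
    return sorted(findings, key=lambda item: item[2])
```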
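
### Data Anonymization

For cloud API calls, implement anonymization:

```python
# Pseudonymization example (illustrative values)
real_name, company_name = "Jane Doe", "Acme GmbH"
text = "Jane Doe of Acme GmbH requested a license."
text = text.replace(real_name, "[PERSON_1]")
text = text.replace(company_name, "[COMPANY_1]")
# text is now "[PERSON_1] of [COMPANY_1] requested a license."
```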
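
When several entities are involved, it helps to keep a mapping so the same value always gets the same placeholder and the cloud response can be mapped back locally. The class below is a minimal sketch of that idea; the class name, method names, and `(kind, value)` entity format are hypothetical, and it assumes the entities have already been detected.

```python
class Pseudonymizer:
    """Replace known entity values with stable placeholders and restore them later."""

    def __init__(self):
        self._forward = {}   # (kind, value) -> placeholder
        self._reverse = {}   # placeholder -> original value
        self._counters = {}  # kind -> last index used

    def placeholder_for(self, kind, value):
        key = (kind, value)
        if key not in self._forward:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            token = f"[{kind}_{self._counters[kind]}]"
            self._forward[key] = token
            self._reverse[token] = value
        return self._forward[key]

    def anonymize(self, text, entities):
        # Replace longer values first so a short value cannot split a longer one.
        for kind, value in sorted(entities, key=lambda e: len(e[1]), reverse=True):
            text = text.replace(value, self.placeholder_for(kind, value))
        return text

    def restore(self, text):
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text
```

The mapping holds the original values, so it must stay on the local side and never be sent along with the anonymized text.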

---

## Incident Response

### Security Incident Procedure

1. **Detection:** Monitor audit logs and alerts
2. **Containment:** Isolate affected systems
3. **Investigation:** Determine scope and impact
4. **Notification:** Inform stakeholders; under GDPR, report personal data breaches to the supervisory authority within 72 hours of becoming aware of them
5. **Recovery:** Restore from clean backups
6. **Lessons Learned:** Update security measures

### Contact

For security issues:

- Review issue privately before public disclosure
- Report to project maintainers
- Follow responsible disclosure practices

---

## Compliance Checklist

### Pre-Deployment

- [ ] API keys stored in secrets management
- [ ] Authentication configured
- [ ] Audit logging enabled
- [ ] Data retention policy defined
- [ ] Backup strategy implemented
- [ ] Network security reviewed

### GDPR Compliance

- [ ] Data processing register updated
- [ ] Privacy notice published
- [ ] Data subject rights procedures in place
- [ ] Cross-border transfer safeguards
- [ ] Data Protection Impact Assessment (if required)

### Ongoing

- [ ] Regular security audits
- [ ] Log review and monitoring
- [ ] Access control review
- [ ] Incident response testing
- [ ] Staff security training

---

## Additional Resources

- [GDPR Text and Guides (gdpr.eu)](https://gdpr.eu/)
- [Ollama Documentation](https://ollama.com/)
- [Streamlit Security](https://docs.streamlit.io/deploy/streamlit-community-cloud/security)
- [OWASP Top 10](https://owasp.org/Top10/)

---

*SPARKNET - VISTA/Horizon EU Project*
*Last Updated: 2025*