Phase-4 Governance Policy: Semantic Search (FAISS)
Purpose
Phase-4 introduces optional semantic search, built on FAISS, to improve discovery across publicly released FOIA records. Search operates only on the metadata associated with those records.
This policy governs whether, how, and under what constraints Phase-4 may be enabled.
Scope of Phase-4
Phase-4 MAY include:
- Vector embeddings of metadata fields only (title, agency, date, citation)
- User-initiated semantic similarity queries
- In-memory or user-controlled vector stores
Phase-4 MUST NOT include:
- Full-text document embeddings without explicit review
- Automated crawling or indexing
- Cross-user persistence
- Third-party model training on user data
- Background ingestion or scheduled jobs
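The metadata-only scope above could be enforced at the point where text is prepared for embedding. The following is a minimal sketch, not the project's implementation: `ALLOWED_FIELDS` and `metadata_text` are hypothetical names, and the record layout is assumed for illustration.

```python
# Hypothetical allowlist mirroring the fields named in the "MAY include" list.
ALLOWED_FIELDS = ("title", "agency", "date", "citation")


def metadata_text(record: dict) -> str:
    """Build the string to embed from allowlisted metadata fields only."""
    parts = []
    for field in ALLOWED_FIELDS:
        value = record.get(field)
        if value:  # skip missing or empty fields
            parts.append(f"{field}: {value}")
    return " | ".join(parts)


# Illustrative record; "full_text" stands in for any field outside the scope.
record = {
    "title": "Example report",
    "agency": "Example Agency",
    "date": "1975-01-01",
    "citation": "DOC-0001",
    "full_text": "body text that must not be embedded",
}
print(metadata_text(record))
```

Because the function reads only allowlisted keys, a field such as `full_text` never reaches the embedding step, even if it is present on the record.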
Activation Requirements (ALL REQUIRED)
Phase-4 functionality remains hard-disabled by default.
Activation requires:
- Legal review approval
- Hugging Face Trust & Safety concurrence
- Explicit UI opt-in from the user
- Clear disclosure of embedding scope and limits
- Feature flag activation by maintainers
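One way to express the "all required" rule is a gate that defaults to disabled and opens only when every requirement is recorded as met. This is a sketch under assumptions: `ActivationState` and `phase4_enabled` are hypothetical names, and how each approval is actually recorded is outside the scope of this policy.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ActivationState:
    # Every field defaults to False, so Phase-4 stays disabled unless each
    # requirement is explicitly marked as satisfied.
    legal_review_approved: bool = False
    trust_safety_concurrence: bool = False
    user_opted_in: bool = False
    scope_disclosed: bool = False
    maintainer_flag_enabled: bool = False


def phase4_enabled(state: ActivationState) -> bool:
    """Return True only when all five activation requirements are met."""
    return all(astuple(state))


print(phase4_enabled(ActivationState()))  # default state is disabled
```

A single missing requirement keeps the gate closed, which matches the hard-disabled default.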
Data Handling Rules
- No raw PDF content stored by default
- No embeddings persisted beyond the session unless the user explicitly exports them
- No cross-session correlation
- No private or sensitive data permitted
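The session-only rule can be illustrated with a store that keeps embeddings in process memory and writes nothing unless the user asks for an export. This is a hypothetical sketch: `SessionVectorStore` is an illustrative name, and plain Python lists stand in for whatever vector type the real index uses.

```python
import json


class SessionVectorStore:
    """Holds embeddings in process memory for a single session."""

    def __init__(self) -> None:
        self._vectors: dict = {}

    def add(self, record_id: str, vector: list) -> None:
        # Copy the vector so later changes by the caller do not leak in.
        self._vectors[record_id] = list(vector)

    def export(self) -> str:
        """Return a JSON snapshot; intended to run only on explicit user request."""
        return json.dumps(self._vectors, sort_keys=True)

    def close(self) -> None:
        """Discard all embeddings at session end."""
        self._vectors.clear()


store = SessionVectorStore()
store.add("DOC-0001", [0.1, 0.2, 0.3])
snapshot = store.export()  # user-initiated export
store.close()              # nothing survives the session
```

Nothing in the class touches disk or a shared service, so cross-session correlation would require the user to re-import an export themselves.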
Transparency & Auditability
When enabled, Phase-4 must:
- Log feature activation locally (user-visible)
- Display semantic scope banner
- Provide deterministic reproducibility options
- Include integrity hashes for AI outputs
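An integrity hash could be computed over a canonical serialization of the query, the model identifier, and the returned results, so that the same inputs always produce the same digest. The policy does not name a hash algorithm or payload layout; SHA-256 over sorted-key JSON is an assumption here, and `integrity_hash` is a hypothetical helper.

```python
import hashlib
import json


def integrity_hash(query: str, model_id: str, results: list) -> str:
    """Hash a canonical JSON form of one search response (assumed layout)."""
    payload = json.dumps(
        {"query": query, "model_id": model_id, "results": results},
        sort_keys=True,          # stable key order
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = integrity_hash(
    "example query",
    "example-embedding-model",  # placeholder identifier, not a real model
    [{"citation": "DOC-0001", "score": 0.9}],
)
print(digest)
```

Displaying the digest next to each result set lets a user re-run the same query under the same settings and confirm that the output is unchanged.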
Kill-Switch & Rollback
- Feature flag allows immediate global disablement
- No migration required to roll back
- No user data loss on rollback
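A kill switch with these properties might read the flag on every request and fall back to the existing non-semantic search path when the flag is off. The sketch below assumes a hypothetical environment variable, `PHASE4_SEMANTIC_SEARCH`, and hypothetical `keyword_search` and `semantic_search` callables supplied by the application.

```python
import os


def kill_switch_engaged(env=os.environ) -> bool:
    # Hypothetical flag name; any value other than "1" keeps Phase-4 off.
    return env.get("PHASE4_SEMANTIC_SEARCH", "0") != "1"


def search(query, keyword_search, semantic_search, env=os.environ):
    """Route a query, checking the flag at call time rather than at startup."""
    if kill_switch_engaged(env):
        return keyword_search(query)
    return semantic_search(query)


# Stand-in search functions for illustration only.
result = search(
    "example query",
    keyword_search=lambda q: ("keyword", q),
    semantic_search=lambda q: ("semantic", q),
    env={},  # flag unset, so the keyword path is used
)
print(result)
```

Because the flag is evaluated per call and no stored state depends on it, turning it off takes effect on the next query without a migration step.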
Governance Review Cadence
- Initial approval: One-time
- Re-review required for:
  - New data sources
  - New embedding models
  - Persistent storage changes
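The re-review triggers above could be checked mechanically by comparing a proposed configuration against the one that was last approved. This is a sketch only: `ReviewedConfig`, its fields, and `re_review_reasons` are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedConfig:
    data_sources: frozenset
    embedding_model: str
    persistent_storage: str  # e.g. "none" for session-only operation


def re_review_reasons(approved: ReviewedConfig, proposed: ReviewedConfig) -> list:
    """List which re-review triggers a proposed configuration would hit."""
    reasons = []
    if proposed.data_sources - approved.data_sources:
        reasons.append("new data source")
    if proposed.embedding_model != approved.embedding_model:
        reasons.append("new embedding model")
    if proposed.persistent_storage != approved.persistent_storage:
        reasons.append("persistent storage change")
    return reasons


approved = ReviewedConfig(frozenset({"source-a"}), "model-v1", "none")
proposed = ReviewedConfig(frozenset({"source-a", "source-b"}), "model-v1", "none")
print(re_review_reasons(approved, proposed))
```

An empty list means the proposed configuration stays within what was already reviewed; any entry means a new governance review is needed before the change ships.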
Guiding Principle
Semantic discovery must never compromise transparency, provenance, or user consent.