# Phase 5: Full UI, TTS & Access Control
**Status:** ✅ Complete | **Tests:** 55/55 passed | **Files:** 7 modules (3 UI tabs, 2 backend, 1 TTS, 1 updated app.py)
---
## What Was Built
Phase 5 wires all previous phases into a working end-to-end application.
| Module | Responsibility |
|--------|----------------|
| `voicevault/kb/kb_manager.py` | KB lifecycle: create, list, delete, ingest, password auth |
| `voicevault/tts/web_speech.py` | TTS text prep: strip citation markers before speech |
| `ui/tabs/ask_tab.py` | Full voice query pipeline in Gradio |
| `ui/tabs/kb_tab.py` | KB creation, document upload, management |
| `ui/tabs/analytics_tab.py` | Query stats from SQLite audit log |
| `ui/tabs/settings_tab.py` | Configuration panels (display-only) |
| `app.py` | Startup orchestration, pipeline wiring |
---
## KBManager
**File:** [voicevault/kb/kb_manager.py](../voicevault/kb/kb_manager.py)
### Central Database
All KBs share **one** SQLite database at `cfg.data_dir / "voicevault.db"`. This enables cross-KB queries, global analytics, and efficient listing without per-KB filesystem scanning.
### KB Name Validation
```python
import re
_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")
```
- Lowercase alphanumeric + hyphens only
- 1–64 characters
- Cannot start or end with a hyphen
- Prevents path traversal attacks (no `..`, `/`, `\`, spaces)
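The rules above can be checked directly against the pattern. This is a minimal sketch: the helper name `is_valid_kb_name` is hypothetical and not taken from `kb_manager.py`; only the regex comes from the source.

```python
import re

# Pattern copied from the section above.
_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")

def is_valid_kb_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is an acceptable KB slug."""
    # fullmatch() is used because `$` also matches just before a trailing newline.
    return _VALID_KB_NAME.fullmatch(name) is not None

accepted = [n for n in ("my-kb-01", "a", "-kb", "kb-", "../etc", "My KB") if is_valid_kb_name(n)]
```

Here `accepted` keeps only `"my-kb-01"` and `"a"`. Whether the real validator uses `match()` or `fullmatch()` is not stated in this document; `fullmatch()` is the stricter choice because it also rejects a name such as `"kb\n"`.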
### Password Protection (bcrypt)
```python
import bcrypt

password_hash = bcrypt.hashpw(
    password.encode(), bcrypt.gensalt(rounds=cfg.bcrypt_rounds)  # default: 12
).decode()
```
- Passwords are hashed at creation time; plaintext is never stored
- `verify_password()` uses `bcrypt.checkpw()` for constant-time comparison
- Public KBs (no password) return True for any password check
### verify_password Logic
```
KB has no hash (public)     → True  (always accessible)
KB has hash, no password    → False (protected but no credentials)
KB has hash, with password  → bcrypt.checkpw(password, hash)
```
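The three branches can be sketched as a standalone function. This is an illustration under assumptions: the real `verify_password()` takes a KB name and looks the hash up in SQLite, whereas this version receives the stored hash directly, and the parameter names are invented for the example.

```python
from typing import Optional

def verify_password(stored_hash: Optional[str], password: Optional[str]) -> bool:
    """Sketch of the three-branch check; the real signature may differ."""
    if not stored_hash:
        return True   # public KB: always accessible
    if not password:
        return False  # protected KB, but no credentials supplied
    # Imported lazily so the first two branches need no third-party package.
    import bcrypt
    return bcrypt.checkpw(password.encode(), stored_hash.encode())
```

Ordering the checks this way means `bcrypt.checkpw()` is only reached when both a hash and a candidate password exist, so it never receives `None`.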
### ingest_documents Flow
```python
def ingest_documents(kb_name, file_paths, password=None):
    # 1. Verify KB exists
    # 2. Verify password
    # 3. IndexBuilder(kb_name).ingest_file(path, db_path) per file
    # 4. Return list[IngestionReport]
    ...
```
Ingestion is delegated entirely to `IndexBuilder` (Phase 1), which handles parsing, chunking, embedding, ChromaDB upsert, BM25 rebuild, and deduplication.
### delete_kb Flow
```python
def delete_kb(kb_name):
    # 1. Verify KB exists (raises KBManagerError if not)
    # 2. db.delete_kb() → SQLite CASCADE deletes documents, chunks, query_log
    # 3. shutil.rmtree(cfg.kb_dir(kb_name)) → removes ChromaDB, BM25, files
    ...
```
Irreversible: the UI asks for confirmation before calling `delete_kb()`.
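A minimal, self-contained sketch of the two-step delete follows. It assumes a simplified schema (`kbs`, `documents`) and uses `KeyError` in place of `KBManagerError`; none of these table or column names come from the VoiceVault source. One detail worth checking in any SQLite cascade: `ON DELETE CASCADE` is only enforced when `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

def delete_kb(conn: sqlite3.Connection, kb_root: Path, kb_name: str) -> None:
    """Delete a KB row (child rows cascade), then remove its directory."""
    row = conn.execute("SELECT 1 FROM kbs WHERE name = ?", (kb_name,)).fetchone()
    if row is None:
        raise KeyError(f"unknown KB: {kb_name}")  # stand-in for KBManagerError
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM kbs WHERE name = ?", (kb_name,))
    shutil.rmtree(kb_root / kb_name, ignore_errors=True)

# Demo with a throwaway schema (table and column names are assumptions).
root = Path(tempfile.mkdtemp())
(root / "demo").mkdir()
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # without this, CASCADE is ignored
conn.executescript("""
    CREATE TABLE kbs (name TEXT PRIMARY KEY);
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        kb_name TEXT NOT NULL REFERENCES kbs(name) ON DELETE CASCADE
    );
    INSERT INTO kbs VALUES ('demo');
    INSERT INTO documents (kb_name) VALUES ('demo'), ('demo');
""")
delete_kb(conn, root, "demo")
remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

After the call, `remaining` is 0 and the `demo` directory no longer exists. Deleting the database rows before the files means a failure in the SQL step leaves the on-disk data untouched.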
---
## TTS β€” Web Speech API
**File:** [voicevault/tts/web_speech.py](../voicevault/tts/web_speech.py)
The TTS engine runs entirely in the browser via the `SpeechSynthesis` API, so it incurs no API cost and no server load. Python's role is text preparation only.
### prepare_for_tts()
```python
import re

# _CITATION_MARKER_RE is a compiled module-level pattern (definition not shown here).
def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    if is_refusal or not answer:
        return ""
    text = _CITATION_MARKER_RE.sub("", answer)  # strip [Source: ...]
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text
```
`prepare_for_tts()` removes `[Source: filename, p.N]` markers before the text reaches the browser, since reading "Source: paper dot pdf, p dot 3" aloud is poor UX. The JS bridge (`ui/components/audio_controls.py`) takes the cleaned text and calls `window._vv_tts.speak(text, rate, pitch)`.
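To make the behaviour concrete, here is a runnable sketch of the same function. The regex assigned to `_CITATION_MARKER_RE` is an assumption based on the `[Source: filename, p.N]` format described above; the actual pattern in `web_speech.py` is not shown in this document.

```python
import re

# Assumed pattern for markers such as "[Source: paper.pdf, p.3]".
_CITATION_MARKER_RE = re.compile(r"\[Source:[^\]]*\]")

def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    if is_refusal or not answer:
        return ""  # refusals and empty answers are not spoken
    text = _CITATION_MARKER_RE.sub("", answer)   # drop citation markers
    return re.sub(r"\s{2,}", " ", text).strip()  # collapse leftover whitespace

spoken = prepare_for_tts("Water boils at 100 C [Source: paper.pdf, p.3] at sea level.")
```

With this input, `spoken` is `"Water boils at 100 C at sea level."`. In this sketch a marker placed directly before punctuation leaves a single space before the punctuation mark, which the whitespace collapse does not remove.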
---
## Ask Tab (Full Pipeline)
**File:** [ui/tabs/ask_tab.py](../ui/tabs/ask_tab.py)
### End-to-End Query Flow
```
1. User records audio → stop_recording event fires
   → WhisperTranscriber.transcribe(audio_path) → transcript text
2. User selects KB(s) → clicks Ask
3. _query_fn():
   a. QueryPreprocessor.process(query) → pq (cleaned, typed)
   b. HybridRetriever(kb_names=selected).search(pq.processed_query) → results
   c. ContextBuilder().build(results) → (context_str, citation_map)
   d. AnswerChain.generate(query, context, citation_map, history, query_type) → generation
   e. db.log_query(...) ← SHA-256 only, no raw text stored
   f. format_citations_markdown(generation.citations) → citation panel
   g. prepare_for_tts(generation.answer, generation.is_refusal) → TTS text
   h. Update chatbot + citations + history state + TTS state
```
### State Management
- `gr.State([])`: conversation history as `list[tuple[str, str]]`
- `gr.State("")`: last answer text (for TTS playback)

Conversation history is passed to `AnswerChain._build_messages()` as `HumanMessage`/`AIMessage` pairs, the standard LangChain pattern for multi-turn conversation.
### Error Handling
Every failure path (no query, no KB selected, pipeline error) produces a user-visible error message in the chatbot rather than crashing. A failure in query logging is treated as non-critical: it is caught and logged as a warning, never raised.
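The "caught and warned" behaviour for the query logger could look like the wrapper below. This is a sketch only: `log_query_safely`, its keyword arguments, and the logger name are hypothetical and not taken from `ask_tab.py`.

```python
import logging
from typing import Callable

logger = logging.getLogger("voicevault.ask_tab")  # logger name is an assumption

def log_query_safely(log_fn: Callable[..., None], **fields) -> bool:
    """Call `log_fn`; on any failure, warn and continue instead of raising."""
    try:
        log_fn(**fields)
        return True
    except Exception as exc:  # audit logging must never break the answer path
        logger.warning("query logging failed: %s", exc)
        return False
```

Returning a boolean rather than raising lets the caller finish rendering the answer even when the audit write fails.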
### Factory Functions
Event handlers are returned as closures from factory functions:
```python
def _make_transcribe_fn(transcriber):
    def _transcribe(audio_path): ...
    return _transcribe

def _make_query_fn(answer_chain, db_path):
    def _query(query, kb_names, history, chatbot): ...
    return _query
```
This enables dependency injection without globals: the `transcriber` and `answer_chain` objects are passed in from `app.py` and captured in the closure.
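One practical consequence of the closure pattern is that a handler can be exercised with a test double and no Gradio app. The sketch below assumes the transcriber exposes `transcribe(audio_path)`, as the query flow describes; `FakeTranscriber` and the empty-path guard are illustrative additions.

```python
class FakeTranscriber:
    """Test double standing in for WhisperTranscriber (assumed interface)."""
    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"

def make_transcribe_fn(transcriber):
    def _transcribe(audio_path):
        if not audio_path:
            return ""  # nothing was recorded
        return transcriber.transcribe(audio_path)  # dependency captured by the closure
    return _transcribe

handler = make_transcribe_fn(FakeTranscriber())
result = handler("clip.wav")
```

Here `result` is `"transcript of clip.wav"`, and swapping in the real transcriber requires no change to the handler body.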
---
## KB Tab (Management UI)
**File:** [ui/tabs/kb_tab.py](../ui/tabs/kb_tab.py)
Three operations wired to Gradio event handlers:
| Button | Handler | Output |
|--------|---------|--------|
| ➕ Create KB | `_create_kb()` | Status message, refreshed dropdowns |
| 📤 Index Documents | `_upload_docs()` | Ingestion report per file |
| 🗑️ Delete KB | `_delete_kb()` | Status message, refreshed table + dropdowns |
After each create/delete, all dropdowns and the KB dataframe are updated via `gr.update(choices=...)`, so no page refresh is needed.
---
## Analytics Tab
**File:** [ui/tabs/analytics_tab.py](../ui/tabs/analytics_tab.py)
Pulls data from `sqlite_store.get_query_stats()` on refresh button click:
| Metric | Source |
|--------|--------|
| Total queries (7d) | `COUNT(*)` from `query_log` |
| Avg end-to-end latency | `AVG(total_latency_ms)` |
| Avg citations per answer | `AVG(citation_count)` |
| Queries by day | `GROUP BY DATE(timestamp)` |
| KB inventory | `KBManager.list_kbs()` |
Stats are not loaded on page load; the user clicks 🔄 Refresh to pull fresh data. This avoids unnecessary DB queries at startup.
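The aggregates in the table map onto two SQL queries. The sketch below runs against an in-memory database; the column names `timestamp`, `total_latency_ms`, and `citation_count` follow the table above, while the rest of the schema, the return shape, and the `days` parameter are assumptions rather than the actual `sqlite_store.get_query_stats()` implementation.

```python
import sqlite3

def get_query_stats(conn: sqlite3.Connection, days: int = 7) -> dict:
    window = (f"-{int(days)} days",)  # bound as a datetime() modifier
    total, avg_latency, avg_citations = conn.execute(
        """SELECT COUNT(*), AVG(total_latency_ms), AVG(citation_count)
           FROM query_log WHERE timestamp >= datetime('now', ?)""",
        window,
    ).fetchone()
    by_day = conn.execute(
        """SELECT DATE(timestamp), COUNT(*) FROM query_log
           WHERE timestamp >= datetime('now', ?)
           GROUP BY DATE(timestamp) ORDER BY 1""",
        window,
    ).fetchall()
    return {
        "total_queries": total,
        "avg_latency_ms": avg_latency,
        "avg_citations": avg_citations,
        "by_day": by_day,
    }

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_log (timestamp TEXT, total_latency_ms REAL, citation_count INTEGER)"
)
conn.executemany(
    "INSERT INTO query_log VALUES (datetime('now', ?), ?, ?)",
    [("-1 hours", 100.0, 2), ("-2 days", 200.0, 4), ("-10 days", 900.0, 9)],
)
stats = get_query_stats(conn)
```

The row from ten days ago falls outside the window, so `stats` reports 2 queries with an average latency of 150.0 ms and 3.0 citations per answer.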
---
## app.py β€” Startup Orchestration
**File:** [app.py](../app.py)
```python
def _startup():  # returns (kb_manager, transcriber, answer_chain)
    # 1. cfg.ensure_directories()
    # 2. KBManager(db_path=data_dir/voicevault.db)  ← initializes SQLite schema
    # 3. WhisperTranscriber()                       ← lazy: no model loaded at startup
    # 4. AnswerChain()                              ← lazy: LLM clients created per call
    ...
```
All three singletons are created once and passed to the UI tab builders. This avoids repeating model-loading overhead on every query.
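The "lazy" notes above describe objects that are cheap to construct and defer expensive work until first use. A generic sketch of that pattern follows; `LazyModel` and its `load_count` counter are illustrative and do not reflect how `WhisperTranscriber` is actually implemented.

```python
from typing import Any, Callable, Optional

class LazyModel:
    """Holds a loader and runs it on first use only."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self.load_count = 0  # exposed only to make the behaviour observable

    def get(self) -> Any:
        if self._model is None:
            self._model = self._loader()  # expensive step happens here, once
            self.load_count += 1
        return self._model

lazy = LazyModel(lambda: object())  # constructing is cheap: nothing is loaded yet
loads_at_startup = lazy.load_count
first, second = lazy.get(), lazy.get()
```

After both calls, the loader has run exactly once and `first` and `second` are the same object, which is why creating the singleton at startup costs almost nothing.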
---
## Security Decisions
### Password Storage
bcrypt with work factor 12 makes offline brute-force attacks expensive even if the SQLite file is exfiltrated. This exceeds the OWASP-recommended minimum work factor of 10 for bcrypt.
### KB Name as Path Component
The KB name regex (`^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$`) prevents path traversal. All KB filesystem operations use `cfg.kb_dir(kb_name)`, which returns `data_dir / kb_name`, so a validated slug cannot escape `data_dir`.
### Query Audit Log β€” PII Protection
The raw query text is never stored in SQLite. Only the SHA-256 hash of the query is stored (`voice_query_hash`). This supports the GDPR data-minimization principle: analytics work on aggregates, not raw user queries.
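Computing such a digest needs only the standard library. In this sketch the function name `hash_query` and the choice to hash the UTF-8 bytes without any normalization are assumptions; the document does not say whether the query is lowercased or trimmed before hashing.

```python
import hashlib

def hash_query(query: str) -> str:
    """Return the hex SHA-256 digest of the UTF-8 encoded query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()

digest = hash_query("what is the refund policy?")
```

The digest is a fixed 64-character hex string, so identical queries can still be counted together in analytics while the text itself is never written to disk.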
### No Globals in Event Handlers
All pipeline objects (transcriber, answer_chain, kb_manager) are passed via closures, not module-level globals. This makes the code testable (dependency injection) and prevents accidental shared state mutation.
---
## Test Coverage
**File:** [tests/test_phase5.py](../tests/test_phase5.py) | **55/55 passed**
| Class | Tests | What's verified |
|-------|-------|----------------|
| `TestKBManagerCreate` | 16 | Create, list, get, duplicate detection, 5 slug validation cases |
| `TestKBManagerDelete` | 3 | Delete removes from list, nonexistent raises, count decreases |
| `TestKBManagerPassword` | 7 | Public access, protected access, wrong pass, no pass, unknown KB, bcrypt format |
| `TestKBManagerStats` | 3 | Returns dict, has required keys, zeros on empty DB |
| `TestPreparForTTS` | 7 | Citation stripping, refusal → empty, normal text unchanged, no double spaces |
| `TestCitationPanel` | 8 | Filename, page, section, excerpt, multiple, numbered, empty, type |
| `TestUIHelpers` | 7 | KB choices, KB table format, protected lock icon, append_chat, no mutation |
| `TestAppStartup` | 4 | build_app returns Blocks, all three tab builders run without error |
### Fixture Design
The `manager` fixture creates a fresh KBManager backed by a temp SQLite path for each test, giving complete isolation with no shared state between tests.
---
## Full Project Test Summary
| Phase | Tests | Status |
|-------|-------|--------|
| Phase 0: Foundation | 58 passed | ✅ |
| Phase 1: Ingestion | 46 passed | ✅ |
| Phase 2: Retrieval | 33 passed, 0 errors | ✅ |
| Phase 3: ASR | 45 passed, 2 skipped (soundfile) | ✅ |
| Phase 4: Generation | 72 passed | ✅ |
| Phase 5: UI & Access | 55 passed | ✅ |
| **Total** | **309 passed, 2 skipped** | ✅ |
**Note on conftest.py CPU fix:** `CUDA_VISIBLE_DEVICES="-1"` is set in `tests/conftest.py` to force CPU for all tests. This prevents CUDA compatibility errors on the RTX 5070 (sm_120 is not supported by the packaged PyTorch ≤ 2.x). Production deployment on HuggingFace Spaces uses an NVIDIA T4 (sm_75), which is fully compatible.