VoiceVault / DOCS /phase5_ui_access.md

NinjainPJs

Initial release: VoiceVault v1.0.0 — Voice-First RAG Knowledge Agent

85f900d 3 months ago


	# Phase 5 — Full UI, TTS & Access Control

	Status: ✅ Complete | Tests: 55/55 passed | Files: 7 modules (3 UI tabs, 2 backend, 1 TTS, 1 updated app.py)

	---

	## What Was Built

	Phase 5 wires all previous phases into a working end-to-end application.

	| Module | Responsibility |
	|--------|----------------|
	| `voicevault/kb/kb_manager.py` | KB lifecycle: create, list, delete, ingest, password auth |
	| `voicevault/tts/web_speech.py` | TTS text prep: strip citation markers before speech |
	| `ui/tabs/ask_tab.py` | Full voice query pipeline in Gradio |
	| `ui/tabs/kb_tab.py` | KB creation, document upload, management |
	| `ui/tabs/analytics_tab.py` | Query stats from SQLite audit log |
	| `ui/tabs/settings_tab.py` | Configuration panels (display-only) |
	| `app.py` | Startup orchestration, pipeline wiring |

	---

	## KBManager

	File: [voicevault/kb/kb_manager.py](../voicevault/kb/kb_manager.py)

	### Central Database

	All KBs share one SQLite database at `cfg.data_dir / "voicevault.db"`. This enables cross-KB queries, global analytics, and efficient listing without per-KB filesystem scanning.

	### KB Name Validation

	```python
	import re

	_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")
	```

	- Lowercase alphanumeric + hyphens only
	- 1–64 characters
	- Cannot start or end with a hyphen
	- Prevents path traversal attacks (no `..`, `/`, `\`, spaces)
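The rules above can be checked directly against the pattern. The following is a minimal sketch, not the module's actual code: `is_valid_kb_name` is a hypothetical helper name, and the use of `fullmatch` is an assumption made here because `$` on its own also matches just before a trailing newline.

```python
import re

# Pattern taken from this section, written with a plain `|` alternation.
_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")


def is_valid_kb_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is an acceptable KB slug."""
    # fullmatch requires the whole string to match, so "abc\n" is rejected.
    return _VALID_KB_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["research-notes", "a", "-draft", "../etc", "My KB", "x" * 65]:
        print(f"{candidate[:20]!r}: {is_valid_kb_name(candidate)}")
```

Names containing `..`, `/`, uppercase letters, or spaces fail because none of those characters are in the allowed class, which is what makes the slug safe to use as a directory name.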

	### Password Protection (bcrypt)

	```python
	password_hash = bcrypt.hashpw(
	    password.encode(),
	    bcrypt.gensalt(rounds=cfg.bcrypt_rounds),  # default: 12
	).decode()
	```

	- Passwords are hashed at creation time — plaintext never stored
	- `verify_password()` uses `bcrypt.checkpw()` for constant-time comparison
	- Public KBs (no password) return True for any password check

	### verify_password Logic

	```
	KB has no hash (public) → True (always accessible)
	KB has hash, no password → False (protected but no credentials)
	KB has hash, with password → bcrypt.checkpw(password, hash)
	```
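The three branches above could be written roughly as follows. This is a sketch under assumptions: the real `verify_password()` signature is not shown in this document, and the hash comparison is injected as a `check` callable so the example runs without the `bcrypt` package. In the real module that callable would presumably be `bcrypt.checkpw`, which takes the password bytes and the hash bytes.

```python
from typing import Callable, Optional


def verify_password(
    stored_hash: Optional[str],
    password: Optional[str],
    check: Callable[[bytes, bytes], bool],
) -> bool:
    """Three-branch access check mirroring the decision table above.

    `check` is a stand-in for bcrypt.checkpw(password_bytes, hash_bytes).
    """
    if not stored_hash:  # public KB: always accessible
        return True
    if not password:  # protected KB, but no credentials supplied
        return False
    return check(password.encode(), stored_hash.encode())


if __name__ == "__main__":
    # Toy checker for demonstration only; it is NOT a secure comparison.
    def toy_check(pw: bytes, hashed: bytes) -> bool:
        return hashed == b"hash:" + pw

    print(verify_password(None, None, toy_check))            # public KB
    print(verify_password("hash:s3cret", None, toy_check))   # missing password
    print(verify_password("hash:s3cret", "s3cret", toy_check))
```

Checking for a missing hash first keeps public KBs reachable even when a caller passes a stray password.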

	### ingest_documents Flow

	```python
	def ingest_documents(kb_name, file_paths, password=None):
	    # 1. Verify KB exists
	    # 2. Verify password
	    # 3. IndexBuilder(kb_name).ingest_file(path, db_path) per file
	    # 4. Return list[IngestionReport]
	    ...
	```

	Delegates entirely to `IndexBuilder` (Phase 1) which handles parsing, chunking, embedding, ChromaDB upsert, BM25 rebuild, and deduplication.

	### delete_kb Flow

	```python
	def delete_kb(kb_name):
	    # 1. Verify KB exists (raises KBManagerError if not)
	    # 2. db.delete_kb() → SQLite CASCADE deletes documents, chunks, query_log
	    # 3. shutil.rmtree(cfg.kb_dir(kb_name)) → removes ChromaDB, BM25, files
	    ...
	```

	Irreversible — the UI confirms before calling.
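The delete sequence can be illustrated with the standard library alone. The sketch below assumes a simplified two-table schema (`kbs`, `documents`) and a local `KBManagerError` stand-in; the real schema and `db.delete_kb()` are not shown in this document. One detail worth noting is that SQLite only enforces `ON DELETE CASCADE` when `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


class KBManagerError(Exception):
    """Local stand-in for the project's exception of the same name."""


def delete_kb(conn: sqlite3.Connection, kb_root: Path, kb_name: str) -> None:
    """Delete the KB row (child rows cascade), then remove its directory."""
    row = conn.execute("SELECT 1 FROM kbs WHERE name = ?", (kb_name,)).fetchone()
    if row is None:
        raise KBManagerError(f"unknown KB: {kb_name}")
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM kbs WHERE name = ?", (kb_name,))
    shutil.rmtree(kb_root / kb_name, ignore_errors=True)


def demo() -> tuple:
    """Build a throwaway KB, delete it, and report what is left."""
    root = Path(tempfile.mkdtemp())
    (root / "notes").mkdir()
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for CASCADE to fire
    conn.execute("CREATE TABLE kbs (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " kb_name TEXT NOT NULL REFERENCES kbs(name) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO kbs VALUES ('notes')")
    conn.execute("INSERT INTO documents (kb_name) VALUES ('notes')")
    conn.commit()

    delete_kb(conn, root, "notes")
    remaining_docs = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
    dir_exists = (root / "notes").exists()
    shutil.rmtree(root, ignore_errors=True)
    return remaining_docs, dir_exists


if __name__ == "__main__":
    print(demo())
```

Deleting the database row before the directory means a failed `rmtree` leaves orphaned files rather than a listed KB with missing data.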

	---

	## TTS — Web Speech API

	File: [voicevault/tts/web_speech.py](../voicevault/tts/web_speech.py)

	The TTS engine runs entirely in the browser via the `SpeechSynthesis` API — zero API cost, zero server load. Python's role is text preparation only.

	### prepare_for_tts()

	```python
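	import re

	# _CITATION_MARKER_RE is a module-level compiled pattern (definition not shown here).
	def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
	    # Refusals and empty answers are not spoken at all.
	    if is_refusal or not answer:
	        return ""
	    text = _CITATION_MARKER_RE.sub("", answer)   # strip [Source: ...]
	    text = re.sub(r"\s{2,}", " ", text).strip()  # collapse leftover whitespace
	    return text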
	```

	Removes `[Source: filename, p.N]` markers before passing to the browser — reading "Source: paper dot pdf, p dot 3" aloud is poor UX. The JS bridge (`ui/components/audio_controls.py`) takes this cleaned text and calls `window._vv_tts.speak(text, rate, pitch)`.
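To make the stripping step concrete, here is a self-contained sketch. The pattern assigned to `_CITATION_MARKER_RE` is an assumption, since the real definition is not shown in this document; this version also consumes whitespace before the marker so that no stray space is left in front of punctuation.

```python
import re

# Assumed pattern: the project's actual _CITATION_MARKER_RE may differ.
_CITATION_MARKER_RE = re.compile(r"\s*\[Source:[^\]]*\]")


def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    """Return speech-ready text, or "" when nothing should be spoken."""
    if is_refusal or not answer:
        return ""
    text = _CITATION_MARKER_RE.sub("", answer)
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    raw = "Transformers use attention [Source: paper.pdf, p.3]. They scale well."
    print(prepare_for_tts(raw))
    # Transformers use attention. They scale well.
```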

	---

	## Ask Tab (Full Pipeline)

	File: [ui/tabs/ask_tab.py](../ui/tabs/ask_tab.py)

	### End-to-End Query Flow

	```
	1. User records audio → stop_recording event fires
	   → WhisperTranscriber.transcribe(audio_path) → transcript text

	2. User selects KB(s) → clicks Ask

	3. _query_fn():
	   a. QueryPreprocessor.process(query) → pq (cleaned, typed)
	   b. HybridRetriever(kb_names=selected).search(pq.processed_query) → results
	   c. ContextBuilder().build(results) → (context_str, citation_map)
	   d. AnswerChain.generate(query, context, citation_map, history, query_type) → generation
	   e. db.log_query(...) ← SHA-256 only, no raw text stored
	   f. format_citations_markdown(generation.citations) → citation panel
	   g. prepare_for_tts(generation.answer, generation.is_refusal) → TTS text
	   h. Update chatbot + citations + history state + TTS state
	```

	### State Management

	- `gr.State([])` — conversation history as `list[tuple[str, str]]`
	- `gr.State("")` — last answer text (for TTS playback)

	Conversation history is passed to `AnswerChain._build_messages()` as proper `HumanMessage`/`AIMessage` pairs — the correct LangChain pattern for multi-turn conversation.

	### Error Handling

	Every failure path (no query, no KB selected, pipeline error) produces a user-visible error message in the chatbot rather than crashing. A query-logging failure is treated as non-critical: it is caught and logged as a warning, and never propagates to the user.

	### Factory Functions

	Event handlers are returned as closures from factory functions:

	```python
	def _make_transcribe_fn(transcriber):
	    def _transcribe(audio_path): ...
	    return _transcribe

	def _make_query_fn(answer_chain, db_path):
	    def _query(query, kb_names, history, chatbot): ...
	    return _query
	```

	This enables dependency injection without globals — the `transcriber` and `answer_chain` objects are passed in from `app.py` and captured in the closure.
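A small runnable illustration of the closure pattern, and of why it helps testing, might look like this. `FakeTranscriber` is a hypothetical test double and the empty-input guard is an assumption; neither is taken from the project.

```python
from typing import Callable, Optional


class FakeTranscriber:
    """Hypothetical test double standing in for WhisperTranscriber."""

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"


def make_transcribe_fn(transcriber) -> Callable[[Optional[str]], str]:
    """Return an event handler that closes over the injected transcriber."""

    def _transcribe(audio_path: Optional[str]) -> str:
        if not audio_path:  # assumed guard: nothing was recorded
            return ""
        return transcriber.transcribe(audio_path)

    return _transcribe


if __name__ == "__main__":
    handler = make_transcribe_fn(FakeTranscriber())
    print(handler("clip.wav"))  # transcript of clip.wav
```

Because the handler only sees what the factory was given, a test can swap in a fake without patching any module-level state.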

	---

	## KB Tab (Management UI)

	File: [ui/tabs/kb_tab.py](../ui/tabs/kb_tab.py)

	Three operations wired to Gradio event handlers:

	| Button | Handler | Output |
	|--------|---------|--------|
	| ➕ Create KB | `_create_kb()` | Status message, refreshed dropdowns |
	| 📤 Index Documents | `_upload_docs()` | Ingestion report per file |
	| 🗑️ Delete KB | `_delete_kb()` | Status message, refreshed table + dropdowns |

	After each create/delete, all dropdowns and the KB dataframe are updated via `gr.update(choices=...)` — no page refresh needed.

	---

	## Analytics Tab

	File: [ui/tabs/analytics_tab.py](../ui/tabs/analytics_tab.py)

	Pulls data from `sqlite_store.get_query_stats()` on refresh button click:

	| Metric | Source |
	|--------|--------|
	| Total queries (7d) | `COUNT(*)` from `query_log` |
	| Avg end-to-end latency | `AVG(total_latency_ms)` |
	| Avg citations per answer | `AVG(citation_count)` |
	| Queries by day | `GROUP BY DATE(timestamp)` |
	| KB inventory | `KBManager.list_kbs()` |

	Stats are not loaded on page load — the user clicks 🔄 Refresh to pull fresh data. This avoids unnecessary DB queries at startup.
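The aggregate queries in the table could be issued along these lines. This is a sketch, not the project's `get_query_stats()`: the function signature, the returned dictionary keys, and the text-timestamp storage format are assumptions, while the table and column names come from the table above.

```python
import sqlite3


def get_query_stats(conn: sqlite3.Connection, days: int = 7) -> dict:
    """Aggregate query_log rows from the last `days` days."""
    window = f"-{int(days)} days"
    total, avg_latency, avg_citations = conn.execute(
        "SELECT COUNT(*), AVG(total_latency_ms), AVG(citation_count) "
        "FROM query_log WHERE timestamp >= datetime('now', ?)",
        (window,),
    ).fetchone()
    by_day = conn.execute(
        "SELECT DATE(timestamp), COUNT(*) FROM query_log "
        "WHERE timestamp >= datetime('now', ?) "
        "GROUP BY DATE(timestamp) ORDER BY 1",
        (window,),
    ).fetchall()
    return {
        "total_queries": total,
        "avg_latency_ms": avg_latency or 0.0,  # AVG over no rows is NULL
        "avg_citations": avg_citations or 0.0,
        "queries_by_day": by_day,
    }


def make_demo_connection(rows=()) -> sqlite3.Connection:
    """In-memory DB; each row is (time offset, latency in ms, citation count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE query_log ("
        " timestamp TEXT, total_latency_ms REAL, citation_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO query_log VALUES (datetime('now', ?), ?, ?)", rows
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_connection(
        [("-1 minutes", 100.0, 2), ("-2 minutes", 300.0, 4), ("-10 days", 900.0, 9)]
    )
    print(get_query_stats(demo))
```

The row from ten days ago falls outside the window, so only the two recent rows contribute to the count and the averages.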

	---

	## app.py — Startup Orchestration

	File: [app.py](../app.py)

	```python
	def _startup():  # returns (kb_manager, transcriber, answer_chain)
	    # 1. cfg.ensure_directories()
	    # 2. KBManager(db_path=data_dir/voicevault.db) ← initializes SQLite schema
	    # 3. WhisperTranscriber() ← lazy: no model loaded at startup
	    # 4. AnswerChain() ← lazy: LLM clients created per call
	    ...
	```

	All three singletons are created once and passed to the UI tab builders, so model-loading overhead is not repeated on every query.
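The "lazy" behaviour noted for the transcriber can be sketched as a load-on-first-use wrapper. `LazyTranscriber` and its `loader` argument are hypothetical names chosen for this illustration; how `WhisperTranscriber` actually defers loading is not shown in this document.

```python
from typing import Callable, Optional


class LazyTranscriber:
    """Defers an expensive model load until the first transcription."""

    def __init__(self, loader: Callable[[], Callable[[str], str]]):
        self._loader = loader  # called at most once
        self._model: Optional[Callable[[str], str]] = None

    def transcribe(self, audio_path: str) -> str:
        if self._model is None:  # first call pays the load cost
            self._model = self._loader()
        return self._model(audio_path)


if __name__ == "__main__":
    def load_model() -> Callable[[str], str]:
        print("loading model...")  # printed once, on first use
        return lambda path: f"text from {path}"

    transcriber = LazyTranscriber(load_model)  # constructing it loads nothing
    print(transcriber.transcribe("a.wav"))
    print(transcriber.transcribe("b.wav"))
```

Creating the object at startup stays cheap, and every later query reuses the already loaded model through the same instance.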

	---

	## Security Decisions

	### Password Storage
	bcrypt with work factor 12 makes offline brute-force attacks expensive even if the SQLite file is exfiltrated. This is above the minimum bcrypt work factor of 10 recommended by OWASP.

	### KB Name as Path Component
	The KB name regex (`^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$`, plus a single-character alternative) prevents path traversal. All KB filesystem operations use `cfg.kb_dir(kb_name)`, which returns `data_dir / kb_name`. A validated slug contains no `.`, `/`, or `\`, so the resulting path cannot leave `data_dir`.

	### Query Audit Log — PII Protection
	The raw query text is never stored in SQLite. Only the SHA-256 hash of the query is stored (`voice_query_hash`). This supports the GDPR data minimization principle: analytics work on aggregates, not raw user queries.
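Hashing a query before logging takes only the standard library. In this sketch `hash_query` is a hypothetical helper name, and hashing the UTF-8 bytes of the unmodified query is an assumption; the project may normalize the text first.

```python
import hashlib


def hash_query(query: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = hash_query("what is retrieval augmented generation?")
    print(len(digest))  # 64 hex characters
```

An unsalted hash still lets identical queries be counted together, but short or common queries can be recovered by hashing guesses, so the digest is best treated as pseudonymous rather than anonymous.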

	### No Globals in Event Handlers
	All pipeline objects (transcriber, answer_chain, kb_manager) are passed via closures, not module-level globals. This makes the code testable (dependency injection) and prevents accidental shared state mutation.

	---

	## Test Coverage

	File: [tests/test_phase5.py](../tests/test_phase5.py) | 55/55 passed

	| Class | Tests | What's verified |
	|-------|-------|----------------|
	| `TestKBManagerCreate` | 16 | Create, list, get, duplicate detection, 5 slug validation cases |
	| `TestKBManagerDelete` | 3 | Delete removes from list, nonexistent raises, count decreases |
	| `TestKBManagerPassword` | 7 | Public access, protected access, wrong pass, no pass, unknown KB, bcrypt format |
	| `TestKBManagerStats` | 3 | Returns dict, has required keys, zeros on empty DB |
	| `TestPreparForTTS` | 7 | Citation stripping, refusal → empty, normal text unchanged, no double spaces |
	| `TestCitationPanel` | 8 | Filename, page, section, excerpt, multiple, numbered, empty, type |
	| `TestUIHelpers` | 7 | KB choices, KB table format, protected lock icon, append_chat, no mutation |
	| `TestAppStartup` | 4 | build_app returns Blocks, all three tab builders run without error |

	### Fixture Design

	The `manager` fixture creates a fresh KBManager backed by a temp SQLite path for each test — complete isolation with no shared state between tests.

	---

	## Full Project Test Summary

	| Phase | Tests | Status |
	|-------|-------|--------|
	| Phase 0 — Foundation | 58 passed | ✅ |
	| Phase 1 — Ingestion | 46 passed | ✅ |
	| Phase 2 — Retrieval | 33 passed, 0 errors | ✅ |
	| Phase 3 — ASR | 45 passed, 2 skipped (soundfile) | ✅ |
	| Phase 4 — Generation | 72 passed | ✅ |
	| Phase 5 — UI & Access | 55 passed | ✅ |
	| Total | 309 passed, 2 skipped | ✅ |

	Note on conftest.py CPU fix: `CUDA_VISIBLE_DEVICES="-1"` is set in `tests/conftest.py` to force CPU for all tests. This prevents CUDA compatibility errors on RTX 5070 (sm_120 not supported by packaged PyTorch ≤ 2.x). Production deployment on HuggingFace Spaces uses NVIDIA T4 (sm_75) which is fully compatible.
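The CPU override described in the note amounts to a single environment assignment. The snippet below is a sketch of what such a `tests/conftest.py` line could look like, not a copy of the project's file. It assumes the assignment sits at module top level so that it runs before any test module lets PyTorch initialize CUDA.

```python
import os

# Hide all CUDA devices so every test runs on CPU.
# This must happen before PyTorch initializes CUDA to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```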