
Phase 5: Full UI, TTS & Access Control

Status: ✅ Complete | Tests: 55/55 passed | Files: 7 modules (3 UI tabs, 2 backend, 1 TTS, 1 updated app.py)


What Was Built

Phase 5 wires all previous phases into a working end-to-end application.

| Module | Responsibility |
| --- | --- |
| voicevault/kb/kb_manager.py | KB lifecycle: create, list, delete, ingest, password auth |
| voicevault/tts/web_speech.py | TTS text prep: strip citation markers before speech |
| ui/tabs/ask_tab.py | Full voice query pipeline in Gradio |
| ui/tabs/kb_tab.py | KB creation, document upload, management |
| ui/tabs/analytics_tab.py | Query stats from SQLite audit log |
| ui/tabs/settings_tab.py | Configuration panels (display-only) |
| app.py | Startup orchestration, pipeline wiring |

KBManager

File: voicevault/kb/kb_manager.py

Central Database

All KBs share one SQLite database at cfg.data_dir / "voicevault.db". This enables cross-KB queries, global analytics, and efficient listing without per-KB filesystem scanning.

KB Name Validation

_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")
  • Lowercase alphanumeric + hyphens only
  • 1–64 characters
  • Cannot start or end with a hyphen
  • Prevents path traversal attacks (no .., /, \, spaces)
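As a quick illustration of the rules above, here is a minimal sketch of a validation helper built on the same pattern. The function name is_valid_kb_name is hypothetical, and the use of fullmatch is an assumption: the source does not say how the pattern is applied. It matters because with re.match, `$` also matches just before a trailing newline, so "kb\n" would slip through.

```python
import re

# Same pattern as documented above: lowercase alphanumerics and hyphens,
# 1-64 characters, no leading or trailing hyphen.
_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")


def is_valid_kb_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is an acceptable KB slug.

    fullmatch requires the whole string to be consumed, which rejects a
    trailing newline that `$` alone would tolerate under re.match.
    """
    return _VALID_KB_NAME.fullmatch(name) is not None
```

For example, is_valid_kb_name("my-kb-01") is True, while "-kb", "kb-", "../etc", "My KB", and a 65-character name are all rejected.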

Password Protection (bcrypt)

password_hash = bcrypt.hashpw(
    password.encode(), bcrypt.gensalt(rounds=cfg.bcrypt_rounds)  # default: 12
).decode()
  • Passwords are hashed at creation time; plaintext is never stored
  • verify_password() uses bcrypt.checkpw() for constant-time comparison
  • Public KBs (no password) return True for any password check

verify_password Logic

KB has no hash (public)    → True  (always accessible)
KB has hash, no password   → False (protected but no credentials)
KB has hash, with password → bcrypt.checkpw(password, hash)
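The three branches can be sketched as a small pure function. This is an illustration, not the module's actual code: the signature is hypothetical, and the `check` parameter exists only so the sketch runs without third-party packages. In the real module the comparison is bcrypt.checkpw, which takes the password and the stored hash as bytes.

```python
from typing import Callable, Optional


def verify_password(
    stored_hash: Optional[str],
    password: Optional[str],
    check: Callable[[bytes, bytes], bool],
) -> bool:
    """Sketch of the decision table above (hypothetical signature)."""
    if not stored_hash:
        # Public KB: no hash on record, so any caller is allowed.
        return True
    if not password:
        # Protected KB, but the caller supplied no credentials.
        return False
    # Protected KB with credentials: delegate to the hash comparison.
    return check(password.encode(), stored_hash.encode())
```

With bcrypt installed, the call would look like verify_password(row_hash, user_input, bcrypt.checkpw).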

ingest_documents Flow

ingest_documents(kb_name, file_paths, password=None):
    1. Verify KB exists
    2. Verify password
    3. IndexBuilder(kb_name).ingest_file(path, db_path) per file
    4. Return list[IngestionReport]

ingest_documents delegates entirely to IndexBuilder (Phase 1), which handles parsing, chunking, embedding, ChromaDB upsert, BM25 rebuild, and deduplication.

delete_kb Flow

delete_kb(kb_name):
    1. Verify KB exists (raises KBManagerError if not)
    2. db.delete_kb() → SQLite CASCADE deletes documents, chunks, query_log
    3. shutil.rmtree(cfg.kb_dir(kb_name)) → removes ChromaDB, BM25, files

Irreversible: the UI asks for confirmation before calling.
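The cascade step relies on a SQLite detail: ON DELETE CASCADE is enforced only when foreign keys are enabled on the connection with PRAGMA foreign_keys = ON. The sketch below shows the mechanism on a toy schema. The table and column names (kbs, documents, kb_name) and the helper names are assumptions for illustration; the project's real schema is not shown in this document.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign-key enforcement switched on."""
    conn = sqlite3.connect(path)
    # Without this pragma SQLite parses ON DELETE CASCADE but ignores it.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def create_schema(conn: sqlite3.Connection) -> None:
    """Toy schema (hypothetical): one parent table, one dependent table."""
    conn.executescript(
        """
        CREATE TABLE kbs (name TEXT PRIMARY KEY);
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            kb_name TEXT NOT NULL REFERENCES kbs(name) ON DELETE CASCADE
        );
        """
    )


def delete_kb_rows(conn: sqlite3.Connection, kb_name: str) -> None:
    """Delete the KB row; dependent rows are removed by the cascade."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM kbs WHERE name = ?", (kb_name,))
```

Deleting the database rows before calling shutil.rmtree means a failed SQL delete leaves the files intact, which is easier to recover from than the reverse.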


TTS β€” Web Speech API

File: voicevault/tts/web_speech.py

The TTS engine runs entirely in the browser via the SpeechSynthesis API, so it adds no API cost and no server load. Python's role is limited to text preparation.

prepare_for_tts()

def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    if is_refusal or not answer:
        return ""
    text = _CITATION_MARKER_RE.sub("", answer)  # strip [Source: ...]
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text

Removes [Source: filename, p.N] markers before passing the text to the browser; reading "Source: paper dot pdf, p dot 3" aloud is poor UX. The JS bridge (ui/components/audio_controls.py) takes this cleaned text and calls window._vv_tts.speak(text, rate, pitch).
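The definition of _CITATION_MARKER_RE is not shown in this document. The following self-contained sketch assumes markers always have the bracketed form [Source: ...] and uses a hypothetical pattern that also consumes whitespace in front of the marker, so that no stray space is left before the following punctuation.

```python
import re

# Assumed pattern: optional leading whitespace, then "[Source: ...]".
# The real _CITATION_MARKER_RE may differ.
_CITATION_MARKER_RE = re.compile(r"\s*\[Source:[^\]]*\]")


def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    """Return speech-ready text, or "" for refusals and empty answers."""
    if is_refusal or not answer:
        return ""
    text = _CITATION_MARKER_RE.sub("", answer)   # strip [Source: ...]
    text = re.sub(r"\s{2,}", " ", text).strip()  # collapse whitespace runs
    return text
```

Under these assumptions, "Attention scales quadratically [Source: paper.pdf, p.3]. It is costly." becomes "Attention scales quadratically. It is costly."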


Ask Tab (Full Pipeline)

File: ui/tabs/ask_tab.py

End-to-End Query Flow

1. User records audio → stop_recording event fires
   → WhisperTranscriber.transcribe(audio_path) → transcript text

2. User selects KB(s) → clicks Ask

3. _query_fn():
   a. QueryPreprocessor.process(query) → pq (cleaned, typed)
   b. HybridRetriever(kb_names=selected).search(pq.processed_query) → results
   c. ContextBuilder().build(results) → (context_str, citation_map)
   d. AnswerChain.generate(query, context, citation_map, history, query_type) → generation
   e. db.log_query(...)  ← SHA-256 only, no raw text stored
   f. format_citations_markdown(generation.citations) → citation panel
   g. prepare_for_tts(generation.answer, generation.is_refusal) → TTS text
   h. Update chatbot + citations + history state + TTS state

State Management

  • gr.State([]): conversation history as list[tuple[str, str]]
  • gr.State(""): last answer text (for TTS playback)

Conversation history is passed to AnswerChain._build_messages() as HumanMessage/AIMessage pairs, the standard LangChain pattern for multi-turn conversation.

Error Handling

Every failure path (no query, no KB selected, pipeline error) produces a user-visible error message in the chatbot rather than crashing. A query-logging failure is treated as non-critical: it is caught and logged as a warning, and never propagates to the user.

Factory Functions

Event handlers are returned as closures from factory functions:

def _make_transcribe_fn(transcriber):
    def _transcribe(audio_path): ...
    return _transcribe

def _make_query_fn(answer_chain, db_path):
    def _query(query, kb_names, history, chatbot): ...
    return _query

This enables dependency injection without globals: the transcriber and answer_chain objects are passed in from app.py and captured in the closure.
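The testing benefit of this pattern can be shown with a stand-in object. In the sketch below, FakeTranscriber and the empty-input guard are assumptions added for illustration; only the factory shape follows the code above.

```python
from typing import Callable, List, Optional


def make_transcribe_fn(transcriber) -> Callable[[Optional[str]], str]:
    """Return an event handler that closes over the injected transcriber."""

    def _transcribe(audio_path: Optional[str]) -> str:
        if not audio_path:
            # Assumed guard: nothing recorded, so nothing to transcribe.
            return ""
        return transcriber.transcribe(audio_path)

    return _transcribe


class FakeTranscriber:
    """Test double (hypothetical): records calls, returns canned text."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def transcribe(self, audio_path: str) -> str:
        self.calls.append(audio_path)
        return f"transcript of {audio_path}"
```

A test can build the handler with make_transcribe_fn(FakeTranscriber()) and call it directly, with no Whisper model and no Gradio app involved.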


KB Tab (Management UI)

File: ui/tabs/kb_tab.py

Three operations wired to Gradio event handlers:

| Button | Handler | Output |
| --- | --- | --- |
| ➕ Create KB | _create_kb() | Status message, refreshed dropdowns |
| 📤 Index Documents | _upload_docs() | Ingestion report per file |
| 🗑️ Delete KB | _delete_kb() | Status message, refreshed table + dropdowns |

After each create/delete, all dropdowns and the KB dataframe are updated via gr.update(choices=...), so no page refresh is needed.


Analytics Tab

File: ui/tabs/analytics_tab.py

Pulls data from sqlite_store.get_query_stats() on refresh button click:

| Metric | Source |
| --- | --- |
| Total queries (7d) | COUNT(*) from query_log |
| Avg end-to-end latency | AVG(total_latency_ms) |
| Avg citations per answer | AVG(citation_count) |
| Queries by day | GROUP BY DATE(timestamp) |
| KB inventory | KBManager.list_kbs() |

Stats are not loaded on page load; the user clicks 🔄 Refresh to pull fresh data. This avoids unnecessary DB queries at startup.
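The aggregates in the table map onto two SQL queries. The sketch below is an illustration under stated assumptions: the query_log columns (timestamp, total_latency_ms, citation_count) are taken from the table above, while the function signature, the returned dictionary keys, and the 7-day window handling are hypothetical. The real sqlite_store.get_query_stats() may differ.

```python
import sqlite3
from typing import Any, Dict


def get_query_stats(conn: sqlite3.Connection, days: int = 7) -> Dict[str, Any]:
    """Aggregate recent rows of query_log (assumed schema, see lead-in)."""
    window = f"-{int(days)} days"  # SQLite date modifier, e.g. "-7 days"
    total, avg_latency, avg_citations = conn.execute(
        """
        SELECT COUNT(*), AVG(total_latency_ms), AVG(citation_count)
        FROM query_log
        WHERE timestamp >= datetime('now', ?)
        """,
        (window,),
    ).fetchone()
    by_day = conn.execute(
        """
        SELECT DATE(timestamp), COUNT(*)
        FROM query_log
        WHERE timestamp >= datetime('now', ?)
        GROUP BY DATE(timestamp)
        ORDER BY DATE(timestamp)
        """,
        (window,),
    ).fetchall()
    return {
        "total_queries": total,
        # AVG over zero rows yields NULL, so fall back to 0.0.
        "avg_latency_ms": avg_latency or 0.0,
        "avg_citations": avg_citations or 0.0,
        "queries_by_day": by_day,
    }
```

The string comparison on timestamp assumes values are stored in SQLite's default "YYYY-MM-DD HH:MM:SS" UTC format, as produced by datetime('now').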


app.py: Startup Orchestration

File: app.py

_startup() → (kb_manager, transcriber, answer_chain):
    1. cfg.ensure_directories()
    2. KBManager(db_path=data_dir/voicevault.db)  ← initializes SQLite schema
    3. WhisperTranscriber()  ← lazy: no model loaded at startup
    4. AnswerChain()         ← lazy: LLM clients created per call

All three singletons are created once and passed to the UI tab builders. This avoids the model-loading overhead being repeated on every query.
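The "lazy" behavior noted for WhisperTranscriber can be pictured as a wrapper that defers an expensive load until first use and then reuses the result. The LazyResource class below is a hypothetical sketch of that pattern, not the project's implementation; the lock is an assumption added because UI handlers may run concurrently.

```python
import threading
from typing import Any, Callable


class LazyResource:
    """Load an expensive object on first access, then reuse it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value: Any = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> Any:
        # The lock ensures two concurrent first calls trigger one load.
        with self._lock:
            if not self._loaded:
                self._value = self._loader()
                self._loaded = True
            return self._value
```

Constructing the wrapper at startup is cheap; the cost of the loader is paid once, on the first call to get().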


Security Decisions

Password Storage

bcrypt with a work factor of 12 makes offline brute-force attacks expensive even if the SQLite file is exfiltrated. This is above the minimum OWASP recommends for bcrypt (work factor ≥ 10).

KB Name as Path Component

The KB name regex (^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$) admits no dots, slashes, backslashes, or whitespace, which rules out path traversal. All KB filesystem operations go through cfg.kb_dir(kb_name), which returns data_dir / kb_name, so a validated slug cannot resolve outside data_dir.
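A containment check on the resolved path is a common second line of defense alongside name validation. The sketch below is illustrative only: the real cfg.kb_dir is not shown in this document, and the two-argument signature here is hypothetical.

```python
from pathlib import Path


def kb_dir(data_dir: Path, kb_name: str) -> Path:
    """Return data_dir / kb_name, refusing anything outside data_dir.

    Hypothetical defense-in-depth variant: even if an unvalidated name
    reached this function, the resolved path must be a direct child of
    the data directory.
    """
    root = data_dir.resolve()
    candidate = (root / kb_name).resolve()
    if candidate.parent != root:
        raise ValueError(f"KB path escapes data directory: {kb_name!r}")
    return candidate
```

Names such as "../x", "a/b", ".", and absolute paths all fail the parent check, while a plain slug like "demo" passes.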

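Query Audit Log: PII Protection

The raw query text is never stored in SQLite. Only the SHA-256 hash of the query is stored (voice_query_hash). This supports GDPR data minimization: analytics work on aggregates, not raw user queries. Because an unsalted hash of a short or guessable query can be matched by hashing candidate queries, the stored hash is best treated as pseudonymized rather than anonymous data.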
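Computing the stored value takes one call to the standard library. In this minimal sketch the function name hash_query is hypothetical, and hashing the UTF-8 bytes of the query without any normalization is an assumption.

```python
import hashlib


def hash_query(query: str) -> str:
    """Return the hex SHA-256 digest of the query's UTF-8 bytes."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()
```

The digest is always 64 hex characters, and identical queries always produce the same digest.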

No Globals in Event Handlers

All pipeline objects (transcriber, answer_chain, kb_manager) are passed via closures, not module-level globals. This makes the code testable (dependency injection) and prevents accidental shared state mutation.


Test Coverage

File: tests/test_phase5.py | 55/55 passed

| Class | Tests | What's verified |
| --- | --- | --- |
| TestKBManagerCreate | 16 | Create, list, get, duplicate detection, 5 slug validation cases |
| TestKBManagerDelete | 3 | Delete removes from list, nonexistent raises, count decreases |
| TestKBManagerPassword | 7 | Public access, protected access, wrong pass, no pass, unknown KB, bcrypt format |
| TestKBManagerStats | 3 | Returns dict, has required keys, zeros on empty DB |
| TestPreparForTTS | 7 | Citation stripping, refusal → empty, normal text unchanged, no double spaces |
| TestCitationPanel | 8 | Filename, page, section, excerpt, multiple, numbered, empty, type |
| TestUIHelpers | 7 | KB choices, KB table format, protected lock icon, append_chat, no mutation |
| TestAppStartup | 4 | build_app returns Blocks, all three tab builders run without error |

Fixture Design

The manager fixture creates a fresh KBManager backed by a temp SQLite path for each test, giving complete isolation with no shared state between tests.


Full Project Test Summary

| Phase | Tests | Status |
| --- | --- | --- |
| Phase 0: Foundation | 58 passed | ✅ |
| Phase 1: Ingestion | 46 passed | ✅ |
| Phase 2: Retrieval | 33 passed, 0 errors | ✅ |
| Phase 3: ASR | 45 passed, 2 skipped (soundfile) | ✅ |
| Phase 4: Generation | 72 passed | ✅ |
| Phase 5: UI & Access | 55 passed | ✅ |
| Total | 309 passed, 2 skipped | ✅ |

Note on conftest.py CPU fix: CUDA_VISIBLE_DEVICES="-1" is set in tests/conftest.py to force CPU for all tests. This prevents CUDA compatibility errors on the RTX 5070 (sm_120 is not supported by the packaged PyTorch ≤ 2.x build). The production deployment on HuggingFace Spaces uses an NVIDIA T4 (sm_75), which is fully compatible.