Spaces:

NinjainPJs
/

VoiceVault

Running

App Files Files Community

VoiceVault / DOCS /phase5_ui_access.md

NinjainPJs

Initial release: VoiceVault v1.0.0 — Voice-First RAG Knowledge Agent

85f900d 3 months ago

preview code

raw

history blame contribute delete

9.96 kB

Phase 5 — Full UI, TTS & Access Control

Status: ✅ Complete | Tests: 55/55 passed | Files: 7 modules (3 UI tabs, 2 backend, 1 TTS, 1 updated app.py)

What Was Built

Phase 5 wires all previous phases into a working end-to-end application.

Module	Responsibility
`voicevault/kb/kb_manager.py`	KB lifecycle: create, list, delete, ingest, password auth
`voicevault/tts/web_speech.py`	TTS text prep: strip citation markers before speech
`ui/tabs/ask_tab.py`	Full voice query pipeline in Gradio
`ui/tabs/kb_tab.py`	KB creation, document upload, management
`ui/tabs/analytics_tab.py`	Query stats from SQLite audit log
`ui/tabs/settings_tab.py`	Configuration panels (display-only)
`app.py`	Startup orchestration, pipeline wiring

KBManager

File: voicevault/kb/kb_manager.py

Central Database

All KBs share one SQLite database at cfg.data_dir / "voicevault.db". This enables cross-KB queries, global analytics, and efficient listing without per-KB filesystem scanning.

KB Name Validation

_VALID_KB_NAME = re.compile(r"^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$|^[a-z0-9]$")

Lowercase alphanumeric + hyphens only
1–64 characters
Cannot start or end with a hyphen
Prevents path traversal attacks (no .., /, \, spaces)

Password Protection (bcrypt)

password_hash = bcrypt.hashpw(
    password.encode(), bcrypt.gensalt(rounds=cfg.bcrypt_rounds)  # default: 12
).decode()

Passwords are hashed at creation time — plaintext never stored
verify_password() uses bcrypt.checkpw() for constant-time comparison
Public KBs (no password) return True for any password check

verify_password Logic

KB has no hash (public)  → True  (always accessible)
KB has hash, no password → False (protected but no credentials)
KB has hash, with password → bcrypt.checkpw(password, hash)

ingest_documents Flow

ingest_documents(kb_name, file_paths, password=None):
    1. Verify KB exists
    2. Verify password
    3. IndexBuilder(kb_name).ingest_file(path, db_path) per file
    4. Return list[IngestionReport]

Delegates entirely to IndexBuilder (Phase 1) which handles parsing, chunking, embedding, ChromaDB upsert, BM25 rebuild, and deduplication.

delete_kb Flow

delete_kb(kb_name):
    1. Verify KB exists (raises KBManagerError if not)
    2. db.delete_kb() → SQLite CASCADE deletes documents, chunks, query_log
    3. shutil.rmtree(cfg.kb_dir(kb_name)) → removes ChromaDB, BM25, files

Irreversible — the UI confirms before calling.

TTS — Web Speech API

File: voicevault/tts/web_speech.py

The TTS engine runs entirely in the browser via the SpeechSynthesis API — zero API cost, zero server load. Python's role is text preparation only.

prepare_for_tts()

def prepare_for_tts(answer: str, is_refusal: bool = False) -> str:
    if is_refusal or not answer:
        return ""
    text = _CITATION_MARKER_RE.sub("", answer)  # strip [Source: ...]
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text

Removes [Source: filename, p.N] markers before passing to the browser — reading "Source: paper dot pdf, p dot 3" aloud is poor UX. The JS bridge (ui/components/audio_controls.py) takes this cleaned text and calls window._vv_tts.speak(text, rate, pitch).

Ask Tab (Full Pipeline)

File: ui/tabs/ask_tab.py

End-to-End Query Flow

1. User records audio → stop_recording event fires
   → WhisperTranscriber.transcribe(audio_path) → transcript text

2. User selects KB(s) → clicks Ask

3. _query_fn():
   a. QueryPreprocessor.process(query) → pq (cleaned, typed)
   b. HybridRetriever(kb_names=selected).search(pq.processed_query) → results
   c. ContextBuilder().build(results) → (context_str, citation_map)
   d. AnswerChain.generate(query, context, citation_map, history, query_type) → generation
   e. db.log_query(...)  ← SHA-256 only, no raw text stored
   f. format_citations_markdown(generation.citations) → citation panel
   g. prepare_for_tts(generation.answer, generation.is_refusal) → TTS text
   h. Update chatbot + citations + history state + TTS state

State Management

gr.State([]) — conversation history as list[tuple[str, str]]
gr.State("") — last answer text (for TTS playback)

Conversation history is passed to AnswerChain._build_messages() as proper HumanMessage/AIMessage pairs — the correct LangChain pattern for multi-turn conversation.

Error Handling

Every failure path (no query, no KB selected, pipeline error) produces a user-visible error message in the chatbot rather than crashing. The query logger failure is non-critical (caught and warned, never raises).

Factory Functions

Event handlers are returned as closures from factory functions:

def _make_transcribe_fn(transcriber):
    def _transcribe(audio_path): ...
    return _transcribe

def _make_query_fn(answer_chain, db_path):
    def _query(query, kb_names, history, chatbot): ...
    return _query

This enables dependency injection without globals — the transcriber and answer_chain objects are passed in from app.py and captured in the closure.

KB Tab (Management UI)

File: ui/tabs/kb_tab.py

Three operations wired to Gradio event handlers:

Button	Handler	Output
➕ Create KB	`_create_kb()`	Status message, refreshed dropdowns
📤 Index Documents	`_upload_docs()`	Ingestion report per file
🗑️ Delete KB	`_delete_kb()`	Status message, refreshed table + dropdowns

After each create/delete, all dropdowns and the KB dataframe are updated via gr.update(choices=...) — no page refresh needed.

Analytics Tab

File: ui/tabs/analytics_tab.py

Pulls data from sqlite_store.get_query_stats() on refresh button click:

Metric	Source
Total queries (7d)	`COUNT(*)` from `query_log`
Avg end-to-end latency	`AVG(total_latency_ms)`
Avg citations per answer	`AVG(citation_count)`
Queries by day	`GROUP BY DATE(timestamp)`
KB inventory	`KBManager.list_kbs()`

Stats are not loaded on page load — the user clicks 🔄 Refresh to pull fresh data. This avoids unnecessary DB queries at startup.

app.py — Startup Orchestration

File: app.py

_startup() → (kb_manager, transcriber, answer_chain):
    1. cfg.ensure_directories()
    2. KBManager(db_path=data_dir/voicevault.db)  ← initializes SQLite schema
    3. WhisperTranscriber()  ← lazy: no model loaded at startup
    4. AnswerChain()         ← lazy: LLM clients created per call

All three singletons are created once and passed to the UI tab builders. This avoids the model-loading overhead being repeated on every query.

Security Decisions

Password Storage

bcrypt with work factor 12 — prevents offline brute-force attacks even if the SQLite file is exfiltrated. The same rounds as industry standard (bcrypt rounds ≥ 10 is OWASP recommended).

KB Name as Path Component

The KB name regex (^[a-z0-9][a-z0-9\-]{0,62}[a-z0-9]$) prevents path traversal. All KB filesystem operations use cfg.kb_dir(kb_name) which returns data_dir / kb_name — impossible to escape with a validated slug.

Query Audit Log — PII Protection

The raw query text is NEVER stored in SQLite. Only the SHA-256 hash of the query is stored (voice_query_hash). This satisfies GDPR "data minimization" — analytics work on aggregates, not raw user queries.

No Globals in Event Handlers

All pipeline objects (transcriber, answer_chain, kb_manager) are passed via closures, not module-level globals. This makes the code testable (dependency injection) and prevents accidental shared state mutation.

Test Coverage

File: tests/test_phase5.py | 55/55 passed

Class	Tests	What's verified
`TestKBManagerCreate`	16	Create, list, get, duplicate detection, 5 slug validation cases
`TestKBManagerDelete`	3	Delete removes from list, nonexistent raises, count decreases
`TestKBManagerPassword`	7	Public access, protected access, wrong pass, no pass, unknown KB, bcrypt format
`TestKBManagerStats`	3	Returns dict, has required keys, zeros on empty DB
`TestPreparForTTS`	7	Citation stripping, refusal → empty, normal text unchanged, no double spaces
`TestCitationPanel`	8	Filename, page, section, excerpt, multiple, numbered, empty, type
`TestUIHelpers`	7	KB choices, KB table format, protected lock icon, append_chat, no mutation
`TestAppStartup`	4	build_app returns Blocks, all three tab builders run without error

Fixture Design

The manager fixture creates a fresh KBManager backed by a temp SQLite path for each test — complete isolation with no shared state between tests.

Full Project Test Summary

Phase	Tests	Status
Phase 0 — Foundation	58 passed	✅
Phase 1 — Ingestion	46 passed	✅
Phase 2 — Retrieval	33 passed, 0 errors	✅
Phase 3 — ASR	45 passed, 2 skipped (soundfile)	✅
Phase 4 — Generation	72 passed	✅
Phase 5 — UI & Access	55 passed	✅
Total	309 passed, 2 skipped	✅

Note on conftest.py CPU fix: CUDA_VISIBLE_DEVICES="-1" is set in tests/conftest.py to force CPU for all tests. This prevents CUDA compatibility errors on RTX 5070 (sm_120 not supported by packaged PyTorch ≤ 2.x). Production deployment on HuggingFace Spaces uses NVIDIA T4 (sm_75) which is fully compatible.