---
title: VoiceVault
emoji: 🎙️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: other
---
Overview
VoiceVault is a production-grade, voice-first Retrieval-Augmented Generation (RAG) system built entirely from scratch. It lets users record or type questions and receive answers grounded in their own private document collections, with inline citations pointing back to the exact source, page, and paragraph.
The project was built in seven phases (Phase 0 through Phase 6) over several weeks, with a full test suite (328 tests), enterprise-grade security practices (bcrypt, parameterized SQL, SHA-256 audit logs, SSRF prevention), and deployment to Hugging Face Spaces via Docker.
What makes this different from typical RAG demos:
- Hybrid retrieval: BM25 keyword search + semantic vector search, fused with Reciprocal Rank Fusion (RRF) and then cross-encoder reranking. Most tutorials use only one retrieval method.
- Voice-native pipeline: Groq Whisper API for ~300ms cloud transcription, with local Whisper as a fallback; the Web Speech API handles TTS output.
- Faithfulness guard: detects when the LLM cannot answer from the retrieved context and returns a grounded refusal instead of hallucinating.
- Multi-KB support: multiple independent knowledge bases, each optionally password-protected.
Screenshots
Ask VoiceVault β Voice Query Interface
Record your question via microphone or type it. The mic button pulses when recording.
Knowledge Base Management
Create named knowledge bases, upload documents (PDF, DOCX, HTML, MD, TXT), and manage them.
Analytics Dashboard
Real-time query statistics: total queries, average latency, citation counts, and daily breakdowns.
Full App in Action
A populated knowledge base (358 chunks from 1 document) and a live conversation with the RAG pipeline.
Architecture
```
INGESTION PATH (one-time per document set)
──────────────────────────────────────────────────────
User uploads PDF / HTML / DOCX / MD / TXT
        │
        ▼
DocumentParser     → text + metadata per page
        │             (PyMuPDF, BS4, python-docx)
        ▼
SemanticChunker    → sentence-aware chunks
        │             (spaCy sentences + cosine boundary)
        ▼
IndexBuilder       → ChromaDB (vector) + BM25 (keyword)
                     + SQLite (metadata)

QUERY PATH (real-time, per question)
──────────────────────────────────────────────────────
Browser mic → WAV → POST /api/transcribe
        │
        ▼
GroqTranscriber    → Groq Whisper API (~300ms)
        │             [fallback: local Whisper CPU]
        ▼
QueryPreprocessor  → filler removal, intent classification
        │             (factual / summary / compare)
        ▼
HybridRetriever    → BM25 top-20 + Vector top-20
        │             → RRF merge (k=60)
        │             → CrossEncoder rerank (ms-marco-MiniLM-L12-v2)
        │             → diversity filter (max 2 chunks/page)
        ▼
ContextBuilder     → formatted context with [Source:N] markers
        │
        ▼
LangChain LCEL     → Groq Llama-3.1-70B (primary)
        │             [fallback: Gemini 1.5 Flash]
        ▼
FaithfulnessGuard  → refusal detection, confidence scoring
        │
        ▼
CitationInjector   → resolve [Source:N] → filename + page
        │
        ▼
JSON response      → answer + citations + confidence + tts_text
        │
        ▼
SPA Frontend       → chat display + Web Speech API TTS
```
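The CitationInjector step in the query path can be illustrated with a small sketch. It assumes the context builder numbers chunks 1..N in the order they were passed to the LLM and keeps a parallel list of (filename, page) metadata; `ChunkMeta` and `resolve_citations` are hypothetical names, not the project's actual classes.

```python
import re
from dataclasses import dataclass

# Matches markers such as [Source:2] or [Source: 2].
SOURCE_MARKER = re.compile(r"\[Source:\s*(\d+)\]")


@dataclass(frozen=True)
class ChunkMeta:
    filename: str
    page: int


def resolve_citations(answer: str, sources: list[ChunkMeta]) -> list[ChunkMeta]:
    """Return the distinct sources cited in `answer`, in order of first mention.

    Marker numbers are 1-based indexes into `sources`; out-of-range numbers
    (for example a marker the LLM invented) are skipped instead of raising.
    """
    cited: list[ChunkMeta] = []
    for match in SOURCE_MARKER.finditer(answer):
        index = int(match.group(1)) - 1
        if 0 <= index < len(sources) and sources[index] not in cited:
            cited.append(sources[index])
    return cited


context_sources = [ChunkMeta("handbook.pdf", 4), ChunkMeta("faq.md", 1)]
answer = "Refunds take 5 days [Source:1]. Contact support first [Source:2][Source:1]. See [Source:9]."
for meta in resolve_citations(answer, context_sources):
    print(f"{meta.filename}, page {meta.page}")
```

Silently dropping unknown marker numbers is a design choice in this sketch; a stricter version could instead lower the answer's confidence score when the LLM cites a source that was never in its context.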
Features
| Feature | Detail |
|---|---|
| Voice Input | Browser microphone → WAV conversion → Groq Whisper API (~300ms) |
| Hybrid Retrieval | BM25 + semantic vector search, RRF fusion, cross-encoder reranking |
| Multi-KB | Create multiple independent knowledge bases per session |
| KB Access Control | Optional bcrypt password protection (work factor 12) per KB |
| Document Formats | PDF, DOCX, HTML, Markdown, TXT (OCR fallback for scanned PDFs) |
| Source Citations | Every answer traceable to source file + page number |
| Faithfulness Guard | Detects when the retrieved context cannot support an answer and returns a grounded refusal instead of a hallucinated one |
| Conversation Memory | Rolling 5-turn conversation window passed to the LLM |
| LLM Fallback | Automatic fallback from Groq Llama-3.1-70B → Gemini 1.5 Flash |
| TTS Output | Web Speech API reads answer aloud with citation markers stripped |
| Analytics | SQLite audit log: query counts, latency, citation rates (7-day window) |
| Privacy | Raw queries are never stored; the audit log keeps only a SHA-256 hash |
| 328 Tests | Integration + unit tests across all seven phases (0 to 6) |
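The TTS Output row says citation markers are stripped before the Web Speech API reads an answer aloud. Below is a minimal sketch of that text preparation, assuming markers use the `[Source:N]` format from the architecture diagram; `prepare_tts_text` is a hypothetical name, and the project's `web_speech.py` may do more than this.

```python
import re

# [Source:N] markers, together with any whitespace directly before them.
SOURCE_MARKER = re.compile(r"\s*\[Source:\s*\d+\]")
MULTISPACE = re.compile(r"\s{2,}")


def prepare_tts_text(answer: str) -> str:
    """Remove citation markers so the speech engine does not read them aloud."""
    text = SOURCE_MARKER.sub("", answer)
    # Collapse any whitespace runs left behind by removed markers.
    return MULTISPACE.sub(" ", text).strip()


print(prepare_tts_text("Refunds take 5 days [Source:1]. Contact support [Source:2][Source:1] first."))
```

Consuming the whitespace before each marker keeps punctuation attached to the preceding word ("5 days." rather than "5 days ."), which sounds more natural when spoken.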
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| API | FastAPI + uvicorn | REST backend with async endpoints |
| Frontend | HTML5 / CSS3 / Vanilla JS | Premium dark SPA (no framework) |
| ASR | Groq Whisper API | Cloud transcription (~300ms) |
| ASR Fallback | OpenAI Whisper Large-v3 | Local CPU transcription |
| Embeddings | sentence-transformers all-MiniLM-L6-v2 | Dense vector representations |
| Reranking | cross-encoder/ms-marco-MiniLM-L12-v2 | Semantic relevance scoring |
| Vector Store | ChromaDB | In-process vector database |
| Keyword Search | rank-bm25 (BM25Okapi) | Lexical keyword matching |
| Chunking | spaCy en_core_web_sm | Sentence boundary detection |
| LLM (primary) | Groq Llama-3.1-70B | Fast inference via Groq cloud |
| LLM (fallback) | Gemini 1.5 Flash | Google generative AI fallback |
| Orchestration | LangChain LCEL | LLM pipeline composition |
| Metadata | SQLite | KB registry, doc index, audit log |
| Security | bcrypt (work factor 12) | KB password hashing |
| Config | Pydantic-settings | Centralized, type-safe config |
| Deployment | Docker on Hugging Face Spaces | Container-based cloud hosting |
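The Keyword Search row relies on rank-bm25's `BM25Okapi`. To show what that lexical scoring computes, here is a hand-rolled, dependency-free Okapi BM25 sketch; it is not the library's code. `TinyBM25` is a hypothetical class whose `get_scores` method is named after rank-bm25's method of the same name, and it uses the always-positive idf variant `log(1 + (N - n + 0.5) / (n + 0.5))`, so its absolute scores will differ from rank-bm25's.

```python
import math
from collections import Counter


class TinyBM25:
    """Minimal Okapi BM25 scorer over pre-tokenized documents (illustration only)."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75):
        # Assumes a non-empty corpus.
        self.k1, self.b = k1, b
        self.doc_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n_docs = len(corpus)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in corpus for term in set(doc))
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def get_scores(self, query: list[str]) -> list[float]:
        """Return one BM25 score per document for the tokenized query."""
        scores = []
        for freqs, dl in zip(self.doc_freqs, self.doc_lens):
            score = 0.0
            for term in query:
                tf = freqs.get(term, 0)
                if tf == 0:
                    continue
                # Term-frequency saturation (k1) and length normalization (b).
                norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                score += self.idf[term] * tf * (self.k1 + 1) / norm
            scores.append(score)
        return scores


docs = ["refund policy takes five days", "contact support by email", "shipping takes two weeks"]
tokenized = [d.split() for d in docs]
bm25 = TinyBM25(tokenized)
scores = bm25.get_scores("refund days".split())
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Exact-term matching like this catches identifiers, product codes, and rare words that a small embedding model can blur together, which is why the retriever pairs it with vector search rather than relying on either alone.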
Project Structure
```
Project-VoiceVault/
├── server.py                  # FastAPI entry point (run this)
├── app.py                     # Gradio entry point (legacy / tests)
├── config.py                  # Centralized Pydantic-settings config
├── requirements.txt           # All dependencies
├── Dockerfile                 # HF Spaces Docker deployment
├── .env.example               # Environment variable template
│
├── api/                       # FastAPI REST API
│   ├── __init__.py
│   └── routes.py              # All /api/* endpoints
│
├── static/                    # SPA frontend assets
│   ├── index.html             # Single-page application shell
│   ├── style.css              # Dark glassmorphism design system
│   └── app.js                 # Full SPA logic (recording, chat, KB CRUD)
│
├── voicevault/                # Core package
│   ├── models.py              # Pydantic data models
│   ├── asr/
│   │   ├── groq_transcriber.py     # Groq Whisper cloud ASR (~300ms)
│   │   ├── whisper_transcriber.py  # Local Whisper CPU/GPU fallback
│   │   └── query_preprocessor.py   # Filler removal, intent classification
│   ├── ingestion/
│   │   ├── document_parser.py      # PDF/HTML/DOCX/MD/TXT → structured text
│   │   ├── semantic_chunker.py     # Sentence-aware chunking with topic boundaries
│   │   └── index_builder.py        # ChromaDB + BM25 + SQLite orchestration
│   ├── retrieval/
│   │   ├── hybrid_retriever.py     # BM25 + vector + RRF + cross-encoder
│   │   ├── bm25_retriever.py       # BM25Okapi keyword search
│   │   ├── vector_retriever.py     # ChromaDB semantic search
│   │   └── context_builder.py      # Context formatting + citation markers
│   ├── generation/
│   │   ├── answer_chain.py         # LangChain LCEL + Groq + Gemini fallback
│   │   ├── faithfulness_guard.py   # Hallucination detection + refusal
│   │   └── citation_injector.py    # [Source:N] → filename + page resolution
│   ├── kb/
│   │   └── kb_manager.py           # KB lifecycle, bcrypt auth, validation
│   ├── storage/
│   │   ├── sqlite_store.py         # Schema, CRUD, audit log queries
│   │   └── chroma_store.py         # ChromaDB wrapper
│   └── tts/
│       └── web_speech.py           # TTS text preparation
│
├── ui/                        # Gradio UI components (legacy / app.py)
│   ├── tabs/
│   │   ├── ask_tab.py
│   │   ├── kb_tab.py
│   │   ├── analytics_tab.py
│   │   └── settings_tab.py
│   └── components/
│       ├── citation_panel.py
│       └── audio_controls.py
│
├── tests/                     # Full test suite (328 tests)
│   ├── conftest.py
│   ├── test_api_routes.py     # Integration tests (FastAPI + real methods)
│   ├── test_phase0.py         # Foundation tests
│   ├── test_phase1.py         # Ingestion tests
│   ├── test_phase2.py         # Retrieval tests
│   ├── test_phase3.py         # ASR tests
│   ├── test_phase4.py         # Generation tests
│   └── test_phase5.py         # UI / access control tests
│
├── DOCS/                      # Detailed phase documentation
│   ├── phase0_foundation.md
│   ├── phase1_ingestion.md
│   ├── phase2_retrieval.md
│   ├── phase3_asr.md
│   ├── phase4_generation.md
│   ├── phase5_ui_access.md
│   └── phase6_deployment.md
│
└── Screenshots/
    ├── 1.png                  # Ask tab: voice query interface
    ├── 2.png                  # Knowledge Bases panel
    ├── 3.png                  # Analytics dashboard
    └── 4.png                  # Full app with KB and live conversation
```
Quick Start
Prerequisites
- Python 3.11+
- A Groq API key (free at console.groq.com)
- Optionally a Gemini API key (free at aistudio.google.com)
1. Clone and install
```bash
git clone https://github.com/ninjacode911/Project-VoiceVault.git
cd Project-VoiceVault
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install torch --index-url https://download.pytorch.org/whl/cpu  # CPU-only (saves ~1.8GB)
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
python -m spacy download en_core_web_sm
2. Configure secrets
```bash
cp .env.example .env
# Edit .env and add:
#   GROQ_API_KEY=gsk_...
#   GEMINI_API_KEY=...   (optional)
```
3. Run
```bash
python server.py
# Open http://localhost:7860
```
4. Use it
- Navigate to Knowledge Bases → click + New Knowledge Base
- Name it (lowercase letters, digits, and hyphens, starting and ending with a letter or digit, e.g. `my-docs`) and upload your PDFs/documents
- Go back to Ask VoiceVault → select your KB → record or type a question → click Ask
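The naming rule for knowledge bases comes from the slug pattern listed in the Security table, which also blocks path traversal. A minimal sketch of that check follows; `is_valid_kb_name` is a hypothetical helper, and `kb_manager.py` may apply the pattern differently.

```python
import re

# Pattern from the Security table: lowercase letters, digits, and hyphens,
# starting and ending with a letter or digit (so at least 2 characters).
KB_NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9\-]*[a-z0-9]$")


def is_valid_kb_name(name: str) -> bool:
    """Return True if `name` is a safe KB slug (no slashes, dots, or uppercase)."""
    # fullmatch, not match: with match, "$" would also accept a trailing "\n".
    return KB_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-docs", "../etc", "My-Docs", "a"]:
    print(candidate, is_valid_kb_name(candidate))
```

Because a valid name can never contain `/`, `\`, or `.`, it is safe to use directly as a directory or collection name.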
Running Tests
```bash
pytest tests/ -v
# Expected: 328 passed
```
The integration tests in tests/test_api_routes.py use a real KBManager backed by a temporary SQLite DB and exercise the actual FastAPI routes and method signatures, not mocked pipelines. This is intentional: it catches runtime AttributeError bugs that pure-mock unit tests miss.
Deployment to Hugging Face Spaces
The project ships with a Dockerfile configured for HF Spaces. The Docker image:
- Uses a Python 3.11-slim base image
- Installs CPU-only PyTorch (~650MB vs ~2.5GB for the GPU wheels)
- Pre-downloads `all-MiniLM-L6-v2` and `cross-encoder/ms-marco-MiniLM-L12-v2` at build time (no cold-start model downloads)
- Downloads the `en_core_web_sm` spaCy model at build time
- Binds to `0.0.0.0:7860` (the HF Spaces default port)
To deploy your own copy:
- Create a Hugging Face Space with the Docker SDK
- Push this repository to the Space's git remote
- Add `GROQ_API_KEY` (and optionally `GEMINI_API_KEY`) as Space secrets
See DOCS/phase6_deployment.md for the full deployment walkthrough.
Configuration
All configuration is environment-driven via .env. See .env.example for the full reference.
Key variables:
| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (none) | Required. Groq API key for Whisper + Llama |
| `GEMINI_API_KEY` | (none) | Optional Gemini fallback key |
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `7860` | Server port |
| `FINAL_TOP_K` | `5` | Number of chunks passed to the LLM |
| `MAX_ANSWER_TOKENS` | `500` | Maximum LLM output tokens |
| `CHUNK_SIZE_MAX` | `600` | Maximum tokens per document chunk |
| `BCRYPT_ROUNDS` | `12` | bcrypt work factor for KB passwords |
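The project loads these variables through Pydantic-settings in `config.py`. As a dependency-free illustration of the same idea, here is a sketch using a frozen dataclass and `os.environ` with the defaults from the table above. The `Settings` and `load_settings` names are hypothetical, and unlike Pydantic-settings this sketch neither reads the `.env` file itself nor validates types beyond `int()`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    gemini_api_key: str
    host: str = "0.0.0.0"
    port: int = 7860
    final_top_k: int = 5
    max_answer_tokens: int = 500
    chunk_size_max: int = 600
    bcrypt_rounds: int = 12


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the documented defaults."""
    groq_key = env.get("GROQ_API_KEY", "")
    if not groq_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=groq_key,
        gemini_api_key=env.get("GEMINI_API_KEY", ""),  # optional fallback key
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
        final_top_k=int(env.get("FINAL_TOP_K", "5")),
        max_answer_tokens=int(env.get("MAX_ANSWER_TOKENS", "500")),
        chunk_size_max=int(env.get("CHUNK_SIZE_MAX", "600")),
        bcrypt_rounds=int(env.get("BCRYPT_ROUNDS", "12")),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"GROQ_API_KEY": "gsk_example", "FINAL_TOP_K": "8"})
print(settings.port, settings.final_top_k)
```

Failing fast on a missing `GROQ_API_KEY` at startup surfaces a misconfigured Space immediately, rather than on the first query.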
Security
| Control | Implementation |
|---|---|
| No raw queries stored | Audit log stores SHA-256 hash only |
| KB access control | bcrypt-hashed passwords (work factor 12) |
| SQL injection prevention | 100% parameterized queries; no f-string SQL |
| Path traversal prevention | KB names validated as slugs (`^[a-z0-9][a-z0-9\-]*[a-z0-9]$`) |
| SSRF prevention | URL ingestion via trafilatura with no internal-network access |
| Upload whitelist | Only .pdf, .html, .docx, .md, .txt accepted |
| File size limit | 50MB max per upload |
| GPU isolation | CUDA_VISIBLE_DEVICES=-1 prevents CUDA crashes on incompatible hardware |
| No secrets in git | .env gitignored; HF secrets via Space settings API |
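Two rows of this table, hashed audit entries and parameterized SQL, can be shown together with the standard-library `sqlite3` and `hashlib` modules. This is a sketch under stated assumptions: the `audit_log` columns, the helper names, and the whitespace/case normalization before hashing are hypothetical, not the schema in `sqlite_store.py`.

```python
import hashlib
import sqlite3
import time


def hash_query(query: str) -> str:
    """SHA-256 hex digest of the normalized query; the raw text is never stored."""
    normalized = " ".join(query.lower().split())  # assumed normalization
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def init_audit_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               kb_name TEXT NOT NULL,
               query_hash TEXT NOT NULL,
               latency_ms REAL NOT NULL,
               citation_count INTEGER NOT NULL,
               created_at REAL NOT NULL
           )"""
    )


def log_query(conn: sqlite3.Connection, kb_name: str, query: str,
              latency_ms: float, citation_count: int) -> None:
    # "?" placeholders: sqlite3 binds values separately from the SQL text,
    # so user input can never change the structure of the statement.
    conn.execute(
        "INSERT INTO audit_log (kb_name, query_hash, latency_ms, citation_count, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (kb_name, hash_query(query), latency_ms, citation_count, time.time()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_audit_log(conn)
log_query(conn, "my-docs", "What is the refund policy?", 412.0, 2)

# A 7-day analytics window, again with a bound parameter.
week_ago = time.time() - 7 * 24 * 3600
count = conn.execute(
    "SELECT COUNT(*) FROM audit_log WHERE created_at >= ?", (week_ago,)
).fetchone()[0]
row = conn.execute(
    "SELECT kb_name, query_hash FROM audit_log WHERE kb_name = ?", ("my-docs",)
).fetchone()
print(count, row[0], row[1][:16])
```

Normalizing case and whitespace before hashing means repeated questions produce the same digest, so query counts stay meaningful even though the text itself is never kept.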
Phase Documentation
Each phase has a detailed write-up covering design decisions, key code sections, and test results:
| Phase | Topic | Tests |
|---|---|---|
| Phase 0 | Project Foundation (config, models, schema, scaffold) | 58 ✅ |
| Phase 1 | Document Ingestion (parser, chunker, indexer) | 46 ✅ |
| Phase 2 | Hybrid Retrieval (BM25 + vector + RRF + reranker) | 33 ✅ |
| Phase 3 | ASR & Voice Input (Whisper, query preprocessor) | 47 ✅ |
| Phase 4 | Generation & Citations (LangChain, faithfulness guard) | 72 ✅ |
| Phase 5 | Full UI, TTS & Access Control | 55 ✅ |
| Phase 6 | FastAPI Server, SPA Frontend & HF Deployment | 17 ✅ |
Total: 328 tests, all passing.
License
Source Available, All Rights Reserved. See LICENSE for full terms.
The source code is publicly visible for viewing and educational purposes. Any use in personal, commercial, or academic projects requires explicit written permission from the author.
To request permission: navnitamrutharaj1234@gmail.com
Author: Navnit Amrutharaj