---
title: VoiceVault
emoji: 🎙️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: other
---

<div align="center">

# VoiceVault

**Voice-First RAG Knowledge Agent**

*Speak to your documents. Get cited answers back.*

[**Live Demo →**](https://huggingface.co/spaces/NinjainPJs/VoiceVault) | [**Documentation →**](DOCS/) | [**Project Plan →**](PLAN.md)

</div>

---

## Overview

VoiceVault is a production-grade, voice-first Retrieval-Augmented Generation (RAG) system built entirely from scratch. Users record or type questions and receive answers grounded in their own private document collections, with inline citations pointing back to the exact source, page, and paragraph.

The project was built in seven phases (Phase 0 through Phase 6) over several weeks, with a full test suite (328 tests), enterprise-grade security practices (bcrypt, parameterized SQL, SHA-256 audit logs, SSRF prevention), and deployment to Hugging Face Spaces via Docker.

**What makes this different from typical RAG demos:**

- **Hybrid retrieval:** BM25 keyword search plus semantic vector search, fused with Reciprocal Rank Fusion (RRF) and then reranked by a cross-encoder. Most tutorials use only one retrieval method.
- **Voice-native pipeline:** Groq Whisper API for ~300ms cloud transcription, with local Whisper as a fallback; the Web Speech API handles TTS output.
- **Faithfulness guard:** Detects when the LLM cannot answer from the retrieved context and returns a grounded refusal instead of hallucinating.
- **Multi-KB support:** Multiple independent knowledge bases, each optionally password-protected.
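
Reciprocal Rank Fusion combines ranked lists using only positions: each chunk scores the sum of `1 / (k + rank)` over every list it appears in, so BM25 scores and cosine similarities never need to be normalized against each other. A minimal sketch, assuming each retriever returns chunk IDs ordered best-first; `rrf_merge` is a hypothetical name, not VoiceVault's API, and `k=60` is the value shown in the architecture diagram:

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    A chunk's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["c3", "c1", "c7"]    # hypothetical BM25 ranking
vector_hits = ["c1", "c9", "c3"]  # hypothetical vector-search ranking
fused = rrf_merge([bm25_hits, vector_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

`c1` wins because it ranks near the top of both lists, even though BM25 alone preferred `c3`. Fusing before reranking means the comparatively slow cross-encoder only scores the merged candidates rather than the whole index.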

---

## Screenshots

<div align="center">

### Ask VoiceVault: Voice Query Interface

*Record your question via microphone or type it. The mic button pulses when recording.*

<img src="Screenshots/1.png" alt="Ask VoiceVault: main voice query interface with dark glassmorphism UI" width="800"/>

---

### Knowledge Base Management

*Create named knowledge bases, upload documents (PDF, DOCX, HTML, MD, TXT), and manage them.*

<img src="Screenshots/2.png" alt="Knowledge Bases panel: empty state with New Knowledge Base button" width="800"/>

---

### Analytics Dashboard

*Real-time query statistics: total queries, average latency, citation counts, and daily breakdowns.*

<img src="Screenshots/3.png" alt="Analytics dashboard showing query statistics" width="800"/>

---

### Full App in Action

*A populated knowledge base (358 chunks from 1 document) and a live conversation with the RAG pipeline.*

<img src="Screenshots/4.png" alt="Full VoiceVault app with a knowledge base and active conversation" width="800"/>

</div>

---

## Architecture

```
INGESTION PATH (one-time per document set)
──────────────────────────────────────────────────────
User uploads PDF / HTML / DOCX / MD / TXT
        │
        ▼
DocumentParser     → text + metadata per page
        │            (PyMuPDF, BS4, python-docx)
        ▼
SemanticChunker    → sentence-aware chunks
        │            (spaCy sentences + cosine boundary)
        ▼
IndexBuilder       → ChromaDB (vector) + BM25 (keyword)
                     + SQLite (metadata)

QUERY PATH (real-time, per question)
──────────────────────────────────────────────────────
Browser mic → WAV → POST /api/transcribe
        │
        ▼
GroqTranscriber    → Groq Whisper API (~300ms)
        │            [fallback: local Whisper CPU]
        ▼
QueryPreprocessor  → filler removal, intent classification
        │            (factual / summary / compare)
        ▼
HybridRetriever    → BM25 top-20 + Vector top-20
        │            → RRF merge (k=60)
        │            → CrossEncoder rerank (ms-marco-MiniLM-L12-v2)
        │            → diversity filter (max 2 chunks/page)
        ▼
ContextBuilder     → formatted context with [Source:N] markers
        │
        ▼
LangChain LCEL     → Groq Llama-3.1-70B (primary)
        │            [fallback: Gemini 1.5 Flash]
        ▼
FaithfulnessGuard  → refusal detection, confidence scoring
        │
        ▼
CitationInjector   → resolve [Source:N] → filename + page
        │
        ▼
JSON response      → answer + citations + confidence + tts_text
        │
        ▼
SPA Frontend       → chat display + Web Speech API TTS
```
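
The last retrieval step, the diversity filter (max 2 chunks/page), can be sketched as one pass over the reranked candidates. This assumes each candidate carries hypothetical `source`, `page`, and `score` fields (VoiceVault's own data model may differ), and uses `top_k=5` to mirror the `FINAL_TOP_K` default from the Configuration section:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical reranked chunk: ID, source file, page number, rerank score."""
    chunk_id: str
    source: str
    page: int
    score: float


def diversity_filter(candidates: list[Candidate], max_per_page: int = 2,
                     top_k: int = 5) -> list[Candidate]:
    """Keep the best-scoring chunks, allowing at most `max_per_page`
    chunks from any single (source, page) pair."""
    per_page: Counter = Counter()
    kept: list[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        key = (cand.source, cand.page)
        if per_page[key] >= max_per_page:
            continue  # this page has already contributed enough chunks
        per_page[key] += 1
        kept.append(cand)
        if len(kept) == top_k:
            break
    return kept


reranked = [
    Candidate("a", "manual.pdf", 4, 0.91),
    Candidate("b", "manual.pdf", 4, 0.88),
    Candidate("c", "manual.pdf", 4, 0.85),  # third chunk from page 4: dropped
    Candidate("d", "faq.md", 1, 0.80),
]
print([c.chunk_id for c in diversity_filter(reranked)])  # ['a', 'b', 'd']
```

Capping chunks per page keeps one dense page from filling the whole LLM context, which leaves room for corroborating passages from elsewhere.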

---

## Features

| Feature | Detail |
|---------|--------|
| **Voice Input** | Browser microphone → WAV conversion → Groq Whisper API (~300ms) |
| **Hybrid Retrieval** | BM25 + semantic vector search, RRF fusion, cross-encoder reranking |
| **Multi-KB** | Create multiple independent knowledge bases per session |
| **KB Access Control** | Optional bcrypt password protection (work factor 12) per KB |
| **Document Formats** | PDF, DOCX, HTML, Markdown, TXT (OCR fallback for scanned PDFs) |
| **Source Citations** | Every answer traceable to source file + page number |
| **Faithfulness Guard** | Detects hallucinations; returns a grounded refusal when context is insufficient |
| **Conversation Memory** | Rolling 5-turn conversation window passed to the LLM |
| **LLM Fallback** | Automatic fallback from Groq Llama-3.1-70B to Gemini 1.5 Flash |
| **TTS Output** | Web Speech API reads the answer aloud with citation markers stripped |
| **Analytics** | SQLite audit log: query counts, latency, citation rates (7-day window) |
| **Privacy** | Raw queries are never stored; the audit log keeps only a SHA-256 hash |
| **328 Tests** | Integration + unit tests across all seven phases |
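
Answers carry `[Source:N]` markers that index into the numbered context given to the LLM. The two post-processing steps in the table above (resolving markers to file + page for display, and stripping them before speech) can be sketched with the standard `re` module. This assumes the sources are held in a list indexed from 1; `resolve_citations` and `tts_text` are hypothetical names, not VoiceVault's actual functions:

```python
import re

# Matches markers such as "[Source:2]" or "[Source: 2]".
SOURCE_MARKER = re.compile(r"\[Source:\s*(\d+)\]")


def resolve_citations(answer: str, sources: list[tuple[str, int]]) -> str:
    """Replace [Source:N] with "(filename, p. page)" using 1-based indices.

    Markers pointing outside the source list are left untouched, so a bad
    index stays visible instead of being silently mis-attributed.
    """
    def _sub(match: re.Match) -> str:
        idx = int(match.group(1))
        if 1 <= idx <= len(sources):
            filename, page = sources[idx - 1]
            return f"({filename}, p. {page})"
        return match.group(0)

    return SOURCE_MARKER.sub(_sub, answer)


def tts_text(answer: str) -> str:
    """Drop citation markers, collapse whitespace, and re-attach punctuation."""
    without_markers = " ".join(SOURCE_MARKER.sub("", answer).split())
    return re.sub(r"\s+([.,;:!?])", r"\1", without_markers)


sources = [("handbook.pdf", 12), ("policy.docx", 3)]
answer = "Leave accrues monthly [Source:1]. Carry-over is capped [Source:2]."
print(resolve_citations(answer, sources))
print(tts_text(answer))  # Leave accrues monthly. Carry-over is capped.
```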

---

## Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **API** | FastAPI + uvicorn | REST backend with async endpoints |
| **Frontend** | HTML5 / CSS3 / Vanilla JS | Premium dark SPA (no framework) |
| **ASR** | Groq Whisper API | Cloud transcription (~300ms) |
| **ASR Fallback** | OpenAI Whisper Large-v3 | Local CPU transcription |
| **Embeddings** | sentence-transformers `all-MiniLM-L6-v2` | Dense vector representations |
| **Reranking** | `cross-encoder/ms-marco-MiniLM-L12-v2` | Semantic relevance scoring |
| **Vector Store** | ChromaDB | In-process vector database |
| **Keyword Search** | rank-bm25 (BM25Okapi) | Lexical keyword matching |
| **Chunking** | spaCy `en_core_web_sm` | Sentence boundary detection |
| **LLM (primary)** | Groq Llama-3.1-70B | Fast inference via Groq cloud |
| **LLM (fallback)** | Gemini 1.5 Flash | Google generative AI fallback |
| **Orchestration** | LangChain LCEL | LLM pipeline composition |
| **Metadata** | SQLite | KB registry, doc index, audit log |
| **Security** | bcrypt (work factor 12) | KB password hashing |
| **Config** | Pydantic-settings | Centralized, type-safe config |
| **Deployment** | Docker on Hugging Face Spaces | Container-based cloud hosting |

---

## Project Structure

```
Project-VoiceVault/
├── server.py                  # FastAPI entry point (run this)
├── app.py                     # Gradio entry point (legacy / tests)
├── config.py                  # Centralized Pydantic-settings config
├── requirements.txt           # All dependencies
├── Dockerfile                 # HF Spaces Docker deployment
├── .env.example               # Environment variable template
│
├── api/                       # FastAPI REST API
│   ├── __init__.py
│   └── routes.py              # All /api/* endpoints
│
├── static/                    # SPA frontend assets
│   ├── index.html             # Single-page application shell
│   ├── style.css              # Dark glassmorphism design system
│   └── app.js                 # Full SPA logic (recording, chat, KB CRUD)
│
├── voicevault/                # Core package
│   ├── models.py              # Pydantic data models
│   ├── asr/
│   │   ├── groq_transcriber.py      # Groq Whisper cloud ASR (~300ms)
│   │   ├── whisper_transcriber.py   # Local Whisper CPU/GPU fallback
│   │   └── query_preprocessor.py    # Filler removal, intent classification
│   ├── ingestion/
│   │   ├── document_parser.py       # PDF/HTML/DOCX/MD/TXT → structured text
│   │   ├── semantic_chunker.py      # Sentence-aware chunking with topic boundaries
│   │   └── index_builder.py         # ChromaDB + BM25 + SQLite orchestration
│   ├── retrieval/
│   │   ├── hybrid_retriever.py      # BM25 + vector + RRF + cross-encoder
│   │   ├── bm25_retriever.py        # BM25Okapi keyword search
│   │   ├── vector_retriever.py      # ChromaDB semantic search
│   │   └── context_builder.py       # Context formatting + citation markers
│   ├── generation/
│   │   ├── answer_chain.py          # LangChain LCEL + Groq + Gemini fallback
│   │   ├── faithfulness_guard.py    # Hallucination detection + refusal
│   │   └── citation_injector.py     # [Source:N] → filename + page resolution
│   ├── kb/
│   │   └── kb_manager.py            # KB lifecycle, bcrypt auth, validation
│   ├── storage/
│   │   ├── sqlite_store.py          # Schema, CRUD, audit log queries
│   │   └── chroma_store.py          # ChromaDB wrapper
│   └── tts/
│       └── web_speech.py            # TTS text preparation
│
├── ui/                        # Gradio UI components (legacy / app.py)
│   ├── tabs/
│   │   ├── ask_tab.py
│   │   ├── kb_tab.py
│   │   ├── analytics_tab.py
│   │   └── settings_tab.py
│   └── components/
│       ├── citation_panel.py
│       └── audio_controls.py
│
├── tests/                     # Full test suite: 328 tests
│   ├── conftest.py
│   ├── test_api_routes.py     # Integration tests (FastAPI + real methods)
│   ├── test_phase0.py         # Foundation tests
│   ├── test_phase1.py         # Ingestion tests
│   ├── test_phase2.py         # Retrieval tests
│   ├── test_phase3.py         # ASR tests
│   ├── test_phase4.py         # Generation tests
│   └── test_phase5.py         # UI / access control tests
│
├── DOCS/                      # Detailed phase documentation
│   ├── phase0_foundation.md
│   ├── phase1_ingestion.md
│   ├── phase2_retrieval.md
│   ├── phase3_asr.md
│   ├── phase4_generation.md
│   ├── phase5_ui_access.md
│   └── phase6_deployment.md
│
└── Screenshots/
    ├── 1.png                  # Ask tab: voice query interface
    ├── 2.png                  # Knowledge Bases panel
    ├── 3.png                  # Analytics dashboard
    └── 4.png                  # Full app with KB and live conversation
```

---

## Quick Start

### Prerequisites

- Python 3.11+
- A Groq API key ([free at console.groq.com](https://console.groq.com))
- Optionally, a Gemini API key ([free at aistudio.google.com](https://aistudio.google.com))

### 1. Clone and install

```bash
git clone https://github.com/ninjacode911/Project-VoiceVault.git
cd Project-VoiceVault
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install torch --index-url https://download.pytorch.org/whl/cpu  # CPU-only (saves ~1.8GB)
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

### 2. Configure secrets

```bash
cp .env.example .env
# Edit .env and add:
#   GROQ_API_KEY=gsk_...
#   GEMINI_API_KEY=...   (optional)
```

### 3. Run

```bash
python server.py
# Open http://localhost:7860
```

### 4. Use it

1. Navigate to **Knowledge Bases** → click **+ New Knowledge Base**.
2. Name it (lowercase letters, digits, and hyphens only, starting and ending with a letter or digit, e.g. `my-docs`) and upload your PDFs/documents.
3. Go back to **Ask VoiceVault** → select your KB → record or type a question → click **Ask**.
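
The naming rule corresponds to the slug pattern listed in the Security section (`^[a-z0-9][a-z0-9\-]*[a-z0-9]$`), which also implies a minimum length of two characters. A sketch of such a check; `is_valid_kb_name` is a hypothetical helper, not necessarily how `kb_manager.py` implements it:

```python
import re

# Slug pattern quoted from the Security table: lowercase letters, digits and
# hyphens; must start and end with a letter or digit (so at least 2 chars).
KB_NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9\-]*[a-z0-9]$")


def is_valid_kb_name(name: str) -> bool:
    """Return True if `name` is a safe knowledge-base slug.

    fullmatch() is used because "$" alone also matches just before a
    trailing newline, so re.match() would accept "my-docs\\n".
    """
    return KB_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-docs", "q3-reports", "My-Docs", "-docs", "a", "../etc"]:
    print(f"{candidate!r}: {is_valid_kb_name(candidate)}")
```

Rejecting `/`, `.` and uppercase up front is what lets a KB name be used safely as part of a storage path.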

---

## Running Tests

```bash
pytest tests/ -v
# Expected: 328 passed
```

The integration tests in `tests/test_api_routes.py` use a real `KBManager` backed by a temporary SQLite DB and exercise the actual FastAPI routes and method signatures, not mocked pipelines. This is intentional: it catches runtime `AttributeError` bugs that pure-mock unit tests miss.

---

## Deployment to Hugging Face Spaces

The project ships with a `Dockerfile` configured for HF Spaces. The Docker image:

- Uses a Python 3.11-slim base
- Installs CPU-only PyTorch (~650MB vs 2.5GB GPU wheels)
- Pre-downloads `all-MiniLM-L6-v2` and `cross-encoder/ms-marco-MiniLM-L12-v2` at build time (no cold-start model downloads)
- Downloads the `en_core_web_sm` spaCy model at build time
- Binds to `0.0.0.0:7860` (HF Spaces default port)

To deploy your own copy:

1. Create a [Hugging Face Space](https://huggingface.co/new-space) with the **Docker** SDK
2. Push this repository to the Space's git remote
3. Add `GROQ_API_KEY` (and optionally `GEMINI_API_KEY`) as Space secrets

See [DOCS/phase6_deployment.md](DOCS/phase6_deployment.md) for the full deployment walkthrough.

---

## Configuration

All configuration is environment-driven via `.env`. See [`.env.example`](.env.example) for the full reference.

Key variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | (none) | **Required.** Groq API key for Whisper + Llama |
| `GEMINI_API_KEY` | (none) | Optional Gemini fallback key |
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `7860` | Server port |
| `FINAL_TOP_K` | `5` | Number of chunks passed to the LLM |
| `MAX_ANSWER_TOKENS` | `500` | LLM max output tokens |
| `CHUNK_SIZE_MAX` | `600` | Max tokens per document chunk |
| `BCRYPT_ROUNDS` | `12` | bcrypt work factor for KB passwords |
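
The project loads these variables with Pydantic-settings. As a dependency-free illustration of the same idea (typed settings with defaults, overridable from the environment, with a required key), here is a standard-library stand-in; the `Settings` class below is hypothetical and is not VoiceVault's `config.py`. Defaults mirror the table above:

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stdlib stand-in for a Pydantic-settings config class.

    Each field maps to an UPPER_CASE environment variable of the same name.
    """
    groq_api_key: str = ""
    gemini_api_key: str = ""
    host: str = "0.0.0.0"
    port: int = 7860
    final_top_k: int = 5
    max_answer_tokens: int = 500
    chunk_size_max: int = 600
    bcrypt_rounds: int = 12

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings from `env` (defaults to os.environ)."""
        source = os.environ if env is None else env
        values = {}
        for f in fields(cls):
            raw = source.get(f.name.upper())
            if raw is not None:
                # Pydantic-settings validates every type; this sketch only casts ints.
                values[f.name] = int(raw) if f.type is int else raw
        settings = cls(**values)
        if not settings.groq_api_key:
            raise ValueError("GROQ_API_KEY is required")
        return settings


settings = Settings.from_env({"GROQ_API_KEY": "gsk_example", "PORT": "8000"})
print(settings.port, settings.final_top_k, settings.host)  # 8000 5 0.0.0.0
```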

---

## Security

| Control | Implementation |
|---------|----------------|
| **No raw queries stored** | Audit log stores SHA-256 hash only |
| **KB access control** | bcrypt-hashed passwords (work factor 12) |
| **SQL injection prevention** | 100% parameterized queries; no f-string SQL |
| **Path traversal prevention** | KB names validated as slugs (`^[a-z0-9][a-z0-9\-]*[a-z0-9]$`) |
| **SSRF prevention** | URL ingestion via trafilatura with no internal-network access |
| **Upload whitelist** | Only `.pdf`, `.html`, `.docx`, `.md`, `.txt` accepted |
| **File size limit** | 50MB max per upload |
| **GPU isolation** | `CUDA_VISIBLE_DEVICES=-1` prevents CUDA crashes on incompatible hardware |
| **No secrets in git** | `.env` gitignored; HF secrets via Space settings API |
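
The first and third rows of this table (hash-only audit entries, parameterized SQL) can be sketched together with `hashlib` and `sqlite3` from the standard library. The `audit_log` table layout and the `log_query` helper are hypothetical, chosen only to mirror the analytics fields mentioned in the Features table:

```python
import hashlib
import sqlite3
import time


def log_query(conn: sqlite3.Connection, kb_name: str, query: str,
              latency_ms: float, n_citations: int) -> str:
    """Record one query in the audit log without storing its raw text."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    # "?" placeholders: sqlite3 binds the values, nothing is formatted into SQL.
    conn.execute(
        "INSERT INTO audit_log (ts, kb_name, query_hash, latency_ms, n_citations) "
        "VALUES (?, ?, ?, ?, ?)",
        (time.time(), kb_name, query_hash, latency_ms, n_citations),
    )
    conn.commit()
    return query_hash


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (ts REAL, kb_name TEXT, query_hash TEXT, "
    "latency_ms REAL, n_citations INTEGER)"
)
digest = log_query(conn, "my-docs", "What is the leave policy?", 412.0, 2)
print(conn.execute("SELECT kb_name, query_hash FROM audit_log").fetchall())
```

One caveat: a plain SHA-256 of a short, guessable question can be confirmed by hashing candidate questions, so a keyed hash such as `hmac.new(secret_key, query_bytes, hashlib.sha256)` is a common hardening step when that matters.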

---

## Phase Documentation

Each phase has a detailed write-up covering design decisions, key code sections, and test results:

| Phase | Topic | Tests |
|-------|-------|-------|
| [Phase 0](DOCS/phase0_foundation.md) | Project Foundation (config, models, schema, scaffold) | 58 ✓ |
| [Phase 1](DOCS/phase1_ingestion.md) | Document Ingestion (parser, chunker, indexer) | 46 ✓ |
| [Phase 2](DOCS/phase2_retrieval.md) | Hybrid Retrieval (BM25 + vector + RRF + reranker) | 33 ✓ |
| [Phase 3](DOCS/phase3_asr.md) | ASR & Voice Input (Whisper, query preprocessor) | 47 ✓ |
| [Phase 4](DOCS/phase4_generation.md) | Generation & Citations (LangChain, faithfulness guard) | 72 ✓ |
| [Phase 5](DOCS/phase5_ui_access.md) | Full UI, TTS & Access Control | 55 ✓ |
| [Phase 6](DOCS/phase6_deployment.md) | FastAPI Server, SPA Frontend & HF Deployment | 17 ✓ |

**Total: 328 tests, all passing.**

---

## License

**Source Available. All Rights Reserved.** See [LICENSE](LICENSE) for full terms.

The source code is publicly visible for viewing and educational purposes. Any use in personal, commercial, or academic projects requires explicit written permission from the author.

To request permission: navnitamrutharaj1234@gmail.com

**Author:** Navnit Amrutharaj