---
title: VoiceVault
emoji: 🎙️
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: other
---
<div align="center">
# VoiceVault
**Voice-First RAG Knowledge Agent**
*Speak to your documents. Get cited answers back.*
[](https://www.python.org/)
[](https://fastapi.tiangolo.com/)
[](LICENSE)
[](tests/)
[](https://huggingface.co/spaces/NinjainPJs/VoiceVault)
[**Live Demo →**](https://huggingface.co/spaces/NinjainPJs/VoiceVault) | [**Documentation →**](DOCS/) | [**Project Plan →**](PLAN.md)
</div>
---
## Overview
VoiceVault is a production-grade, voice-first Retrieval-Augmented Generation (RAG) system built entirely from scratch. It enables users to record or type questions and receive answers grounded in their own private document collections, with inline citations pointing back to the exact source, page, and paragraph.
The project was built in 6 phases over several weeks, with a full test suite (328 tests), enterprise-grade security practices (bcrypt, parameterized SQL, SHA-256 audit logs, SSRF prevention), and deployment to Hugging Face Spaces via Docker.
**What makes this different from typical RAG demos:**
- **Hybrid retrieval**: BM25 keyword search + semantic vector search, fused with Reciprocal Rank Fusion (RRF) + cross-encoder reranking. Most tutorials use only one retrieval method.
- **Voice-native pipeline**: Groq Whisper API for ~300ms cloud transcription with local Whisper fallback; Web Speech API for TTS output.
- **Faithfulness guard**: Detects when the LLM cannot answer from retrieved context and returns a grounded refusal instead of hallucinating.
- **Multi-KB support**: Multiple independent knowledge bases, each optionally password-protected.
---
## Screenshots
<div align="center">
### Ask VoiceVault: Voice Query Interface
*Record your question via microphone or type it. The mic button pulses when recording.*
<img src="Screenshots/1.png" alt="Ask VoiceVault: main voice query interface with dark glassmorphism UI" width="800"/>
---
### Knowledge Base Management
*Create named knowledge bases, upload documents (PDF, DOCX, HTML, MD, TXT), and manage them.*
<img src="Screenshots/2.png" alt="Knowledge Bases panel: empty state with New Knowledge Base button" width="800"/>
---
### Analytics Dashboard
*Real-time query statistics: total queries, average latency, citation counts, and daily breakdowns.*
<img src="Screenshots/3.png" alt="Analytics dashboard showing query statistics" width="800"/>
---
### Full App in Action
*A populated knowledge base (358 chunks from 1 document) and a live conversation with the RAG pipeline.*
<img src="Screenshots/4.png" alt="Full VoiceVault app with a knowledge base and active conversation" width="800"/>
</div>
---
## Architecture
```
INGESTION PATH (one-time per document set)
──────────────────────────────────────────────────────
User uploads PDF / HTML / DOCX / MD / TXT
        │
        ▼
DocumentParser     → text + metadata per page
        │            (PyMuPDF, BS4, python-docx)
        ▼
SemanticChunker    → sentence-aware chunks
        │            (spaCy sentences + cosine boundary)
        ▼
IndexBuilder       → ChromaDB (vector) + BM25 (keyword)
                     + SQLite (metadata)

QUERY PATH (real-time, per question)
──────────────────────────────────────────────────────
Browser mic → WAV → POST /api/transcribe
        │
        ▼
GroqTranscriber    → Groq Whisper API (~300ms)
        │            [fallback: local Whisper CPU]
        ▼
QueryPreprocessor  → filler removal, intent classification
        │            (factual / summary / compare)
        ▼
HybridRetriever    → BM25 top-20 + Vector top-20
        │            → RRF merge (k=60)
        │            → CrossEncoder rerank (ms-marco-MiniLM-L12-v2)
        │            → diversity filter (max 2 chunks/page)
        ▼
ContextBuilder     → formatted context with [Source:N] markers
        ▼
LangChain LCEL     → Groq Llama-3.1-70B (primary)
        │            [fallback: Gemini 1.5 Flash]
        ▼
FaithfulnessGuard  → refusal detection, confidence scoring
        │
CitationInjector   → resolve [Source:N] → filename + page
        ▼
JSON response      → answer + citations + confidence + tts_text
        │
        ▼
SPA Frontend       → chat display + Web Speech API TTS
```
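The RRF merge and diversity filter steps in the query path can be sketched as follows. This is a minimal illustration, not the code in `hybrid_retriever.py`: the function names, the chunk IDs, and the `page_of` lookup are assumptions, and the cross-encoder rerank that sits between the two steps in the diagram is left out to keep the example dependency-free.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    where rank starts at 1. k=60 matches the value in the diagram.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def diversity_filter(ranked_chunks, page_of, max_per_page=2):
    """Keep at most `max_per_page` chunks from any single page."""
    kept = []
    per_page = defaultdict(int)
    for chunk_id in ranked_chunks:
        page = page_of[chunk_id]
        if per_page[page] < max_per_page:
            kept.append(chunk_id)
            per_page[page] += 1
    return kept


# Hypothetical retriever outputs (best hit first).
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c9", "c3"]
fused = rrf_merge([bm25_hits, vector_hits])

# Hypothetical chunk-to-page mapping: three chunks share page 1.
page_of = {"c1": 1, "c3": 1, "c9": 1, "c7": 2}
diverse = diversity_filter(fused, page_of)
```

Chunks that appear in both lists (`c1`, `c3`) outrank chunks found by only one retriever, which is the main reason to fuse by rank rather than by raw score: BM25 and cosine scores are not on comparable scales.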
---
## Features
| Feature | Detail |
|---------|--------|
| **Voice Input** | Browser microphone → WAV conversion → Groq Whisper API (~300ms) |
| **Hybrid Retrieval** | BM25 + semantic vector search, RRF fusion, cross-encoder reranking |
| **Multi-KB** | Create multiple independent knowledge bases per session |
| **KB Access Control** | Optional bcrypt password protection (work factor 12) per KB |
| **Document Formats** | PDF, DOCX, HTML, Markdown, TXT (OCR fallback for scanned PDFs) |
| **Source Citations** | Every answer traceable to source file + page number |
| **Faithfulness Guard** | Detects hallucinations; returns grounded refusal when context is insufficient |
| **Conversation Memory** | Rolling 5-turn conversation window passed to the LLM |
| **LLM Fallback** | Groq Llama-3.1-70B → Gemini 1.5 Flash automatic fallback |
| **TTS Output** | Web Speech API reads answer aloud with citation markers stripped |
| **Analytics** | SQLite audit log: query counts, latency, citation rates (7-day window) |
| **Privacy** | Raw queries never stored; the audit log keeps only a SHA-256 hash |
| **328 Tests** | Integration + unit tests across all 6 phases |
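
The source-citation and TTS rows above both operate on the `[Source:N]` markers that the context builder places in the prompt. A minimal sketch of that handling, assuming a hypothetical `sources` mapping from marker number to filename and page (the real `citation_injector.py` and `web_speech.py` may differ):

```python
import re

SOURCE_MARKER = re.compile(r"\[Source:(\d+)\]")


def resolve_citations(answer, sources):
    """Replace [Source:N] markers with 'filename, p. page' references.

    Returns the rewritten answer and the list of sources actually cited,
    in order of first appearance.
    """
    cited = []

    def _replace(match):
        src = sources.get(int(match.group(1)))
        if src is None:
            return match.group(0)  # unknown marker: leave it untouched
        if src not in cited:
            cited.append(src)
        return f"[{src['filename']}, p. {src['page']}]"

    return SOURCE_MARKER.sub(_replace, answer), cited


def strip_for_tts(answer):
    """Remove citation markers so speech output does not read them aloud."""
    text = SOURCE_MARKER.sub("", answer)
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # close gaps before punctuation
    return re.sub(r"\s{2,}", " ", text).strip()


# Hypothetical LLM answer and source table.
answer = "Refunds take 5 days [Source:1]. Contact support first [Source:2]."
sources = {
    1: {"filename": "policy.pdf", "page": 3},
    2: {"filename": "faq.md", "page": 1},
}
resolved, cited = resolve_citations(answer, sources)
tts_text = strip_for_tts(answer)
```

Keeping the marker pattern in one compiled regex means the display path and the speech path cannot drift apart on what counts as a citation.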
---
## Tech Stack
| Layer | Technology | Purpose |
|-------|-----------|---------|
| **API** | FastAPI + uvicorn | REST backend with async endpoints |
| **Frontend** | HTML5 / CSS3 / Vanilla JS | Premium dark SPA (no framework) |
| **ASR** | Groq Whisper API | Cloud transcription (~300ms) |
| **ASR Fallback** | OpenAI Whisper Large-v3 | Local CPU transcription |
| **Embeddings** | sentence-transformers `all-MiniLM-L6-v2` | Dense vector representations |
| **Reranking** | `cross-encoder/ms-marco-MiniLM-L12-v2` | Semantic relevance scoring |
| **Vector Store** | ChromaDB | In-process vector database |
| **Keyword Search** | rank-bm25 (BM25Okapi) | Lexical keyword matching |
| **Chunking** | spaCy `en_core_web_sm` | Sentence boundary detection |
| **LLM (primary)** | Groq Llama-3.1-70B | Fast inference via Groq cloud |
| **LLM (fallback)** | Gemini 1.5 Flash | Google generative AI fallback |
| **Orchestration** | LangChain LCEL | LLM pipeline composition |
| **Metadata** | SQLite | KB registry, doc index, audit log |
| **Security** | bcrypt (work factor 12) | KB password hashing |
| **Config** | Pydantic-settings | Centralized, type-safe config |
| **Deployment** | Docker on Hugging Face Spaces | Container-based cloud hosting |
---
## Project Structure
```
Project-VoiceVault/
├── server.py                       # FastAPI entry point (run this)
├── app.py                          # Gradio entry point (legacy / tests)
├── config.py                       # Centralized Pydantic-settings config
├── requirements.txt                # All dependencies
├── Dockerfile                      # HF Spaces Docker deployment
├── .env.example                    # Environment variable template
│
├── api/                            # FastAPI REST API
│   ├── __init__.py
│   └── routes.py                   # All /api/* endpoints
│
├── static/                         # SPA frontend assets
│   ├── index.html                  # Single-page application shell
│   ├── style.css                   # Dark glassmorphism design system
│   └── app.js                      # Full SPA logic (recording, chat, KB CRUD)
│
├── voicevault/                     # Core package
│   ├── models.py                   # Pydantic data models
│   ├── asr/
│   │   ├── groq_transcriber.py     # Groq Whisper cloud ASR (~300ms)
│   │   ├── whisper_transcriber.py  # Local Whisper CPU/GPU fallback
│   │   └── query_preprocessor.py   # Filler removal, intent classification
│   ├── ingestion/
│   │   ├── document_parser.py      # PDF/HTML/DOCX/MD/TXT → structured text
│   │   ├── semantic_chunker.py     # Sentence-aware chunking with topic boundaries
│   │   └── index_builder.py        # ChromaDB + BM25 + SQLite orchestration
│   ├── retrieval/
│   │   ├── hybrid_retriever.py     # BM25 + vector + RRF + cross-encoder
│   │   ├── bm25_retriever.py       # BM25Okapi keyword search
│   │   ├── vector_retriever.py     # ChromaDB semantic search
│   │   └── context_builder.py      # Context formatting + citation markers
│   ├── generation/
│   │   ├── answer_chain.py         # LangChain LCEL + Groq + Gemini fallback
│   │   ├── faithfulness_guard.py   # Hallucination detection + refusal
│   │   └── citation_injector.py    # [Source:N] → filename + page resolution
│   ├── kb/
│   │   └── kb_manager.py           # KB lifecycle, bcrypt auth, validation
│   ├── storage/
│   │   ├── sqlite_store.py         # Schema, CRUD, audit log queries
│   │   └── chroma_store.py         # ChromaDB wrapper
│   └── tts/
│       └── web_speech.py           # TTS text preparation
│
├── ui/                             # Gradio UI components (legacy / app.py)
│   ├── tabs/
│   │   ├── ask_tab.py
│   │   ├── kb_tab.py
│   │   ├── analytics_tab.py
│   │   └── settings_tab.py
│   └── components/
│       ├── citation_panel.py
│       └── audio_controls.py
│
├── tests/                          # Full test suite (328 tests)
│   ├── conftest.py
│   ├── test_api_routes.py          # Integration tests (FastAPI + real methods)
│   ├── test_phase0.py              # Foundation tests
│   ├── test_phase1.py              # Ingestion tests
│   ├── test_phase2.py              # Retrieval tests
│   ├── test_phase3.py              # ASR tests
│   ├── test_phase4.py              # Generation tests
│   └── test_phase5.py              # UI / access control tests
│
├── DOCS/                           # Detailed phase documentation
│   ├── phase0_foundation.md
│   ├── phase1_ingestion.md
│   ├── phase2_retrieval.md
│   ├── phase3_asr.md
│   ├── phase4_generation.md
│   ├── phase5_ui_access.md
│   └── phase6_deployment.md
│
└── Screenshots/
    ├── 1.png                       # Ask tab: voice query interface
    ├── 2.png                       # Knowledge Bases panel
    ├── 3.png                       # Analytics dashboard
    └── 4.png                       # Full app with KB and live conversation
```
---
## Quick Start
### Prerequisites
- Python 3.11+
- A Groq API key ([free at console.groq.com](https://console.groq.com))
- Optionally a Gemini API key ([free at aistudio.google.com](https://aistudio.google.com))
### 1. Clone and install
```bash
git clone https://github.com/ninjacode911/Project-VoiceVault.git
cd Project-VoiceVault
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install torch --index-url https://download.pytorch.org/whl/cpu # CPU-only (saves ~1.8GB)
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
### 2. Configure secrets
```bash
cp .env.example .env
# Edit .env and add:
# GROQ_API_KEY=gsk_...
# GEMINI_API_KEY=... (optional)
```
### 3. Run
```bash
python server.py
# Open http://localhost:7860
```
### 4. Use it
1. Navigate to **Knowledge Bases** → click **+ New Knowledge Base**
2. Name it (lowercase letters, digits, and hyphens only, e.g. `my-docs`) and upload your PDFs/documents
3. Go back to **Ask VoiceVault** → select your KB → record or type a question → click **Ask**
---
## Running Tests
```bash
pytest tests/ -v
# Expected: 328 passed
```
The integration tests in `tests/test_api_routes.py` use a real `KBManager` backed by a temp SQLite DB and exercise the actual FastAPI routes and method signatures rather than mocked pipelines. This is intentional: it catches runtime `AttributeError` bugs that pure-mock unit tests miss.
---
## Deployment to Hugging Face Spaces
The project ships with a `Dockerfile` configured for HF Spaces. The Docker image:
- Uses Python 3.11-slim base
- Installs CPU-only PyTorch (~650MB vs 2.5GB GPU wheels)
- Pre-downloads `all-MiniLM-L6-v2` and `cross-encoder/ms-marco-MiniLM-L12-v2` at build time (no cold-start model downloads)
- Downloads `en_core_web_sm` spaCy model at build time
- Binds to `0.0.0.0:7860` (HF Spaces default port)
To deploy your own copy:
1. Create a [Hugging Face Space](https://huggingface.co/new-space) with **Docker** SDK
2. Push this repository to the Space's git remote
3. Add `GROQ_API_KEY` (and optionally `GEMINI_API_KEY`) as Space secrets
See [DOCS/phase6_deployment.md](DOCS/phase6_deployment.md) for the full deployment walkthrough.
---
## Configuration
All configuration is environment-driven via `.env`. See [`.env.example`](.env.example) for the full reference.
Key variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | (none) | **Required.** Groq API key for Whisper + Llama |
| `GEMINI_API_KEY` | (none) | Optional Gemini fallback key |
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `7860` | Server port |
| `FINAL_TOP_K` | `5` | Number of chunks passed to LLM |
| `MAX_ANSWER_TOKENS` | `500` | LLM max output tokens |
| `CHUNK_SIZE_MAX` | `600` | Max tokens per document chunk |
| `BCRYPT_ROUNDS` | `12` | bcrypt work factor for KB passwords |
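
To make the table concrete, here is a rough sketch of how these variables could map onto a typed settings object. The project itself uses Pydantic-settings in `config.py`; this example swaps in a standard-library dataclass so it runs without extra dependencies, and the `Settings` and `load_settings` names and the lowercase field names are assumptions rather than the project's actual API.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables in the table above."""
    groq_api_key: str
    gemini_api_key: Optional[str] = None
    host: str = "0.0.0.0"
    port: int = 7860
    final_top_k: int = 5
    max_answer_tokens: int = 500
    chunk_size_max: int = 600
    bcrypt_rounds: int = 12


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    if not env.get("GROQ_API_KEY"):
        raise ValueError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        host=env.get("HOST", "0.0.0.0"),
        # Environment values arrive as strings, so numeric fields are cast.
        port=int(env.get("PORT", 7860)),
        final_top_k=int(env.get("FINAL_TOP_K", 5)),
        max_answer_tokens=int(env.get("MAX_ANSWER_TOKENS", 500)),
        chunk_size_max=int(env.get("CHUNK_SIZE_MAX", 600)),
        bcrypt_rounds=int(env.get("BCRYPT_ROUNDS", 12)),
    )


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"GROQ_API_KEY": "gsk_example", "PORT": "8000"})
```

Passing the mapping explicitly keeps the loader easy to test; a Pydantic-settings class gives the same typed result with validation and `.env` loading built in.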
---
## Security
| Control | Implementation |
|---------|----------------|
| **No raw queries stored** | Audit log stores SHA-256 hash only |
| **KB access control** | bcrypt-hashed passwords (work factor 12) |
| **SQL injection prevention** | 100% parameterized queries; no f-string SQL |
| **Path traversal prevention** | KB names validated as slugs (`^[a-z0-9][a-z0-9\-]*[a-z0-9]$`) |
| **SSRF prevention** | URL ingestion via trafilatura with no internal-network access |
| **Upload whitelist** | Only `.pdf`, `.html`, `.docx`, `.md`, `.txt` accepted |
| **File size limit** | 50MB max per upload |
| **GPU isolation** | `CUDA_VISIBLE_DEVICES=-1` prevents CUDA crashes on incompatible hardware |
| **No secrets in git** | `.env` gitignored; HF secrets via Space settings API |
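
Two of the controls above are small enough to sketch with the standard library alone: the KB-name slug check and the query hashing used for the audit log. The regex is the one quoted in the table; the function names are hypothetical, and whether the project normalizes a query before hashing is not stated, so this sketch hashes the text as given.

```python
import hashlib
import re

# Slug pattern quoted in the Security table.
KB_NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9\-]*[a-z0-9]$")


def is_valid_kb_name(name):
    """Accept only lowercase slugs, which rules out '/', '..' and spaces.

    fullmatch is used so a trailing newline cannot slip past the `$` anchor.
    Note that the pattern requires at least two characters.
    """
    return KB_NAME_PATTERN.fullmatch(name) is not None


def hash_query(query):
    """Return the SHA-256 hex digest stored in place of the raw query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


ok = is_valid_kb_name("my-docs")
traversal = is_valid_kb_name("../etc/passwd")
digest = hash_query("what is the refund policy?")
```

Because a valid name can contain no path separators or dots, it can be joined onto a storage directory without further sanitization.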
---
## Phase Documentation
Each phase has a detailed write-up covering design decisions, key code sections, and test results:
| Phase | Topic | Tests |
|-------|-------|-------|
| [Phase 0](DOCS/phase0_foundation.md) | Project Foundation (config, models, schema, scaffold) | 58 ✅ |
| [Phase 1](DOCS/phase1_ingestion.md) | Document Ingestion (parser, chunker, indexer) | 46 ✅ |
| [Phase 2](DOCS/phase2_retrieval.md) | Hybrid Retrieval (BM25 + vector + RRF + reranker) | 33 ✅ |
| [Phase 3](DOCS/phase3_asr.md) | ASR & Voice Input (Whisper, query preprocessor) | 47 ✅ |
| [Phase 4](DOCS/phase4_generation.md) | Generation & Citations (LangChain, faithfulness guard) | 72 ✅ |
| [Phase 5](DOCS/phase5_ui_access.md) | Full UI, TTS & Access Control | 55 ✅ |
| [Phase 6](DOCS/phase6_deployment.md) | FastAPI Server, SPA Frontend & HF Deployment | 17 ✅ |
**Total: 328 tests, all passing.**
---
## License
**Source Available, All Rights Reserved.** See [LICENSE](LICENSE) for full terms.
The source code is publicly visible for viewing and educational purposes. Any use in personal, commercial, or academic projects requires explicit written permission from the author.
To request permission: navnitamrutharaj1234@gmail.com
**Author:** Navnit Amrutharaj