metadata
title: RAG Document Assistant
emoji: π
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
short_description: Privacy-first document search with zero storage
RAG Document Assistant
Privacy-first document search. Your document text is never stored on the server.
| Resource | Link |
|---|---|
| Live Demo | rag-document-assistant.vercel.app |
| Product Demo Video | Pre-recorded Demo |
| Business Guide | BUSINESS_README.md |
Privacy-First Architecture
INDEXING (one-time)
-----------------------------------------------------------
  Your Device                        Server
-----------------------------------------------------------
  Dropbox --> Files loaded
              in browser
                  |
                  v
              Text chunked --------> Embeddings +
              locally                file positions only
                  |                  (no text stored)
                  v
              Original text
              PURGED
-----------------------------------------------------------
QUERY TIME (every search)
-----------------------------------------------------------
  Your Question --> Find matching --> Re-fetch text
                    embeddings        from YOUR Dropbox
                        |                   |
                        v                   v
                    File paths  -->   Extract chunks --> Answer
                    + positions       using positions    generated
-----------------------------------------------------------
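The "find matching embeddings" step of the query flow can be sketched as a top-k cosine-similarity search that returns only paths and positions. This is a minimal illustration, not the project's code: the in-memory `index` list stands in for Pinecone, and the names `cosine`, `top_matches`, and the two-dimensional vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, index, k=2):
    """Return the metadata of the k vectors most similar to query_vec.

    `index` is a list of (vector, metadata) pairs; the metadata holds a
    file path and character positions, never the chunk text itself.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]


# Hypothetical index: toy 2-d vectors instead of real embeddings.
index = [
    ([1.0, 0.0], {"path": "/a.txt", "start": 0, "end": 500}),
    ([0.0, 1.0], {"path": "/b.md", "start": 0, "end": 300}),
    ([0.9, 0.1], {"path": "/a.txt", "start": 450, "end": 950}),
]
hits = top_matches([1.0, 0.0], index, k=2)
```

Here `hits` contains the two `/a.txt` entries. The search result carries no text, so the text has to come from Dropbox in the next step.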
True Zero-Storage Privacy
- Client-Side Chunking: Documents are read and chunked entirely in your browser
- Embeddings Only: The server stores mathematical vectors, never the text they were computed from
- No Text Stored: The only metadata kept with each vector is a file path and character positions
- Query-Time Re-fetch: Text is retrieved fresh from YOUR Dropbox for each query
- You Control Access: Disconnecting Dropbox stops queries from working, so your data stays yours
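To make "chunking with character positions" concrete, here is a minimal sketch of a fixed-size, overlapping chunker that records only where each chunk lives. The real chunking runs in the browser and its strategy is not documented here, so the window size, the overlap, and the names `ChunkSpan` and `chunk_spans` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkSpan:
    """Location of a chunk inside its source file; carries no text."""
    path: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def chunk_spans(path: str, text: str, size: int = 500, overlap: int = 50) -> list[ChunkSpan]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    spans = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        spans.append(ChunkSpan(path, start, end))
        if end == len(text):
            break  # the last window already reaches the end of the file
    return spans


# Hypothetical 1200-character document.
text = "x" * 1200
spans = chunk_spans("/notes/report.txt", text)
```

Each `ChunkSpan` would be paired with the embedding of `text[span.start:span.end]`, after which the text itself can be discarded.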
How It Works
- Connect - Link your Dropbox account (OAuth - we never see your password)
- Select - Choose files to index (.txt, .md, .pdf up to 5 MB)
- Process - Text is chunked and embedded in your browser
- Search - Query your documents with natural language
- Answer - Get cited responses from your indexed content
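The file restriction in the Select step can be expressed as a small check. This is a sketch under assumptions: the constant and function names are invented for the example, and "5 MB" is read as 5 MiB, which the project may define differently.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf"}
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" interpreted as 5 MiB


def is_indexable(path: str, size_bytes: int) -> bool:
    """True if the file has an allowed extension and fits the size limit."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive extension match
    return suffix in ALLOWED_EXTENSIONS and size_bytes <= MAX_BYTES


# Hypothetical Dropbox listing as (path, size in bytes) pairs.
listing = [("/docs/Plan.PDF", 1024), ("/docs/photo.png", 10), ("/docs/big.txt", MAX_BYTES + 1)]
selectable = [path for path, size in listing if is_indexable(path, size)]
```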
What Gets Stored
| Data | Stored? | Where |
|---|---|---|
| Your files | No | Stay in YOUR Dropbox |
| Document text | No | Re-fetched at query time |
| Embeddings | Yes | Pinecone (encrypted) |
| File paths | Yes | Pinecone metadata |
| Chunk positions | Yes | Pinecone metadata |
| Queries | No | Not logged |
Embeddings are mathematical vectors, not text, and are not meant to be reversed into the original document. File paths and positions are used to re-fetch the exact text from your Dropbox when you search.
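The re-fetch step can be sketched as follows: given stored paths and positions, download each file once and slice out the chunks. The `fetch_text` callable is a hypothetical stand-in for a Dropbox download, and `extract_chunks` is an illustrative name, not the backend's actual function.

```python
from collections import defaultdict
from typing import Callable


def extract_chunks(hits: list[dict], fetch_text: Callable[[str], str]) -> list[str]:
    """Return the chunk text for each hit, fetching every file only once."""
    by_path = defaultdict(list)
    for i, hit in enumerate(hits):
        by_path[hit["path"]].append(i)

    chunks = [""] * len(hits)
    for path, indexes in by_path.items():
        text = fetch_text(path)  # fresh copy from the user's storage
        for i in indexes:
            chunks[i] = text[hits[i]["start"]:hits[i]["end"]]
    return chunks


# In-memory stand-in for Dropbox, with a log of which files were fetched.
fake_dropbox = {"/a.txt": "alpha beta gamma delta", "/b.md": "one two three"}
calls = []


def fetch(path: str) -> str:
    calls.append(path)
    return fake_dropbox[path]


hits = [
    {"path": "/a.txt", "start": 0, "end": 5},
    {"path": "/b.md", "start": 4, "end": 7},
    {"path": "/a.txt", "start": 11, "end": 16},
]
chunks = extract_chunks(hits, fetch)
```

Because only offsets are stored, they point at the right text only while the file is unchanged; a file edited after indexing would need to be re-indexed.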
Quick Start
git clone https://github.com/vn6295337/RAG-document-assistant.git
cd RAG-document-assistant
# Backend (from the repository root)
pip install -r requirements.txt
uvicorn src.api.main:app --reload
# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm run dev
Tech Stack
- Frontend: React + Vite + Tailwind CSS
- Backend: FastAPI on HuggingFace Spaces
- Vector DB: Pinecone (embeddings only)
- File Source: Dropbox OAuth
- LLM: Multi-provider fallback (Gemini, Groq, OpenRouter)
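A multi-provider fallback is typically a loop that tries each provider in order and returns the first successful answer. The sketch below shows that pattern with stub functions. The provider order, the stubs, and `generate_with_fallback` are assumptions for illustration and do not call the real Gemini, Groq, or OpenRouter APIs.

```python
from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable maps a prompt to an answer.
Provider = tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubs that simulate one failing and two working providers.
def gemini_stub(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def groq_stub(prompt: str) -> str:
    return f"answer to: {prompt}"


def openrouter_stub(prompt: str) -> str:
    return "unused"


used, answer = generate_with_fallback(
    "What is RAG?",
    [("gemini", gemini_stub), ("groq", groq_stub), ("openrouter", openrouter_stub)],
)
```

In this run the first stub fails, so the answer comes from the second provider and the third is never called.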
Documentation
- Architecture - Technical design
- API Reference - Backend endpoints
- Business Overview - Use cases and value
License
MIT License - see LICENSE