Spaces:
Sleeping
Sleeping
| title: RAG Document Assistant | |
| emoji: π | |
| colorFrom: green | |
| colorTo: blue | |
| sdk: docker | |
| pinned: false | |
| license: mit | |
| app_port: 7860 | |
| short_description: Privacy-first document search with zero storage | |
| # RAG Document Assistant | |
| **Privacy-first document search. Your data never leaves your device.** | |
| [](#privacy-first-architecture) | |
| [](LICENSE) | |
| | Resource | Link | | |
| |----------|------| | |
| | Live Demo | [rag-document-assistant.vercel.app](https://rag-document-assistant.vercel.app/) | | |
| | Product Demo Video | [Pre-recorded Demo](https://github.com/vn6295337/RAG-document-assistant/issues/2) | | |
| | Business Guide | [BUSINESS_README.md](BUSINESS_README.md) | | |
| --- | |
| ## Privacy-First Architecture | |
| ``` | |
| INDEXING (one-time) | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| Your Device Server | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| Dropbox βββ Files loaded | |
| in browser | |
| β | |
| βΌ | |
| Text chunked ββββββββββββββ Embeddings + | |
| locally file positions only | |
| β (no text stored) | |
| βΌ | |
| Original text | |
| PURGED β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| QUERY TIME (every search) | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| Your Question βββ Find matching βββ Re-fetch text | |
| embeddings from YOUR Dropbox | |
| β β | |
| βΌ βΌ | |
| File paths ββββ Extract chunks βββ Answer | |
| + positions using positions generated | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| ### True Zero-Storage Privacy | |
| 1. **Client-Side Chunking**: Documents are read and chunked entirely in your browser | |
| 2. **Embeddings Only**: Only mathematical vectors are stored (irreversible) | |
| 3. **No Text Stored**: Only file paths and character positions are kept | |
| 4. **Query-Time Re-fetch**: Text is retrieved fresh from YOUR Dropbox for each query | |
| 5. **You Control Access**: Disconnect Dropbox = queries stop working = your data stays yours | |
| ## How It Works | |
| 1. **Connect** - Link your Dropbox account (OAuth - we never see your password) | |
| 2. **Select** - Choose files to index (.txt, .md, .pdf up to 5 MB) | |
| 3. **Process** - Text is chunked and embedded in your browser | |
| 4. **Search** - Query your documents with natural language | |
| 5. **Answer** - Get cited responses from your indexed content | |
| ## What Gets Stored | |
| | Data | Stored? | Where | | |
| |------|---------|-------| | |
| | Your files | No | Stay in YOUR Dropbox | | |
| | Document text | No | Re-fetched at query time | | |
| | Embeddings | Yes | Pinecone (encrypted) | | |
| | File paths | Yes | Pinecone metadata | | |
| | Chunk positions | Yes | Pinecone metadata | | |
| | Queries | No | Not logged | | |
| Embeddings are mathematical vectors that cannot be reversed to reconstruct text. File paths and positions are used to re-fetch the exact text from your Dropbox when you search. | |
| ## Quick Start | |
| ```bash | |
| git clone https://github.com/vn6295337/RAG-document-assistant.git | |
| cd RAG-document-assistant | |
| # Backend | |
| pip install -r requirements.txt | |
| uvicorn src.api.main:app --reload | |
| # Frontend | |
| cd frontend && npm install && npm run dev | |
| ``` | |
| ## Tech Stack | |
| - **Frontend**: React + Vite + Tailwind CSS | |
| - **Backend**: FastAPI on HuggingFace Spaces | |
| - **Vector DB**: Pinecone (embeddings only) | |
| - **File Source**: Dropbox OAuth | |
| - **LLM**: Multi-provider fallback (Gemini, Groq, OpenRouter) | |
| ## Documentation | |
| - [Architecture](docs/architecture.md) - Technical design | |
| - [API Reference](docs/api_reference.md) - Backend endpoints | |
| - [Business Overview](BUSINESS_README.md) - Use cases and value | |
| ## License | |
| MIT License - see [LICENSE](LICENSE) | |