vn6295337's picture
Initial commit: RAG Document Assistant with Zero-Storage Privacy
f866820
---
title: RAG Document Assistant
emoji: πŸ”’
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
short_description: Privacy-first document search with zero storage
---
# RAG Document Assistant
**Privacy-first document search. Your data never leaves your device.**
[![Privacy](https://img.shields.io/badge/Privacy-Zero%20Storage-green)](#privacy-first-architecture)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
| Resource | Link |
|----------|------|
| Live Demo | [rag-document-assistant.vercel.app](https://rag-document-assistant.vercel.app/) |
| Product Demo Video | [Pre-recorded Demo](https://github.com/vn6295337/RAG-document-assistant/issues/2) |
| Business Guide | [BUSINESS_README.md](BUSINESS_README.md) |
---
## Privacy-First Architecture
```
INDEXING (one-time)
───────────────────────────────────────────────────────────
Your Device Server
───────────────────────────────────────────────────────────
Dropbox ──→ Files loaded
in browser
β”‚
β–Ό
Text chunked ─────────────→ Embeddings +
locally file positions only
β”‚ (no text stored)
β–Ό
Original text
PURGED βœ“
───────────────────────────────────────────────────────────
QUERY TIME (every search)
───────────────────────────────────────────────────────────
Your Question ──→ Find matching ──→ Re-fetch text
embeddings from YOUR Dropbox
β”‚ β”‚
β–Ό β–Ό
File paths ───→ Extract chunks ──→ Answer
+ positions using positions generated
───────────────────────────────────────────────────────────
```
### True Zero-Storage Privacy
1. **Client-Side Chunking**: Documents are read and chunked entirely in your browser
2. **Embeddings Only**: Only mathematical vectors are stored (irreversible)
3. **No Text Stored**: Only file paths and character positions are kept
4. **Query-Time Re-fetch**: Text is retrieved fresh from YOUR Dropbox for each query
5. **You Control Access**: Disconnect Dropbox = queries stop working = your data stays yours
## How It Works
1. **Connect** - Link your Dropbox account (OAuth - we never see your password)
2. **Select** - Choose files to index (.txt, .md, .pdf up to 5 MB)
3. **Process** - Text is chunked and embedded in your browser
4. **Search** - Query your documents with natural language
5. **Answer** - Get cited responses from your indexed content
## What Gets Stored
| Data | Stored? | Where |
|------|---------|-------|
| Your files | No | Stay in YOUR Dropbox |
| Document text | No | Re-fetched at query time |
| Embeddings | Yes | Pinecone (encrypted) |
| File paths | Yes | Pinecone metadata |
| Chunk positions | Yes | Pinecone metadata |
| Queries | No | Not logged |
Embeddings are mathematical vectors that cannot be reversed to reconstruct text. File paths and positions are used to re-fetch the exact text from your Dropbox when you search.
## Quick Start
```bash
git clone https://github.com/vn6295337/RAG-document-assistant.git
cd RAG-document-assistant
# Backend
pip install -r requirements.txt
uvicorn src.api.main:app --reload
# Frontend
cd frontend && npm install && npm run dev
```
## Tech Stack
- **Frontend**: React + Vite + Tailwind CSS
- **Backend**: FastAPI on HuggingFace Spaces
- **Vector DB**: Pinecone (embeddings only)
- **File Source**: Dropbox OAuth
- **LLM**: Multi-provider fallback (Gemini, Groq, OpenRouter)
## Documentation
- [Architecture](docs/architecture.md) - Technical design
- [API Reference](docs/api_reference.md) - Backend endpoints
- [Business Overview](BUSINESS_README.md) - Use cases and value
## License
MIT License - see [LICENSE](LICENSE)