metadata
title: RAG Document Assistant
emoji: 🔒
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
short_description: Privacy-first document search with zero storage

RAG Document Assistant

Privacy-first document search. Your document text is never stored on the server.


Resource            Link
Live Demo           rag-document-assistant.vercel.app
Product Demo Video  Pre-recorded Demo
Business Guide      BUSINESS_README.md

Privacy-First Architecture

INDEXING (one-time)
───────────────────────────────────────────────────────────
Your Device                           Server
───────────────────────────────────────────────────────────
  Dropbox ──→ Files loaded
              in browser
                 │
                 ▼
           Text chunked ─────────────→ Embeddings +
           locally                     file positions only
                 │                     (no text stored)
                 ▼
           Original text
           PURGED ✓
───────────────────────────────────────────────────────────

QUERY TIME (every search)
───────────────────────────────────────────────────────────
Your Question ──→ Find matching ──→ Re-fetch text
                  embeddings        from YOUR Dropbox
                       │                  │
                       ▼                  ▼
                  File paths ───→ Extract chunks ──→ Answer
                  + positions     using positions    generated
───────────────────────────────────────────────────────────
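
The query-time flow above can be sketched in a few lines of Python. This is an illustration only, not the project's code: `ChunkRef`, `extract_chunks`, and the `fetch_text` callable are hypothetical names, and a plain dictionary stands in for the Dropbox download. It assumes positions are character offsets with an exclusive end.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical index entry: a file path and character offsets, no text."""
    path: str
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)


def extract_chunks(refs: list[ChunkRef], fetch_text: Callable[[str], str]) -> list[str]:
    """Re-fetch each referenced file once, then slice out the matched spans."""
    cache: dict[str, str] = {}
    chunks = []
    for ref in refs:
        if ref.path not in cache:
            cache[ref.path] = fetch_text(ref.path)
        chunks.append(cache[ref.path][ref.start:ref.end])
    return chunks


# Stand-in for a Dropbox download; a real client would call the Dropbox API here.
fake_dropbox = {"/notes/plan.md": "Alpha section. Beta section. Gamma section."}
refs = [ChunkRef("/notes/plan.md", 0, 14), ChunkRef("/notes/plan.md", 15, 28)]
print(extract_chunks(refs, fake_dropbox.__getitem__))
# ['Alpha section.', 'Beta section.']
```

Because the text is read again on every query, a file that changed since indexing may no longer line up with its stored positions; how the project handles that case is not described here.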

True Zero-Storage Privacy

  1. Client-Side Chunking: Documents are read and chunked entirely in your browser
  2. Embeddings Only: Only mathematical vectors are stored (irreversible)
  3. No Text Stored: Only file paths and character positions are kept
  4. Query-Time Re-fetch: Text is retrieved fresh from YOUR Dropbox for each query
  5. You Control Access: Disconnecting Dropbox stops queries from working, so your data stays yours
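
As a rough sketch of points 1 to 3, the snippet below chunks text into fixed-size windows and builds an index record that holds a vector plus path and offsets, but no text. It is written in Python for brevity even though the project chunks in the browser. The window size, overlap, record field names, and the placeholder vector are all assumptions, not the project's actual values.

```python
def chunk_with_offsets(text: str, size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into overlapping fixed-size windows and record each position."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break  # the last window already reaches the end of the text
    return chunks


def to_index_record(path: str, chunk: dict, vector: list[float]) -> dict:
    """Build the record to upload: vector, path, and offsets. The text is left out."""
    return {
        "vector": vector,
        "metadata": {"path": path, "start": chunk["start"], "end": chunk["end"]},
    }


chunks = chunk_with_offsets("abcdefghijklmnopqrstuvwxyz", size=10, overlap=2)
print([(c["start"], c["end"]) for c in chunks])
# [(0, 10), (8, 18), (16, 26)]

# Placeholder vector; a real embedding model would produce this.
record = to_index_record("/notes/alphabet.txt", chunks[0], [0.0, 0.0, 0.0])
print(record["metadata"])
# {'path': '/notes/alphabet.txt', 'start': 0, 'end': 10}
```

Once the records are built, the `text` field can be discarded locally, which corresponds to the purge step in the indexing diagram.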

How It Works

  1. Connect - Link your Dropbox account (OAuth - we never see your password)
  2. Select - Choose files to index (.txt, .md, .pdf up to 5 MB)
  3. Process - Text is chunked and embedded in your browser
  4. Search - Query your documents with natural language
  5. Answer - Get cited responses from your indexed content
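
The file filter in step 2 might look like the check below. It is a sketch under two assumptions: the 5 MB limit applies per file, and "5 MB" means 5 × 1024 × 1024 bytes. The helper name `is_indexable` is hypothetical.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf"}
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB, applied per file


def is_indexable(path: str, size_bytes: int) -> bool:
    """Accept a file only if its extension and size fit the stated limits."""
    suffix = PurePosixPath(path).suffix.lower()  # compare extensions case-insensitively
    return suffix in ALLOWED_EXTENSIONS and size_bytes <= MAX_BYTES


print(is_indexable("/docs/Report.PDF", 1024))        # True
print(is_indexable("/docs/slides.docx", 1024))       # False
print(is_indexable("/docs/big.md", MAX_BYTES + 1))   # False
```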

What Gets Stored

Data             Stored?  Where
Your files       No       Stay in YOUR Dropbox
Document text    No       Re-fetched at query time
Embeddings       Yes      Pinecone (encrypted)
File paths       Yes      Pinecone metadata
Chunk positions  Yes      Pinecone metadata
Queries          No       Not logged

Embeddings are mathematical vectors that cannot be reversed to reconstruct text. File paths and positions are used to re-fetch the exact text from your Dropbox when you search.

Quick Start

git clone https://github.com/vn6295337/RAG-document-assistant.git
cd RAG-document-assistant

# Backend
pip install -r requirements.txt
uvicorn src.api.main:app --reload

# Frontend (run in a second terminal, from the repository root)
cd frontend && npm install && npm run dev

Tech Stack

  • Frontend: React + Vite + Tailwind CSS
  • Backend: FastAPI on HuggingFace Spaces
  • Vector DB: Pinecone (embeddings only)
  • File Source: Dropbox OAuth
  • LLM: Multi-provider fallback (Gemini, Groq, OpenRouter)
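
One common way to implement a multi-provider fallback is to try each provider in order and return the first successful answer. The sketch below shows that pattern with stub functions in place of real Gemini, Groq, or OpenRouter clients. The function name, the provider order, and the error handling are assumptions, not the project's implementation.

```python
from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable maps a prompt to an answer.
Provider = tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Return (provider_name, answer) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def failing_stub(prompt: str) -> str:
    raise TimeoutError("timed out")


def working_stub(prompt: str) -> str:
    return f"answer to: {prompt}"


result = generate_with_fallback(
    "What is RAG?", [("gemini", failing_stub), ("groq", working_stub)]
)
print(result)
# ('groq', 'answer to: What is RAG?')
```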

Documentation

License

MIT License - see LICENSE