# RAG Document Assistant - Architecture
> **Version**: 2.0
> **Last Updated**: January 2026
> **Focus**: Zero-Storage Privacy Architecture
---
## System Overview
A privacy-first RAG (Retrieval-Augmented Generation) system where **no document text is ever stored on our servers**. Documents are processed client-side, and text is re-fetched from the user's cloud storage at query time.
### Key Characteristics
- **Zero-Storage**: Document text never persists on servers
- **Client-Side Processing**: Chunking happens in the browser
- **Query-Time Re-fetch**: Text retrieved from user's Dropbox for each search
- **User Control**: Disconnect cloud storage to revoke all access
---
## Privacy Architecture
```
INDEXING (one-time setup)
══════════════════════════════════════════════════════════════════

User's Browser                              Our Server
──────────────                              ──────────

1. Connect Dropbox (OAuth)
        │
        ▼
2. Select files from Dropbox
        │
        ▼
3. Files loaded in browser
   (never sent to server)
        │
        ▼
4. Text chunked locally ───────────────►    5. Generate embeddings
   with position tracking                      (384-dim vectors)
        │                                           │
        ▼                                           ▼
6. Original text                            7. Store in Pinecone:
   PURGED from memory                          - Embeddings (irreversible)
                                               - File paths
                                               - Chunk positions
                                               - NO TEXT

══════════════════════════════════════════════════════════════════

QUERY TIME (every search)
══════════════════════════════════════════════════════════════════

User's Question                             Our Server
───────────────                             ──────────

"What does the contract say?"
        │
        ▼
  ─────────────────────────────────────►    1. Generate query embedding
                                                    │
                                                    ▼
                                            2. Search Pinecone
                                               (find similar chunks)
                                                    │
                                                    ▼
                                            3. Get file paths + positions
                                                    │
                                                    ▼
                                            4. Re-fetch from USER'S Dropbox
                                               using their access token
                                                    │
                                                    ▼
                                            5. Extract chunk text
                                               using stored positions
                                                    │
                                                    ▼
                                            6. Send to LLM for answer
                                                    │
                                                    ▼
Answer + Citations ◄─────────────────       7. Return response
                                               (text never stored)

══════════════════════════════════════════════════════════════════
```
---
## What Gets Stored
| Data | Stored? | Where | Reversible? |
|------|---------|-------|-------------|
| Document files | No | User's Dropbox only | N/A |
| Document text | No | Never stored | N/A |
| Embeddings | Yes | Pinecone | No (one-way transform) |
| File paths | Yes | Pinecone metadata | N/A |
| Chunk positions | Yes | Pinecone metadata | N/A |
| User queries | No | Not logged | N/A |
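The table above can be read as a rule for what a stored vector record may contain. The following is a minimal sketch of that rule, not the project's actual code: `build_vector_record` and `_KEY_MAP` are hypothetical names, the camelCase input keys are taken from the indexing flow in this document, and treating the number after `::` in the ID as the chunk index is an assumption.

```python
from typing import Any

# Incoming (camelCase) metadata key -> stored (snake_case) key.
# Only these fields are kept; anything else on the chunk is dropped.
_KEY_MAP = {
    "filename": "filename",
    "filePath": "file_path",
    "fileId": "file_id",
    "startChar": "start_char",
    "endChar": "end_char",
}


def build_vector_record(
    chunk: dict[str, Any], embedding: list[float], index: int
) -> dict[str, Any]:
    """Return an upsert-ready record with the embedding and position metadata only."""
    source = chunk["metadata"]
    metadata = {stored: source[incoming] for incoming, stored in _KEY_MAP.items()}
    return {
        # Assumption: IDs follow the "<file_id>::<chunk index>" pattern.
        "id": f"{metadata['file_id']}::{index}",
        "values": list(embedding),
        "metadata": metadata,  # chunk["text"] is deliberately never copied
    }
```

Building the metadata from an allow-list, rather than deleting `text` from a copy of the chunk, means a new field added on the client cannot end up in the index by accident.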
---
## Technology Stack
### Frontend
- **Framework**: React 18 + Vite
- **Styling**: Tailwind CSS v4
- **Deployment**: Vercel
- **Key Features**:
  - Client-side text chunking
  - Dropbox OAuth integration
  - Position tracking for chunks
### Backend
- **Framework**: FastAPI
- **Deployment**: HuggingFace Spaces (Docker)
- **Key Features**:
  - Zero-storage embedding endpoint
  - Query-time Dropbox re-fetch
  - Multi-provider LLM cascade
### Vector Database
- **Service**: Pinecone Serverless
- **Index**: `rag-semantic-384`
- **Dimensions**: 384
- **Metric**: Cosine similarity
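For readers unfamiliar with the metric, here is a small pure-Python sketch of cosine similarity. It only illustrates the formula; Pinecone computes the score itself, and the `cosine_similarity` helper below is a hypothetical name, not part of this project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the score depends only on direction, two chunks whose embeddings differ in magnitude but point the same way are treated as equally similar to the query.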
### Embeddings
- **Model**: `all-MiniLM-L6-v2` (sentence-transformers)
- **Dimensions**: 384
- **Processing**: Server-side (text discarded immediately)
### LLM Providers (Cascade)
1. **Gemini 2.5 Flash** (Primary)
2. **Groq** - llama-3.1-8b-instant (Fallback 1)
3. **OpenRouter** - Mistral 7B (Fallback 2)
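The cascade above amounts to trying providers in order until one succeeds. A minimal sketch of that pattern follows, assuming each provider is wrapped in a plain callable that takes a prompt and returns text; `generate_with_fallback` and the `Provider` alias are hypothetical names, and the real backend may handle errors differently.

```python
from typing import Callable, Sequence

# (provider name, callable that turns a prompt into an answer)
Provider = tuple[str, Callable[[str], str]]


def generate_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, answer) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real service would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the answer makes it possible to report which model produced a response when a fallback was used.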
---
## Component Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                        FRONTEND (React)                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Sidebar        │  │ QueryPanel   │  │ App.jsx      │         │
│  │                │  │              │  │              │         │
│  │ - CloudConnect │  │ - Search UI  │  │ - State mgmt │         │
│  │ - File select  │  │ - Results    │  │ - Token flow │         │
│  │ - Index button │  │ - Citations  │  │ - Privacy UI │         │
│  └────────────────┘  └──────────────┘  └──────────────┘         │
│                                                                 │
│  ┌────────────────────────────────────────────────────┐         │
│  │                     API Layer                      │         │
│  │      chunker.js  │  dropbox.js  │  client.js       │         │
│  └────────────────────────────────────────────────────┘         │
│                                │                                │
└────────────────────────────────┼────────────────────────────────┘
                                 │ HTTPS
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        BACKEND (FastAPI)                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────────────────────────────────────────┐         │
│  │                     API Routes                     │         │
│  │                                                    │         │
│  │  POST   /embed-chunks   - Generate embeddings      │         │
│  │  POST   /query-secure   - Zero-storage query       │         │
│  │  POST   /dropbox/token  - OAuth token exchange     │         │
│  │  POST   /dropbox/folder - List folder contents     │         │
│  │  POST   /dropbox/file   - Download file content    │         │
│  │  DELETE /clear-index    - Clear Pinecone index     │         │
│  └────────────────────────────────────────────────────┘         │
│                                │                                │
└────────────────────────────────┼────────────────────────────────┘
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
                ▼                ▼                ▼
           ┌──────────┐     ┌──────────┐     ┌──────────┐
           │ Pinecone │     │ Dropbox  │     │   LLM    │
           │ (vectors)│     │ (files)  │     │ Providers│
           └──────────┘     └──────────┘     └──────────┘
```
---
## Data Flow: Indexing
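```javascript
// Illustrative flow. Steps 1-4 run in the browser (as an ES module, so
// top-level `await` is available); step 5 happens on the backend.
// `accessToken` is the token obtained from the Dropbox OAuth step.

// 1. User selects files in browser
const files = [
  { id: "abc123", name: "contract.pdf", path: "/Documents/contract.pdf" },
];
const file = files[0];

// 2. File fetched from Dropbox (via backend proxy)
const fileResponse = await fetch("/api/dropbox/file", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ path: file.path, access_token: accessToken }),
});
const content = await fileResponse.text(); // assumes a plain-text response body

// 3. Text chunked CLIENT-SIDE with position tracking
const chunks = chunkText(content, { chunkSize: 1000, overlap: 100 });
// Result:
//   { text: "...", startChar: 0, endChar: 1000 }
//   { text: "...", startChar: 900, endChar: 1900 }

// 4. Chunks sent to backend for embedding
await fetch("/api/embed-chunks", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    chunks: chunks.map((chunk) => ({
      text: chunk.text, // Used for embedding only
      metadata: {
        filename: file.name,
        filePath: file.path,
        fileId: file.id,
        startChar: chunk.startChar,
        endChar: chunk.endChar,
      },
    })),
  }),
});

// 5. Backend generates embeddings and upserts them into Pinecone.
//    TEXT IS IMMEDIATELY DISCARDED. Shape of one stored record:
const storedRecord = {
  id: "abc123::0",
  values: [0.123, -0.456 /* ... 384-dim embedding */],
  metadata: {
    filename: "contract.pdf",
    file_path: "/Documents/contract.pdf",
    file_id: "abc123",
    start_char: 0,
    end_char: 1000,
    // NO TEXT STORED
  },
};
```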
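Step 3 relies on a chunker that records where each chunk came from. The sketch below shows one way such a chunker could work, written in Python for illustration even though the project's chunker runs in the browser; `chunk_text` is a hypothetical stand-in, and only the `chunkSize: 1000` / `overlap: 100` values and the resulting offsets come from the flow above.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[dict]:
    """Split text into overlapping windows, recording each window's character offsets."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts `step` characters after the last
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({"text": text[start:end], "start_char": start, "end_char": end})
        if end == len(text):
            break  # the final window reached the end of the text
        start += step
    return chunks
```

With the defaults, a 2,500-character document yields windows at 0-1000, 900-1900, and 1800-2500. One caveat when positions are computed in JavaScript and applied in Python: JavaScript string indices count UTF-16 code units while Python counts code points, so the two can disagree for text containing characters outside the Basic Multilingual Plane.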
---
## Data Flow: Query
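```python
# Illustrative server-side flow. `sentence_transformer`, `pinecone`, `dropbox`,
# and `llm` are placeholders for the backend's client objects.

def answer_query(request: dict) -> dict:
    # 1. User submits query with access token, e.g.
    #    {"query": "What is the payment term?", "access_token": "user_dropbox_token"}
    query = request["query"]
    access_token = request["access_token"]

    # 2. Generate query embedding (converted to a plain list for the vector search)
    query_embedding = sentence_transformer.encode(query).tolist()

    # 3. Search Pinecone
    results = pinecone.query(
        vector=query_embedding,
        top_k=3,
        include_metadata=True,
    )
    # Returns: file paths + positions (NO TEXT)
    matches = [match["metadata"] for match in results["matches"]]

    # 4. Re-fetch files from USER'S Dropbox (once per file)
    unique_file_paths = {m["file_path"] for m in matches}
    contents = {
        path: dropbox.download(path, access_token)
        for path in unique_file_paths
    }

    # 5. Extract chunks using stored positions
    chunk_texts = [
        contents[m["file_path"]][m["start_char"]:m["end_char"]]
        for m in matches
    ]

    # 6. Build prompt with re-fetched text
    context = "\n".join(
        f"{i}. {text}" for i, text in enumerate(chunk_texts, start=1)
    )
    prompt = f"Context:\n{context}\nQuestion: {query}\n"

    # 7. Call LLM
    answer = llm.generate(prompt)

    # 8. Return answer (text never stored); citations carry metadata only
    return {"answer": answer, "citations": matches}
```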
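Steps 4 and 5 can be isolated into a small helper that downloads each file once and slices out every matching chunk. This is a sketch under assumptions: `extract_chunk_texts` is a hypothetical name, and `fetch_file` stands in for whatever wraps the Dropbox download with the user's access token.

```python
from collections import defaultdict
from typing import Callable


def extract_chunk_texts(
    matches: list[dict], fetch_file: Callable[[str], str]
) -> list[str]:
    """Return chunk texts in match order, downloading each file only once."""
    # Group match indices by file so several chunks from one file share a download.
    by_path: dict[str, list[int]] = defaultdict(list)
    for i, match in enumerate(matches):
        by_path[match["file_path"]].append(i)

    texts = [""] * len(matches)
    for path, indices in by_path.items():
        content = fetch_file(path)  # hypothetical wrapper around the Dropbox download
        for i in indices:
            m = matches[i]
            texts[i] = content[m["start_char"]:m["end_char"]]
    return texts
```

The stored offsets are only valid while the file is unchanged since indexing; if a document is edited in Dropbox afterwards, the same positions may select different text until the file is re-indexed.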
---
## Security & Privacy
### User Control
- **OAuth Scopes**: Read-only access to user-selected files
- **Token Storage**: Access token stored only in browser session
- **Revocation**: Disconnecting Dropbox immediately revokes the server's access to files
### Server Security
- **No Persistent Storage**: Text never written to disk or database
- **Memory Only**: Text exists in memory only during processing
- **Immediate Purge**: Explicit deletion after embedding generation
### Data Protection
- **Embeddings**: One-way, lossy transformation; the stored vectors do not contain the original text
- **Positions**: Only useful with original file access
- **File Paths**: Dropbox paths, require valid access token
---
## Deployment
### Frontend (Vercel)
- Automatic deploys from GitHub
- Environment: `VITE_API_URL` pointing to backend
### Backend (HuggingFace Spaces)
- Docker-based deployment
- Environment variables for API keys:
  - `PINECONE_API_KEY`
  - `DROPBOX_APP_KEY`
  - `DROPBOX_APP_SECRET`
  - `GEMINI_API_KEY`
  - `GROQ_API_KEY`
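
A backend that depends on these keys can check for them at startup instead of failing on the first request. The following is a minimal sketch of such a check, assuming all five variables listed above are required; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names, not part of the project.

```python
import os
from typing import Mapping

# The variables listed in this section; treating all of them as required is an assumption.
REQUIRED_ENV_VARS = (
    "PINECONE_API_KEY",
    "DROPBOX_APP_KEY",
    "DROPBOX_APP_SECRET",
    "GEMINI_API_KEY",
    "GROQ_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

At startup, the result could be used to raise an error that names every missing variable at once, for example `if missing_env_vars(): raise RuntimeError(...)`.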
---
## References
- **Live Demo**: https://rag-document-assistant.vercel.app/
- **Backend API**: https://vn6295337-rag-document-assistant.hf.space/
- **GitHub**: https://github.com/vn6295337/RAG-document-assistant