| # ColPali 🤝 Vespa - Visual Retrieval System | |
| A visual document retrieval system that combines **ColPali** (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with **Vespa** for scalable document search and question answering. | |
| ## Features | |
| ### **Visual Document Search** | |
| - **Multi-modal retrieval**: Search through PDF documents using natural language queries | |
| - **Visual understanding**: ColPali model processes document images and text simultaneously | |
| - **Token-level similarity maps**: Visualize exactly which parts of documents match your query | |
| - **Multiple ranking algorithms**: Choose between hybrid, semantic, and other ranking methods | |
| ### 🧠 **AI-Powered Chat** | |
| - **Intelligent Q&A**: Ask questions about retrieved documents using Google Gemini 2.0 | |
| - **Context-aware responses**: AI analyzes document images to provide accurate answers | |
| - **Real-time streaming**: Get responses as they're generated | |
| ### ⚡ **Scalable Infrastructure** | |
| - **Vespa integration**: Enterprise-grade search platform for large document collections | |
| - **Real-time processing**: Instant search results and similarity map generation | |
| - **Cloud-ready**: Supports Vespa Cloud deployment with secure authentication | |
| ## 🏗️ Architecture | |
| ``` | |
| ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐ | |
| │    Frontend     │    │     Backend     │    │   Vespa Cloud   │ | |
| │    (Browser)    │    │   (Your Local   │    │    (Remote)     │ | |
| │                 │    │    Computer)    │    │                 │ | |
| │ • Search UI     │◄─► │ • ColPali Model │◄─► │ • Document Store│ | |
| │ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │ | |
| │   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │ | |
| │ • Chat Interface│    │ • Gemini Int.   │    │                 │ | |
| └─────────────────┘    └─────────────────┘    └─────────────────┘ | |
|         │                      │                      │ | |
|    Web Browser             LOCAL AI            REMOTE Storage | |
| ``` | |
| ### 🏠 **LOCAL Processing (Your Computer)** | |
| **All ColPali model inference happens on YOUR local machine:** | |
| - **ColPali Model**: Runs locally on your GPU/CPU (~7GB model) | |
| - **Document Processing**: PDF → Images → Embeddings (local) | |
| - **Query Processing**: Text → Embeddings (local) | |
| - **Similarity Maps**: Visual attention generation (local) | |
| - **Gemini Chat**: Retrieved page images are prepared locally, then sent to the remote Gemini API | |
| **Device Detection:** | |
| ```python | |
| from colpali_engine.utils.torch_utils import get_torch_device | |
| device = get_torch_device("auto") # Detects: CUDA, MPS (Apple), or CPU | |
| print(f"Using device: {device}") # Shows YOUR hardware | |
| ``` | |
| ### ☁️ **REMOTE Processing (Vespa Cloud)** | |
| **Only storage and search index operations happen remotely:** | |
| - **Document Storage**: Stores processed embeddings (not raw models) | |
| - **Vector Search**: Fast similarity search across document collection | |
| - **Query Routing**: Handles search requests and ranking | |
| - **Metadata Storage**: Document titles, URLs, page numbers | |
| ### **Complete Data Flow** | |
| #### **Document Upload Process:** | |
| 1. **LOCAL**: Your computer downloads PDF from URL | |
| 2. **LOCAL**: ColPali converts PDF pages to images | |
| 3. **LOCAL**: ColPali generates visual embeddings (1024 patches × 128 dims) | |
| 4. **LOCAL**: Embeddings converted to binary format for efficiency | |
| 5. **REMOTE**: Binary embeddings uploaded to Vespa Cloud | |
| 6. **REMOTE**: Vespa indexes embeddings for fast search | |
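Steps 3 to 5 above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual feeding code: it assumes that "binary format" means one sign bit per embedding dimension, and both `binarize_embeddings` and the random stand-in page are hypothetical.

```python
import numpy as np


def binarize_embeddings(page_embeddings: np.ndarray) -> np.ndarray:
    """Pack float patch embeddings into int8 bit vectors (1 bit per dimension)."""
    # Positive values become bit 1, everything else becomes bit 0.
    bits = (page_embeddings > 0).astype(np.uint8)
    # packbits folds every 8 bits into one byte along the last axis,
    # so 128 dimensions become 16 bytes per patch.
    return np.packbits(bits, axis=-1).astype(np.int8)


# Stand-in for the ColPali output of one page (assumed shape: 1024 patches x 128 dims).
rng = np.random.default_rng(0)
page = rng.standard_normal((1024, 128)).astype(np.float32)

packed = binarize_embeddings(page)
print(page.nbytes, "bytes as float32")
print(packed.shape, packed.nbytes, "bytes after packing")
```

Under these assumptions one page shrinks from 512 KiB of float32 values to 16 KiB of packed bits before upload.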
| #### **Search Query Process:** | |
| 1. **LOCAL**: You enter search query in browser | |
| 2. **LOCAL**: ColPali processes query → generates query embeddings | |
| 3. **REMOTE**: Query embeddings sent to Vespa Cloud | |
| 4. **REMOTE**: Vespa searches document index, returns matches | |
| 5. **LOCAL**: ColPali generates similarity maps for results | |
| 6. **BROWSER**: Results displayed with visual attention maps | |
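ColPali is a late-interaction model, so a page is typically scored with MaxSim: for each query token take the best-matching patch, then sum those maxima. The sketch below shows that scoring on plain float vectors. It is a conceptual illustration only: the rank profiles actually deployed in Vespa are not shown in this README, and `maxsim_score` and `rank_pages` are hypothetical names.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the max patch similarity."""
    # Rows are query tokens, columns are page patches.
    sims = query_emb @ page_emb.T
    return float(sims.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, pages: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Return (page_id, score) pairs sorted from best to worst match."""
    scored = [(page_id, maxsim_score(query_emb, emb)) for page_id, emb in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Tiny 2-dimensional example so the arithmetic is easy to follow.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
pages = {
    "page_a": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "page_b": np.array([[0.5, 0.0], [0.0, 0.2]]),
}
ranking = rank_pages(query, pages)
print(ranking)
```

Here `page_a` matches both query tokens exactly and therefore ranks above `page_b`.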
| #### **AI Chat Process:** | |
| 1. **LOCAL**: Retrieved document images processed by your machine | |
| 2. **REMOTE**: Images + query sent to Google Gemini API | |
| 3. **REMOTE**: Gemini generates response based on visual content | |
| 4. **BROWSER**: Streaming response displayed in real-time | |
| ### Core Components | |
| - **ColPali Model**: Visual-language model for document understanding (LOCAL) | |
| - **Vespa Search**: Distributed search and storage engine (REMOTE) | |
| - **FastHTML Frontend**: Modern, responsive web interface (BROWSER) | |
| - **Gemini Integration**: AI-powered question answering (REMOTE API) | |
| - **Similarity Map Generator**: Visual attention visualization (LOCAL) | |
| ## 💻 **System Requirements** | |
| ### **LOCAL Machine Requirements (For AI Processing)** | |
| **Minimum:** | |
| - **CPU**: Modern multi-core processor (Intel/AMD/Apple Silicon) | |
| - **RAM**: 8GB+ (16GB recommended) | |
| - **Storage**: 10GB free space (for model cache) | |
| - **Python**: 3.10+ (< 3.13) | |
| **Recommended:** | |
| - **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better) | |
| - **Apple**: M1/M2/M3 Mac (uses Metal Performance Shaders) | |
| - **RAM**: 16GB+ for smoother processing | |
| - **Storage**: SSD for faster model loading | |
| **Performance Examples:** | |
| - **RTX 4090**: ~1-2 seconds per query | |
| - **RTX 3070**: ~3-5 seconds per query | |
| - **Apple M2**: ~4-6 seconds per query | |
| - **CPU Only**: ~15-30 seconds per query | |
| ### **REMOTE Requirements (Vespa Cloud)** | |
| **What you need:** | |
| - **Vespa Cloud account** (handles all remote processing) | |
| - **Internet connection** (for uploading embeddings and search queries) | |
| - **Authentication tokens** (provided by Vespa Cloud) | |
| **What Vespa Cloud provides:** | |
| - **Scalable storage** for any number of documents | |
| - **Sub-second search** across millions of embeddings | |
| - **High availability** with automatic failover | |
| - **Global CDN** for fast access worldwide | |
| ## 💰 **Cost Breakdown** | |
| ### **FREE Components** | |
| - **ColPali Model**: Open source, runs locally (no per-query costs) | |
| - **Python Application**: Apache-2.0 licensed, completely free | |
| - **Local Processing**: Uses your own hardware (no cloud AI fees) | |
| ### **PAID Components** | |
| - **Vespa Cloud**: Pay for storage and search operations | |
| - ~$0.001 per 1000 searches | |
| - ~$0.10 per GB storage per month | |
| - **Google Gemini API**: Optional, for chat features only | |
| - ~$0.01 per 1000 image tokens | |
| - Only used when you ask questions about documents | |
| ### **Cost Examples (Monthly)** | |
| - **Personal Use** (100 documents, 1000 searches): ~$5-10/month | |
| - **Small Business** (1000 documents, 10k searches): ~$20-50/month | |
| - **Enterprise** (10k+ documents, 100k+ searches): $200+/month | |
| **💡 Cost Optimization Tips:** | |
| - Use local Vespa installation to avoid cloud costs | |
| - Disable Gemini chat if not needed (saves API costs) | |
| - Process documents in batches to minimize upload time | |
| ## Quick Start | |
| ### Prerequisites | |
| - Python 3.10+ (< 3.13) | |
| - **8GB+ RAM** for ColPali model | |
| - **Vespa Cloud account** or local Vespa installation | |
| - **Google Gemini API key** (optional, for chat features) | |
| - **GPU recommended** but not required | |
| ### 1. Installation | |
| ```bash | |
| # Clone the repository | |
| git clone <repository-url> | |
| cd colpali-vespa-visual-retrieval | |
| # Install dependencies | |
| pip install -e . | |
| # For development | |
| pip install -e ".[dev]" | |
| # For document feeding capabilities | |
| pip install -e ".[feed]" | |
| ``` | |
| ### 2. Environment Configuration | |
| Create a `.env` file with your configuration: | |
| ```bash | |
| # Vespa Configuration | |
| VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com | |
| VESPA_CLOUD_SECRET_TOKEN=your_secret_token | |
| # Alternative: mTLS Authentication | |
| USE_MTLS=false | |
| VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com | |
| VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..." | |
| VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..." | |
| # Optional: Gemini AI (for chat features) | |
| GEMINI_API_KEY=your_gemini_api_key | |
| # Optional: Logging | |
| LOG_LEVEL=INFO | |
| HOT_RELOAD=false | |
| ``` | |
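Because the required variables depend on whether `USE_MTLS` is enabled, a small startup check can catch a half-filled `.env` early. The helper below is a sketch using only the variable names listed above; `missing_vespa_settings` is a hypothetical function, not part of the project.

```python
def missing_vespa_settings(env: dict[str, str]) -> list[str]:
    """Return the names of required Vespa variables that are unset or empty."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    if use_mtls:
        required = ["VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT"]
    else:
        required = ["VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN"]
    return [name for name in required if not env.get(name)]


# Example with an in-memory mapping; in the application you would pass dict(os.environ).
token_env = {"VESPA_APP_TOKEN_URL": "https://your-app.vespa-cloud.com"}
mtls_env = {"USE_MTLS": "true", "VESPA_APP_MTLS_URL": "https://your-app.vespa-cloud.com"}

print(missing_vespa_settings(token_env))
print(missing_vespa_settings(mtls_env))
```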
| ### 3. Deploy Vespa Application | |
| ```bash | |
| # Deploy the Vespa schema and configuration | |
| python deploy_vespa_app.py \ | |
| --tenant_name your_tenant \ | |
| --vespa_application_name colpalidemo \ | |
| --token_id_write colpalidemo_write \ | |
| --token_id_read colpalidemo_read | |
| ``` | |
| ### 4. Run the Application | |
| ```bash | |
| python main.py | |
| ``` | |
| The application will be available at `http://localhost:7860` | |
| ## Document Management | |
| ### Uploading Documents | |
| Use the feeding script to process and upload PDF documents: | |
| ```bash | |
| python feed_vespa.py \ | |
| --application_name colpalidemo \ | |
| --vespa_schema_name pdf_page | |
| ``` | |
| **Document Processing Pipeline (LOCAL → REMOTE):** | |
| 1. **PDF Download** (LOCAL): Your computer downloads PDFs from URLs | |
| 2. **PDF Conversion** (LOCAL): PDFs converted to images (one per page) | |
| 3. **ColPali Processing** (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU | |
| 4. **Embedding Generation** (LOCAL): Visual embeddings created (1024 patches × 128 dimensions) | |
| 5. **Binary Encoding** (LOCAL): Embeddings converted to efficient binary format | |
| 6. **Vespa Upload** (REMOTE): Binary embeddings uploaded to Vespa Cloud | |
| 7. **Search Indexing** (REMOTE): Vespa indexes embeddings for fast retrieval | |
| **⚠️ Important Notes:** | |
| - **Processing Time**: Expect 5-30 seconds per page depending on your hardware | |
| - **Network Usage**: Only final embeddings uploaded (~16KB per page at 1024 patches × 128 bits, vs ~1MB original) | |
| - **Privacy**: Original PDFs and images stay on your local machine, except retrieved page images sent to the Gemini API when chat is used | |
| - **Storage**: Raw images cached locally for similarity map generation | |
| ### Supported Operations | |
| - ✅ **Upload Documents**: Add new PDFs to the system | |
| - ✅ **Search Documents**: Query existing documents | |
| - ✅ **View Documents**: Browse stored documents | |
| - ❌ **Remove Documents**: _Not currently implemented_ | |
| - ❌ **Update Documents**: _Not currently implemented_ | |
| ## Authentication & Security | |
| ### 🛡️ **Current Security Implementation** | |
| #### **SECURE Components:** | |
| **Vespa Authentication (REMOTE)** | |
| - **Token Authentication**: Bearer tokens for Vespa Cloud API access | |
| - **mTLS Certificates**: Mutual TLS for enterprise security | |
| - **Encrypted Communication**: HTTPS/TLS for all Vespa connections | |
| **API Key Management (LOCAL)** | |
| - **Environment Variables**: Sensitive keys stored in `.env` files | |
| - **API Key Rotation**: Google Gemini supports key rotation | |
| - **Local Storage**: Keys never transmitted except to authorized APIs | |
| #### **LIMITED Security Components:** | |
| **Session Management** | |
| ```python | |
| # Basic UUID session tracking (FastHTML backend, Python) | |
| session["session_id"] = str(uuid.uuid4()) | |
| # HTTP-only cookies (Next.js frontend, TypeScript) | |
| cookieStore.set(SESSION_KEY, newSessionId, { | |
| httpOnly: true, | |
| secure: process.env.NODE_ENV === "production", | |
| sameSite: "lax", | |
| maxAge: 60 * 60 * 24 * 30, // 30 days | |
| }); | |
| ``` | |
| **Basic Request Validation** | |
| ```python | |
| # HTMX request validation | |
| if "hx-request" not in request.headers: | |
| return RedirectResponse("/search") | |
| # Parameter validation (Next.js route handler, TypeScript) | |
| if (!query) { | |
| return NextResponse.json({ error: "Query is required" }, { status: 400 }); | |
| } | |
| ``` | |
| ### ⚠️ **Security Limitations & Risks** | |
| #### **MISSING Security Features:** | |
| **❌ No API Authentication** | |
| - Local API endpoints are **completely open** | |
| - No rate limiting or abuse protection | |
| - No user authentication or authorization | |
| - Anyone can access `/fetch_results`, `/get_sim_map` endpoints | |
| **❌ No Input Sanitization** | |
| ```python | |
| # Raw user input passed directly to models | |
| query = searchParams.get("query") # No validation/sanitization | |
| ranking = searchParams.get("ranking") # No input filtering | |
| ``` | |
| **❌ No Security Headers** | |
| - No CORS configuration | |
| - No Content Security Policy (CSP) | |
| - No X-Frame-Options protection | |
| - No X-Content-Type-Options validation | |
| **❌ No Rate Limiting** | |
| - Unlimited API requests | |
| - No protection against DoS attacks | |
| - No query throttling or user limits | |
| **❌ No CSRF Protection** | |
| - No token validation for state-changing operations | |
| - Cross-site request forgery possible | |
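One common way to close the CSRF gap is to derive a token from the session ID with an HMAC and require it on state-changing requests. The sketch below uses only the Python standard library; the function names are hypothetical, and the secret is assumed to come from an environment variable such as `CSRF_SECRET`.

```python
import hashlib
import hmac


def make_csrf_token(secret: str, session_id: str) -> str:
    """Derive a per-session token by signing the session ID with a server-side secret."""
    return hmac.new(secret.encode(), session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_csrf_token(secret: str, session_id: str, token: str) -> bool:
    """Check a submitted token using a constant-time comparison."""
    expected = make_csrf_token(secret, session_id)
    # Compare as bytes so non-ASCII input is rejected instead of raising TypeError.
    return hmac.compare_digest(expected.encode(), token.encode())


secret = "example-secret"  # placeholder; load the real value from the environment
session_id = "3f1c9a52-demo-session"  # placeholder session ID
token = make_csrf_token(secret, session_id)

print(is_valid_csrf_token(secret, session_id, token))
print(is_valid_csrf_token(secret, "another-session", token))
```

The token is only valid for the session it was issued to, so a page on another origin cannot forge it without the secret.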
| ### 🎯 **Security Recommendations** | |
| #### **IMMEDIATE (High Priority)** | |
| **1. Add API Authentication** | |
| ```typescript | |
| // middleware.ts - Add API key validation | |
| export function middleware(request: NextRequest) { | |
| const apiKey = request.headers.get("X-API-Key"); | |
| if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) { | |
| return new Response("Unauthorized", { status: 401 }); | |
| } | |
| } | |
| ``` | |
| **2. Implement Rate Limiting** | |
| ```typescript | |
| // Use next-rate-limit or similar | |
| import rateLimit from "@/lib/rate-limit"; | |
| const limiter = rateLimit({ | |
| interval: 60 * 1000, // 1 minute | |
| uniqueTokenPerInterval: 500, // Track at most 500 unique clients (e.g. IPs) per interval | |
| }); | |
| await limiter.check(10, getClientIP(request)); // 10 requests per minute | |
| ``` | |
| **3. Add Security Headers** | |
| ```typescript | |
| // next.config.js | |
| const securityHeaders = [ | |
| { key: "X-Frame-Options", value: "DENY" }, | |
| { key: "X-Content-Type-Options", value: "nosniff" }, | |
| { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" }, | |
| { | |
| key: "Content-Security-Policy", | |
| value: "default-src 'self'; script-src 'self' 'unsafe-inline'", | |
| }, | |
| ]; | |
| ``` | |
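The `next.config.js` headers only cover a Next.js frontend. For the Python backend, the same headers could be added with a plain ASGI middleware, since FastHTML is built on Starlette. The class below is a dependency-free sketch with a hypothetical name; how it is registered with the real application is not shown here and would depend on how `main.py` builds the app.

```python
import asyncio

SECURITY_HEADERS = [
    (b"x-frame-options", b"DENY"),
    (b"x-content-type-options", b"nosniff"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """ASGI middleware that appends fixed security headers to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass WebSocket and lifespan traffic through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.extend(SECURITY_HEADERS)
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)


async def demo_app(scope, receive, send):
    """Minimal ASGI app used only to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_demo():
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b""}

    await SecurityHeadersMiddleware(demo_app)({"type": "http"}, receive, send)
    return sent


sent_messages = asyncio.run(run_demo())
print(sent_messages[0]["headers"])
```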
| **4. Input Validation & Sanitization** | |
| ```typescript | |
| import { z } from "zod"; | |
| const SearchSchema = z.object({ | |
| query: z | |
| .string() | |
| .min(1) | |
| .max(500) | |
| .regex(/^[a-zA-Z0-9\s\.\?\!]*$/), | |
| ranking: z.enum(["hybrid", "colpali", "bm25"]), | |
| }); | |
| ``` | |
| #### **MEDIUM Priority** | |
| **5. CORS Configuration** | |
| ```typescript | |
| // Restrict origins to known domains | |
| const corsHeaders = { | |
| "Access-Control-Allow-Origin": "https://yourdomain.com", | |
| "Access-Control-Allow-Methods": "GET, POST, OPTIONS", | |
| "Access-Control-Allow-Headers": "Content-Type, Authorization", | |
| }; | |
| ``` | |
| **6. Request Size Limits** | |
| ```typescript | |
| // Limit request payload sizes | |
| export const config = { | |
| api: { | |
| bodyParser: { | |
| sizeLimit: "1mb", | |
| }, | |
| }, | |
| }; | |
| ``` | |
| **7. Audit Logging** | |
| ```python | |
| # Log all API access with IP, timestamp, and queries | |
| logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}") | |
| ``` | |
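A slightly fuller version of the audit log line might look like the sketch below. It keeps the `API_ACCESS` format from the snippet above and additionally strips line breaks, so a crafted query cannot forge extra log entries. The logger name and `log_api_access` are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("colpali.audit")  # hypothetical logger name


def log_api_access(client_ip: str, endpoint: str, query: str) -> str:
    """Write one audit line and return it, with the query sanitized and truncated."""
    # Replace line breaks, then cap the query at 100 characters.
    safe_query = query.replace("\r", " ").replace("\n", " ")[:100]
    line = f"API_ACCESS: {client_ip} - {endpoint} - {safe_query}"
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO)
example_line = log_api_access("203.0.113.7", "/fetch_results", "annual report\nFAKE ENTRY")
```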
| #### **LONG-TERM (Production Ready)** | |
| **8. User Authentication (Optional)** | |
| ```typescript | |
| // Add NextAuth.js or similar for user accounts | |
| // Implement role-based access control | |
| // Add document ownership and permissions | |
| ``` | |
| **9. Network Security** | |
| ```bash | |
| # Deploy behind reverse proxy (nginx/cloudflare) | |
| # Enable DDoS protection | |
| # Use Web Application Firewall (WAF) | |
| ``` | |
| **10. Data Privacy Controls** | |
| ```typescript | |
| // Implement data retention policies | |
| // Add user data deletion capabilities | |
| // GDPR compliance features | |
| ``` | |
| ### **Security Best Practices** | |
| #### **For LOCAL Development:** | |
| - **Never commit API keys** to version control | |
| - **Use strong environment variable names** (avoid `API_KEY`) | |
| - **Rotate API keys regularly** (monthly) | |
| - **Enable firewall** on development machines | |
| - **Use HTTPS even locally** for production testing | |
| #### **For PRODUCTION Deployment:** | |
| - **Deploy behind CDN/WAF** (Cloudflare, AWS Shield) | |
| - **Enable rate limiting** at infrastructure level | |
| - **Use container security scanning** | |
| - **Implement monitoring and alerting** | |
| - **Regular security audits and penetration testing** | |
| #### **For REMOTE Services:** | |
| - **Vespa Cloud**: Follows enterprise security standards | |
| - **Gemini API**: Google-managed security and compliance | |
| - **Environment Isolation**: Separate dev/staging/prod credentials | |
| ### 🚨 **Current Risk Level: MEDIUM** | |
| **Suitable for:** | |
| - ✅ **Personal projects and demos** | |
| - ✅ **Internal company tools** (behind firewall) | |
| - ✅ **Research and development** environments | |
| **NOT suitable for:** | |
| - ❌ **Public internet deployment** | |
| - ❌ **Customer-facing applications** | |
| - ❌ **Production environments** with sensitive data | |
| - ❌ **Commercial applications** without security hardening | |
| ## 🎯 Usage Guide | |
| ### Basic Search | |
| 1. Navigate to the homepage | |
| 2. Enter your search query in natural language | |
| 3. Select ranking method (hybrid, semantic, etc.) | |
| 4. View results with similarity maps | |
| ### Similarity Maps | |
| - Click on token buttons to see which parts of documents match specific query terms | |
| - Visual heatmaps show attention patterns | |
| - Reset button returns to original document view | |
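Conceptually, the heatmap for one query token is the similarity between that token's embedding and every patch embedding, reshaped to the patch grid. The sketch below assumes the 1024 patches form a 32 × 32 grid in row-major order; `token_similarity_map` is a hypothetical helper, not the project's generator.

```python
import numpy as np


def token_similarity_map(token_emb: np.ndarray, patch_embs: np.ndarray,
                         grid: tuple[int, int] = (32, 32)) -> np.ndarray:
    """Return a grid of patch similarities for one query token, scaled to [0, 1]."""
    sims = patch_embs @ token_emb  # one similarity value per patch
    sim_map = sims.reshape(grid)
    lo, hi = sim_map.min(), sim_map.max()
    if hi == lo:
        # A flat map carries no signal; avoid dividing by zero.
        return np.zeros(grid)
    return (sim_map - lo) / (hi - lo)


# Stand-in data: a token identical to patch 37 should light up that patch.
rng = np.random.default_rng(0)
patches = rng.standard_normal((1024, 128))
token = patches[37]

heatmap = token_similarity_map(token, patches)
row, col = np.unravel_index(heatmap.argmax(), heatmap.shape)
print(row, col)
```

The normalized grid can then be upscaled and blended over the page image to produce the overlay shown in the UI.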
| ### AI Chat | |
| - Ask questions about retrieved documents | |
| - Chat responses are based on document content | |
| - Streaming responses for real-time interaction | |
| ### Search Rankings | |
| - **Hybrid**: Combines multiple ranking signals | |
| - **Semantic**: Pure semantic similarity | |
| - **BM25**: Traditional text-based ranking | |
| - **ColPali**: Visual-first ranking | |
| ## 🛠️ Development | |
| ### Project Structure | |
| ``` | |
| ├── main.py                  # Application entry point | |
| ├── backend/ | |
| │   ├── colpali.py           # ColPali model integration | |
| │   ├── vespa_app.py         # Vespa client and queries | |
| │   └── modelmanager.py      # Model management utilities | |
| ├── frontend/ | |
| │   ├── app.py               # UI components | |
| │   └── layout.py            # Layout templates | |
| ├── feed_vespa.py            # Document upload script | |
| ├── deploy_vespa_app.py      # Vespa deployment script | |
| ├── colpali-with-snippets/   # Vespa schema definitions | |
| └── static/                  # Static assets and generated files | |
| ``` | |
| ### Running in Development | |
| ```bash | |
| # Enable hot reload | |
| export HOT_RELOAD=true | |
| python main.py | |
| # Or set in .env | |
| echo "HOT_RELOAD=true" >> .env | |
| ``` | |
| ### Code Quality | |
| ```bash | |
| # Format code | |
| ruff format . | |
| # Lint code | |
| ruff check . | |
| ``` | |
| ## API Endpoints | |
| ### **Current API Routes (⚠️ UNSECURED)** | |
| | Endpoint | Method | Description | Security Status | | |
| | ---------------- | ------ | ----------------------- | ---------------- | | |
| | `/` | GET | Homepage | ✅ Public (safe) | |
| | `/search` | GET | Search interface | ✅ Public (safe) | |
| | `/fetch_results` | GET | Fetch search results | ⚠️ **OPEN API** | |
| | `/get_sim_map` | GET | Get similarity maps | ⚠️ **OPEN API** | |
| | `/get-message` | GET | Chat with AI (SSE) | ⚠️ **OPEN API** | |
| | `/full_image` | GET | Get full document image | ⚠️ **OPEN API** | |
| | `/suggestions` | GET | Query autocomplete | ⚠️ **OPEN API** | |
| | `/static/*` | GET | Static file serving | ✅ Public (safe) | |
| ### **Security Analysis by Endpoint** | |
| #### **SECURE Endpoints** | |
| - **`/`** and **`/search`**: Static HTML pages, no sensitive data | |
| - **`/static/*`**: Public assets (CSS, JS, images) | |
| #### **⚠️ UNSECURED Endpoints (Risk)** | |
| **`/fetch_results`** - **HIGH RISK** | |
| ```bash | |
| # Anyone can perform unlimited searches | |
| curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid" | |
| ``` | |
| - **Risks**: Resource abuse, server overload, competitive intelligence gathering | |
| - **Exposes**: Search capabilities, document metadata, processing times | |
| **`/get_sim_map`** - **MEDIUM RISK** | |
| ```bash | |
| # Access similarity maps without authentication | |
| curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5" | |
| ``` | |
| - **Risks**: Unauthorized access to visual analysis | |
| - **Exposes**: Document visual patterns, query insights | |
| **`/get-message`** - **HIGH RISK** | |
| ```bash | |
| # Trigger AI processing without limits | |
| curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2" | |
| ``` | |
| - **Risks**: Gemini API abuse, cost exploitation, resource exhaustion | |
| - **Exposes**: AI-generated insights, document content analysis | |
| **`/full_image`** - **HIGH RISK** | |
| ```bash | |
| # Download any document image | |
| curl "http://localhost:7860/full_image?doc_id=any_document_id" | |
| ``` | |
| - **Risks**: Unauthorized document access, data leakage | |
| - **Exposes**: Full document images, potentially sensitive content | |
| ### **Immediate Security Fixes** | |
| #### **1. Add API Key Authentication** | |
| ```python | |
| import os | |
| from starlette.responses import JSONResponse | |
| # Python FastHTML middleware (`app` is the application object) | |
| @app.middleware("http") | |
| async def verify_api_key(request, call_next): | |
|     if request.url.path.startswith("/fetch_results"): | |
|         api_key = request.headers.get("X-API-Key") | |
|         if not api_key or api_key != os.getenv("COLPALI_API_KEY"): | |
|             return JSONResponse({"error": "Unauthorized"}, status_code=401) | |
|     return await call_next(request) | |
| ``` | |
| #### **2. Implement Rate Limiting** | |
| ```python | |
| from slowapi import Limiter, _rate_limit_exceeded_handler | |
| from slowapi.errors import RateLimitExceeded | |
| from slowapi.util import get_remote_address | |
| limiter = Limiter(key_func=get_remote_address) | |
| app.state.limiter = limiter | |
| app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler) | |
| @rt("/fetch_results") | |
| @limiter.limit("10/minute") # 10 requests per minute per IP | |
| async def get_results(request, query: str, ranking: str): | |
|     ...  # existing code | |
| ``` | |
| #### **3. Input Validation** | |
| ```python | |
| # pydantic v1-style validator; pydantic v2 uses field_validator instead | |
| from pydantic import BaseModel, validator | |
| class SearchRequest(BaseModel): | |
|     query: str | |
|     ranking: str | |
|     @validator('query') | |
|     def query_must_be_safe(cls, v): | |
|         if len(v) > 500: | |
|             raise ValueError('Query too long') | |
|         # Add sanitization logic | |
|         return v.strip() | |
| ``` | |
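The model above checks only the query length. A dependency-free sketch that also restricts `ranking` to an allow-list is shown below. The allowed values are an assumption taken from the Zod schema earlier in this README and may not match the rank profiles actually deployed; `validate_search` is a hypothetical helper.

```python
ALLOWED_RANKINGS = {"hybrid", "colpali", "bm25"}  # assumed values, adjust to your deployment
MAX_QUERY_LENGTH = 500


def validate_search(query: str, ranking: str) -> tuple[str, str]:
    """Return the cleaned (query, ranking) pair or raise ValueError."""
    # Collapse runs of whitespace (including newlines and tabs) into single spaces.
    cleaned = " ".join(query.split())
    if not cleaned:
        raise ValueError("Query is required")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise ValueError("Query too long")
    if ranking not in ALLOWED_RANKINGS:
        raise ValueError(f"Unknown ranking: {ranking!r}")
    return cleaned, ranking


print(validate_search("  total   revenue\n2023 ", "hybrid"))
```

Rejecting unknown ranking names up front means arbitrary user input never reaches the code that selects a rank profile.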
| #### **4. Request Origin Validation** | |
| ```python | |
| from starlette.responses import JSONResponse | |
| ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"] | |
| @app.middleware("http") | |
| async def cors_middleware(request, call_next): | |
|     origin = request.headers.get("origin") | |
|     # Browsers omit Origin on same-origin GET requests, so reject only a present, unknown origin | |
|     if origin is not None and origin not in ALLOWED_ORIGINS: | |
|         return JSONResponse({"error": "Forbidden"}, status_code=403) | |
|     return await call_next(request) | |
| ``` | |
| ### **Recommended API Security Architecture** | |
| ``` | |
| ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐ | |
| │    Frontend     │    │  Rate Limiter   │    │   Backend API   │ | |
| │                 │    │                 │    │                 │ | |
| │ • API Key       │◄─► │ • IP Limiting   │◄─► │ • Input Valid.  │ | |
| │ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │ | |
| │ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │ | |
| └─────────────────┘    └─────────────────┘    └─────────────────┘ | |
| ``` | |
| **Benefits:** | |
| - **Layer 1**: Frontend validates requests before sending | |
| - **Layer 2**: Rate limiter prevents abuse and DoS attacks | |
| - **Layer 3**: Backend performs final validation and authorization | |
| ### **Security Implementation Checklist** | |
| #### **Before Production Deployment:** | |
| **CRITICAL (Must Do):** | |
| - [ ] **Generate API Key**: Create strong API key for endpoint authentication | |
| - [ ] **Enable Rate Limiting**: Implement per-IP request limits | |
| - [ ] **Add Security Headers**: X-Frame-Options, CSP, X-Content-Type-Options | |
| - [ ] **Input Validation**: Sanitize all user inputs (query, ranking) | |
| - [ ] **CORS Configuration**: Restrict origins to known domains only | |
| - [ ] **Environment Security**: Never commit API keys, use secure .env | |
| - [ ] **HTTPS Only**: Force TLS in production (no HTTP) | |
| **HIGH Priority:** | |
| - [ ] **Audit Logging**: Log all API requests with IP and timestamp | |
| - [ ] **Request Size Limits**: Prevent large payload attacks | |
| - [ ] **Error Handling**: Don't expose stack traces or internal details | |
| - [ ] **Session Security**: HTTP-only, secure, SameSite cookies | |
| - [ ] **API Documentation**: Document authentication requirements | |
| **MEDIUM Priority:** | |
| - [ ] **User Authentication**: Consider adding user accounts for access control | |
| - [ ] **Request Timeout**: Prevent long-running request abuse | |
| - [ ] **Content Validation**: Verify response content types | |
| - [ ] **Monitoring**: Set up alerts for unusual API usage patterns | |
| - [ ] **Backup Strategy**: Secure backup of environment variables | |
| #### **Security Testing Commands:** | |
| **Test API Authentication:** | |
| ```bash | |
| # Should fail without API key | |
| curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid" | |
| # Should succeed with API key | |
| curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid" | |
| ``` | |
| **Test Rate Limiting:** | |
| ```bash | |
| # Run multiple requests to trigger rate limit | |
| for i in {1..15}; do | |
| curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid" | |
| echo "Request $i" | |
| done | |
| ``` | |
| **Test Input Validation:** | |
| ```bash | |
| # Should reject invalid/malicious inputs (URL-encode the payload so it reaches the server intact) | |
| curl -G -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results" --data-urlencode "query=<script>alert('xss')</script>" --data-urlencode "ranking=invalid" | |
| ``` | |
| **Test Security Headers:** | |
| ```bash | |
| # Check security headers in response | |
| curl -I "http://localhost:7860/" | |
| # Should see: X-Frame-Options, X-Content-Type-Options, etc. | |
| ``` | |
| #### **Security Monitoring:** | |
| **Log Analysis Queries:** | |
| ```bash | |
| # Monitor API usage patterns | |
| grep "API_ACCESS" /var/log/colpali.log | tail -100 | |
| # Detect potential abuse | |
| grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log | |
| # Check authentication failures | |
| grep "UNAUTHORIZED" /var/log/colpali.log | |
| ``` | |
| **Alerting Setup:** | |
| - **Rate Limit Violations**: Alert when >50 requests/minute from single IP | |
| - **Authentication Failures**: Alert on repeated unauthorized attempts | |
| - **Unusual Queries**: Alert on suspicious query patterns or injection attempts | |
| - **Resource Usage**: Alert on high CPU/memory usage (potential DoS) | |
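The first alert rule can be prototyped with a short script before wiring up a real monitoring stack. The sketch below assumes log lines contain the `API_ACCESS: <ip> - <endpoint> - <query>` format used in the audit logging example and that the caller passes in the lines from a one-minute window; both function names are hypothetical.

```python
from collections import Counter

MARKER = "API_ACCESS: "


def requests_per_ip(log_lines: list[str]) -> Counter:
    """Count API_ACCESS entries per client IP."""
    counts: Counter = Counter()
    for line in log_lines:
        pos = line.find(MARKER)
        if pos == -1:
            continue  # not an audit line
        rest = line[pos + len(MARKER):]
        client_ip = rest.split(" - ", 1)[0].strip()
        counts[client_ip] += 1
    return counts


def noisy_clients(log_lines: list[str], threshold: int = 50) -> list[str]:
    """Return IPs with more than `threshold` requests in the given lines."""
    return sorted(ip for ip, n in requests_per_ip(log_lines).items() if n > threshold)


# Synthetic one-minute window: one busy client, one quiet client, one unrelated line.
window = ["INFO API_ACCESS: 198.51.100.4 - /fetch_results - test"] * 51
window += ["INFO API_ACCESS: 203.0.113.7 - /get_sim_map - test"] * 3
window += ["INFO server started"]

print(noisy_clients(window))
```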
| ## 🧪 Models Used | |
| - **ColPali v1.2**: Visual document understanding | |
| - **ColPaliGemma 3B**: Base visual-language model | |
| - **Google Gemini 2.0**: AI chat and question answering | |
| ## 🔧 Configuration Options | |
| ### Environment Variables | |
| | Variable | Required | Description | Security Impact | | |
| | -------------------------- | -------- | ------------------------------------------- | ----------------------------------- | | |
| | `VESPA_APP_TOKEN_URL` | Yes\* | Vespa application URL (token auth) | **HIGH** - Remote access | | |
| | `VESPA_CLOUD_SECRET_TOKEN` | Yes\* | Vespa secret token | **CRITICAL** - Full database access | | |
| | `USE_MTLS` | No | Use mTLS instead of token auth | **MEDIUM** - Auth method | | |
| | `VESPA_APP_MTLS_URL` | Yes\*\* | Vespa application URL (mTLS) | **HIGH** - Remote access | | |
| | `VESPA_CLOUD_MTLS_KEY` | Yes\*\* | mTLS private key | **CRITICAL** - TLS credentials | | |
| | `VESPA_CLOUD_MTLS_CERT` | Yes\*\* | mTLS certificate | **HIGH** - TLS credentials | | |
| | `GEMINI_API_KEY` | No | Google Gemini API key | **HIGH** - AI access/costs | | |
| | `LOG_LEVEL` | No | Logging level (DEBUG, INFO, WARNING, ERROR) | **LOW** - Debug info | | |
| | `HOT_RELOAD` | No | Enable hot reload in development | **LOW** - Dev convenience | | |
| #### **Security-Related Environment Variables (Recommended)** | |
| | Variable | Required | Description | Default | | |
| | -------------------------- | --------- | ------------------------------------ | ------- | | |
| | `COLPALI_API_KEY` | **YES**\*\*\* | API key for endpoint authentication | None | |
| | `ALLOWED_ORIGINS` | **YES**\*\*\* | Comma-separated allowed CORS origins | None | |
| | `RATE_LIMIT_REQUESTS` | No | Max requests per minute per IP | `10` | | |
| | `RATE_LIMIT_WINDOW` | No | Rate limit window in seconds | `60` | | |
| | `MAX_QUERY_LENGTH` | No | Maximum query string length | `500` | | |
| | `ENABLE_AUDIT_LOGGING` | No | Log all API requests for security | `false` | | |
| | `SECURITY_HEADERS_ENABLED` | No | Enable security headers | `true` | | |
| | `CSRF_SECRET` | **YES**\*\*\* | Secret for CSRF token generation | None | |
| **Example Security-Enhanced `.env`:** | |
| ```bash | |
| # Existing configuration | |
| VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com | |
| VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token | |
| GEMINI_API_KEY=your_gemini_api_key | |
| # NEW: Security configuration | |
| COLPALI_API_KEY=your_strong_random_api_key_here | |
| ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com | |
| RATE_LIMIT_REQUESTS=10 | |
| RATE_LIMIT_WINDOW=60 | |
| MAX_QUERY_LENGTH=500 | |
| ENABLE_AUDIT_LOGGING=true | |
| SECURITY_HEADERS_ENABLED=true | |
| CSRF_SECRET=your_random_csrf_secret_here | |
| # Development vs Production | |
| NODE_ENV=production # Enable secure cookies | |
| LOG_LEVEL=INFO # Don't expose debug info in production | |
| ``` | |
| \*Required for token authentication | |
| \*\*Required for mTLS authentication | |
| \*\*\*Required for production security | |
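The recommended security variables and their defaults could be loaded into a typed settings object at startup. The sketch below covers a subset of the table above; `SecuritySettings` is a hypothetical class, and none of these variables are read by the current application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecuritySettings:
    api_key: str
    allowed_origins: tuple[str, ...]
    rate_limit_requests: int = 10
    rate_limit_window: int = 60
    max_query_length: int = 500
    audit_logging: bool = False
    security_headers: bool = True

    @classmethod
    def from_env(cls, env: dict[str, str]) -> "SecuritySettings":
        def flag(name: str, default: bool) -> bool:
            return env.get(name, str(default)).strip().lower() == "true"

        api_key = env.get("COLPALI_API_KEY", "")
        if not api_key:
            raise ValueError("COLPALI_API_KEY is required")
        origins = tuple(
            origin.strip()
            for origin in env.get("ALLOWED_ORIGINS", "").split(",")
            if origin.strip()
        )
        return cls(
            api_key=api_key,
            allowed_origins=origins,
            rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "10")),
            rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
            max_query_length=int(env.get("MAX_QUERY_LENGTH", "500")),
            audit_logging=flag("ENABLE_AUDIT_LOGGING", False),
            security_headers=flag("SECURITY_HEADERS_ENABLED", True),
        )


# In the application you would pass dict(os.environ) instead of this sample mapping.
settings = SecuritySettings.from_env({
    "COLPALI_API_KEY": "your_strong_random_api_key_here",
    "ALLOWED_ORIGINS": "http://localhost:3000, https://yourdomain.com",
    "ENABLE_AUDIT_LOGGING": "true",
})
print(settings)
```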
| ## 🚨 Troubleshooting | |
| ### **LOCAL Processing Issues** | |
| **ColPali model fails to load:** | |
| ```bash | |
| # Check GPU memory | |
| nvidia-smi # For NVIDIA GPUs | |
| # or | |
| system_profiler SPDisplaysDataType # For Apple Silicon | |
| # Clear model cache if corrupted | |
| rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2 | |
| ``` | |
| **Out of memory errors:** | |
| - Reduce batch size in `feed_vespa.py` (try `batch_size=1`) | |
| - Close other applications to free RAM/VRAM | |
| - Use CPU processing if GPU memory insufficient: `CUDA_VISIBLE_DEVICES="" python main.py` | |
| **Slow processing on CPU:** | |
| - Expected behavior - ColPali requires significant computation | |
| - Consider upgrading to GPU or Apple Silicon for 5-10x speedup | |
| - Process documents overnight for large collections | |
| ### **REMOTE Processing Issues** | |
| **Connection to Vespa fails:** | |
| - Verify your Vespa URL and credentials in `.env` | |
| - Check if the Vespa application is deployed and running | |
| - Ensure network connectivity: `ping your-app.vespa-cloud.com` | |
| - Validate authentication tokens haven't expired | |
| **Document upload fails:** | |
| - Check Vespa Cloud storage quota and billing | |
| - Verify embedding format matches Vespa schema | |
| - Ensure stable internet connection for large uploads | |
| **Search returns no results:** | |
| - Confirm documents were successfully uploaded to Vespa | |
| - Check if embeddings were properly indexed | |
| - Verify query processing isn't failing locally | |
| ### **MIXED (Local + Remote) Issues** | |
| **Chat features don't work:** | |
| - **LOCAL**: Verify document images are being generated locally | |
| - **REMOTE**: Check `GEMINI_API_KEY` is set correctly | |
| - **REMOTE**: Verify Gemini API quota and billing | |
| - **NETWORK**: Ensure images can be sent to Gemini API | |
| **Similarity maps missing:** | |
| - **LOCAL**: Confirm ColPali model loaded successfully | |
| - **LOCAL**: Check if similarity map generation completed | |
| - **REMOTE**: Verify Vespa returned similarity data | |
| - **BROWSER**: Clear browser cache for static files | |
| ### Performance Tips | |
| **LOCAL Optimization:** | |
| - Use GPU acceleration for 5-10x faster model inference | |
| - Optimize batch sizes based on available memory | |
| - Use SSD storage for faster model loading | |
| - Consider quantized models for lower memory usage | |
| **REMOTE Optimization:** | |
| - Use Vespa's HNSW indexing for faster search | |
| - Optimize embedding dimensions vs accuracy tradeoff | |
| - Enable compression for faster network transfer | |
| - Use multiple Vespa instances for high availability | |
| **NETWORK Optimization:** | |
| - Process documents in batches to reduce upload overhead | |
| - Use compression for embedding transfer | |
| - Consider regional Vespa deployment for lower latency | |
| ## License | |
| Apache-2.0 | |
| ## 🤝 Contributing | |
| 1. Fork the repository | |
| 2. Create a feature branch | |
| 3. Make your changes | |
| 4. Run tests and linting | |
| 5. Submit a pull request | |
| ## Support | |
| For issues and questions: | |
| - Check the troubleshooting section | |
| - Review Vespa and ColPali documentation | |
| - Open an issue on the repository | |