# RAG System Setup Guide

## Overview

The Edge LLM platform now includes a simple RAG (Retrieval-Augmented Generation) system that allows you to upload documents to enhance AI responses with relevant context.

## Features

- **Document Upload**: Support for PDF, TXT, DOCX, and MD files
- **Semantic Search**: Find relevant information from your documents
- **Configurable Retrieval**: Adjust how many document chunks to use for context
- **Easy Integration**: Toggle RAG on/off in the Assistant Studio
## Installation

### Backend Dependencies

Install the required Python packages:

```bash
pip install -r requirements.txt
```

The RAG system requires these additional packages:

- `langchain`: LangChain framework
- `pypdf`: PDF processing
- `python-docx`: Word document processing
- `faiss-cpu`: Vector similarity search
- `sentence-transformers`: Text embeddings
- `unstructured`: Document parsing
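If these packages are not yet in your `requirements.txt`, the entries to add are the package names above (unpinned here; pin versions as appropriate for your deployment):

```text
# RAG additions to requirements.txt
langchain
pypdf
python-docx
faiss-cpu
sentence-transformers
unstructured
```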
### Frontend

No additional frontend dependencies needed. The Documents tab is included in the main build.

## Usage

### 1. Access the Documents Tab

1. Open Assistant Studio
2. Navigate to the **Documents** tab (next to Parameters and Instructions)

### 2. Upload Documents

1. Click "Click to upload documents" in the upload area
2. Select PDF, TXT, DOCX, or MD files
3. Files will be processed and chunked automatically
4. Uploaded documents appear in the "Uploaded Documents" section
| ### 3. Configure RAG | |
| 1. **Enable RAG**: Toggle the "Enable RAG" switch (only available when documents are uploaded) | |
| 2. **Retrieval Count**: Adjust the slider to set how many document chunks to retrieve (1-10) | |
| - 1-3: Focused responses with minimal context | |
| - 4-7: Balanced responses with moderate context | |
| - 8-10: Comprehensive responses with extensive context | |
| ### 4. Chat with RAG Enhancement | |
| Once RAG is enabled: | |
| 1. Ask questions normally in the chat | |
| 2. The system will automatically search your uploaded documents | |
| 3. Relevant information will be added to the AI's context | |
| 4. The AI will incorporate document information into responses when relevant | |
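Conceptually, step 3 amounts to prepending the retrieved chunks to the system prompt before generation. A minimal sketch (the function name and prompt template here are illustrative assumptions, not the platform's actual code):

```python
def augment_system_prompt(system_prompt: str, chunks: list[str]) -> str:
    """Prepend retrieved document chunks to the system prompt (illustrative)."""
    if not chunks:
        return system_prompt
    context = "\n\n".join(
        f"[Document excerpt {i + 1}]\n{c}" for i, c in enumerate(chunks)
    )
    return (
        f"{system_prompt}\n\n"
        f"Use the following document excerpts when relevant:\n\n{context}"
    )

augmented = augment_system_prompt(
    "You are a helpful assistant.",
    ["Edge LLM supports RAG.", "Documents are chunked before indexing."],
)
```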
## API Endpoints

### Document Management

- `POST /rag/upload` - Upload multiple documents
- `GET /rag/documents` - List uploaded documents
- `DELETE /rag/documents/{doc_id}` - Delete a document
- `POST /rag/search` - Search through documents
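As a sketch of how a client might call the search endpoint, assuming the backend runs on `localhost:8000` and that `/rag/search` accepts a JSON body with `query` and `k` fields (the host, port, and field names are assumptions, not confirmed by the API):

```python
import json

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your deployment

def search_request(query: str, k: int = 5) -> tuple[str, str]:
    """Build the URL and JSON body for a POST /rag/search call."""
    return f"{BASE_URL}/rag/search", json.dumps({"query": query, "k": k})

url, body = search_request("What does the setup guide cover?", k=3)
# Send with any HTTP client, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```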
### Enhanced Generation

The existing `/generate` endpoint now supports RAG when:

- Documents are uploaded to the RAG system
- The request includes RAG configuration (handled automatically by the frontend)

## Technical Details

### Document Processing

1. Files are uploaded and temporarily stored
2. LangChain loaders extract text content
3. Text is split into chunks (1000 chars with 200 char overlap)
4. Chunks are embedded using `sentence-transformers/all-MiniLM-L6-v2`
5. Embeddings are stored in a FAISS vector database
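Step 3 is a fixed-size sliding-window split (similar in spirit to LangChain's `CharacterTextSplitter`); a minimal standalone sketch of the idea:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks whose windows overlap by `overlap` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that no chunk is a pure suffix of the previous one
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, each chunk starts 800 characters after the previous one, so consecutive chunks share a 200-character overlap that keeps sentences from being cut off at chunk boundaries.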
### RAG Pipeline

1. The user query is embedded using the same model
2. Similarity search finds relevant document chunks
3. Retrieved chunks are added to the system prompt
4. The AI generates a response with document context
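Step 2 is a nearest-neighbour lookup over the stored embeddings, which FAISS performs at scale. A toy pure-Python sketch using cosine similarity over pre-computed vectors (the 2-D vectors are made up for illustration; real `all-MiniLM-L6-v2` embeddings are 384-dimensional):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunks most similar to the query embedding."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query_vec = [1.0, 0.1]
print(top_k(query_vec, chunk_vecs, k=2))  # → [2, 0]
```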
## Limitations & Notes

- **Memory Storage**: Documents are stored in memory (not persistent across restarts)
- **CPU Only**: Uses CPU-based embeddings for compatibility
- **File Size**: Large files may take time to process
- **Language**: Optimized for English content
## Troubleshooting

### "RAG system not available" Error

- Ensure the LangChain dependencies are installed
- Check that `rag_system.py` is in the correct location
- Verify the embeddings model downloaded successfully

### Documents Not Uploading

- Check the file format (PDF, TXT, DOCX, and MD are supported)
- Ensure the file size is reasonable (<50MB recommended)
- Check the browser console for error messages

### Poor RAG Performance

- Try adjusting the retrieval count
- Ensure documents contain relevant information
- Check that document text was extracted correctly
## Future Improvements

- Persistent vector storage (ChromaDB, Pinecone)
- GPU acceleration for embeddings
- More document formats (PPT, HTML, etc.)
- Advanced chunking strategies
- Custom embedding models
- Query expansion and reranking