# SPARKNET Demo Application

An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

## Features

- **📄 Document Processing**: Upload and process documents with OCR
- **🔍 Field Extraction**: Extract structured data with evidence grounding
- **💬 RAG Q&A**: Interactive question answering with citations
- **🏷️ Classification**: Automatic document type detection
- **📊 Analytics**: Processing statistics and insights
- **🔬 Live Processing**: Real-time pipeline visualization
- **📊 Document Comparison**: Compare multiple documents

## Quick Start

### 1. Install Dependencies

```bash
# From project root
pip install -r demo/requirements.txt

# Or install all SPARKNET dependencies
pip install -r requirements.txt
```

### 2. Start Ollama (Optional, for live processing)

```bash
ollama serve

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```

### 3. Run the Demo

```bash
# From project root
streamlit run demo/app.py

# Or with custom port
streamlit run demo/app.py --server.port 8501
```

### 4. Open in Browser

Navigate to http://localhost:8501

## Demo Pages

| Page | Description |
|------|-------------|
| **Home** | Overview and feature cards |
| **Document Processing** | Upload/select documents for OCR processing |
| **Field Extraction** | Extract structured fields with evidence |
| **RAG Q&A** | Ask questions about indexed documents |
| **Classification** | Classify document types |
| **Analytics** | View processing statistics |
| **Live Processing** | Watch pipeline in real-time |
| **Interactive RAG** | Chat-style document Q&A |
| **Document Comparison** | Compare documents side by side |

## Sample Documents

The demo uses patent pledge documents from the `Dataset/` folder:

- Apple 11.11.2011.pdf
- IBM 11.01.2005.pdf
- Google 08.02.2012.pdf
- And more...
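To check which sample PDFs are available before launching the demo, a small script along these lines may help. This is a minimal sketch, not part of the SPARKNET code base: the helper name `list_sample_documents` is hypothetical, and it assumes the script is run from the project root so that the relative `Dataset/` path resolves.

```python
from pathlib import Path


def list_sample_documents(dataset_dir="Dataset"):
    """Return the sorted PDF file names in dataset_dir.

    Hypothetical helper for illustration only. Returns an empty list
    when the folder does not exist, so callers can report that case.
    """
    folder = Path(dataset_dir)
    if not folder.is_dir():
        return []
    # Only top-level *.pdf files are listed; subfolders are ignored.
    return sorted(p.name for p in folder.glob("*.pdf"))


if __name__ == "__main__":
    names = list_sample_documents()
    if not names:
        print("No PDFs found. Are you running from the project root?")
    for name in names:
        print(name)
```

An empty result usually means the script was started from a directory other than the project root, or that the `Dataset/` folder has not been populated yet.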
## Screenshots

### Home Page

```
┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘
```

### RAG Q&A

```
┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘
```

## Configuration

### Environment Variables

```bash
# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434

# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
```

### Streamlit Config

Create `.streamlit/config.toml`:

```toml
[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"

[server]
maxUploadSize = 50
```

## Development

### Adding New Pages

1. Create a new file in `demo/pages/`:

   ```
   demo/pages/4_🆕_New_Feature.py
   ```

2. Follow the naming convention: `{order}_{emoji}_{name}.py`

3. Import project modules:

   ```python
   import sys
   from pathlib import Path

   # demo/pages/<file>.py -> three levels up is the project root
   PROJECT_ROOT = Path(__file__).parent.parent.parent
   sys.path.insert(0, str(PROJECT_ROOT))
   ```

### Customizing Styles

Edit the CSS in `app.py`:

```python
import streamlit as st

# Inject custom CSS; the rules inside <style> are a placeholder.
st.markdown("""
<style>
/* Custom CSS goes here */
</style>
""", unsafe_allow_html=True)
```

## Troubleshooting

### "ModuleNotFoundError: No module named 'src'"

Make sure you're running from the project root:

```bash
cd /path/to/SPARKNET
streamlit run demo/app.py
```

### Ollama Not Connected

1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
2. Start Ollama: `ollama serve`

### ChromaDB Errors

Install ChromaDB:

```bash
pip install chromadb
```

## License

Part of the SPARKNET project. See main LICENSE file.