| # SPARKNET Demo Application | |
| An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities. | |
| ## Features | |
| - **Document Processing**: Upload and process documents with OCR | |
| - **Field Extraction**: Extract structured data with evidence grounding | |
| - **RAG Q&A**: Interactive question answering with citations | |
| - **Classification**: Automatic document type detection | |
| - **Analytics**: Processing statistics and insights | |
| - **Live Processing**: Real-time pipeline visualization | |
| - **Document Comparison**: Compare multiple documents | |
| ## Quick Start | |
| ### 1. Install Dependencies | |
| ```bash | |
| # From project root | |
| pip install -r demo/requirements.txt | |
| # Or install all SPARKNET dependencies | |
| pip install -r requirements.txt | |
| ``` | |
| ### 2. Start Ollama (Optional, for live processing) | |
| ```bash | |
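| # Start the server (this blocks, so keep it running in its own terminal) | |
| ollama serve | |
| # In a second terminal, pull the required models | |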
| ollama pull llama3.2:3b | |
| ollama pull nomic-embed-text | |
| ``` | |
| ### 3. Run the Demo | |
| ```bash | |
| # From project root | |
| streamlit run demo/app.py | |
| # Or set the port explicitly (8501 is Streamlit's default) | |
| streamlit run demo/app.py --server.port 8501 | |
| ``` | |
| ### 4. Open in Browser | |
| Navigate to http://localhost:8501 | |
| ## Demo Pages | |
| | Page | Description | | |
| |------|-------------| | |
| | **Home** | Overview and feature cards | | |
| | **Document Processing** | Upload/select documents for OCR processing | | |
| | **Field Extraction** | Extract structured fields with evidence | | |
| | **RAG Q&A** | Ask questions about indexed documents | | |
| | **Classification** | Classify document types | | |
| | **Analytics** | View processing statistics | | |
| **Live Processing** | Watch the pipeline in real time | |
| | **Interactive RAG** | Chat-style document Q&A | | |
| | **Document Comparison** | Compare documents side by side | | |
| ## Sample Documents | |
| The demo uses patent pledge documents from the `Dataset/` folder: | |
| - Apple 11.11.2011.pdf | |
| - IBM 11.01.2005.pdf | |
| - Google 08.02.2012.pdf | |
| - And more... | |
| ## Screenshots | |
| ### Home Page | |
| ``` | |
| ┌────────────────────────────────────────┐ | |
| │ SPARKNET                               │ | |
| │ Agentic Document Intelligence Platform │ | |
| ├────────────────────────────────────────┤ | |
| │ [Doc Processing] [Extraction] [RAG]    │ | |
| │                                        │ | |
| │ Feature cards with gradients...        │ | |
| └────────────────────────────────────────┘ | |
| ``` | |
| ### RAG Q&A | |
| ``` | |
| ┌────────────────────────────────────────┐ | |
| │ Ask a question...                      │ | |
| ├────────────────────────────────────────┤ | |
| │ User: What patents are covered?        │ | |
| │                                        │ | |
| │ Assistant: Based on the documents...   │ | |
| │ [View Sources]                         │ | |
| │ [1] Apple - Page 1: "..."              │ | |
| │ [2] IBM - Page 2: "..."                │ | |
| └────────────────────────────────────────┘ | |
| ``` | |
| ## Configuration | |
| ### Environment Variables | |
| ```bash | |
| # Ollama URL (default: http://localhost:11434) | |
| export OLLAMA_BASE_URL=http://localhost:11434 | |
| # ChromaDB path (default: ./data/vectorstore) | |
| export CHROMA_PERSIST_DIR=./data/vectorstore | |
| ``` | |
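As an illustration of how these two variables and their documented defaults could be read from Python, here is a minimal sketch. The helper name `load_demo_settings` and the returned dictionary keys are assumptions made for this example; they are not taken from the SPARKNET code.

```python
import os

# Defaults mirror the values documented in the environment variable comments.
DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434"
DEFAULT_CHROMA_PERSIST_DIR = "./data/vectorstore"


def load_demo_settings(env=None):
    """Return both settings, falling back to the documented defaults.

    Hypothetical helper for illustration; not part of SPARKNET.
    """
    env = os.environ if env is None else env
    base_url = env.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_BASE_URL)
    return {
        # Drop a trailing slash so API paths can be appended safely.
        "ollama_base_url": base_url.rstrip("/"),
        "chroma_persist_dir": env.get("CHROMA_PERSIST_DIR", DEFAULT_CHROMA_PERSIST_DIR),
    }


if __name__ == "__main__":
    print(load_demo_settings())
```

Passing a plain dictionary instead of `os.environ` makes the fallback behavior easy to check without touching the real environment.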
| ### Streamlit Config | |
| Create `.streamlit/config.toml` in the directory you launch Streamlit from (the project root): | |
| ```toml | |
| [theme] | |
| primaryColor = "#FF6B6B" | |
| backgroundColor = "#FFFFFF" | |
| secondaryBackgroundColor = "#F0F2F6" | |
| textColor = "#262730" | |
| [server] | |
| maxUploadSize = 50  # in MB (Streamlit's default is 200) | |
| ``` | |
| ## Development | |
| ### Adding New Pages | |
| 1. Create a new file in `demo/pages/`: | |
| ``` | |
| demo/pages/4_{emoji}_New_Feature.py | |
| ``` | |
| 2. Follow the naming convention: `{order}_{emoji}_{name}.py` | |
| 3. Import project modules: | |
| ```python | |
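| import sys | |
| from pathlib import Path | |
| # demo/pages/<page>.py -> pages -> demo -> project root | |
| PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent | |
| # Put the root first on sys.path so project imports such as `src` resolve | |
| sys.path.insert(0, str(PROJECT_ROOT)) | |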
| ``` | |
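To make the `{order}_{emoji}_{name}.py` convention concrete, the sketch below splits a page filename into its three parts. The function `parse_page_filename` and the regular expression are hypothetical and written only for this example; Streamlit derives page order and labels from filenames by its own rules, which this does not reproduce.

```python
import re
from pathlib import Path

# {order}_{emoji}_{name}.py: digits, one underscore-free token, then the name.
PAGE_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


def parse_page_filename(path):
    """Return (order, emoji, title) for a conforming filename, else None.

    Hypothetical helper for illustration; not part of SPARKNET or Streamlit.
    """
    match = PAGE_PATTERN.match(Path(path).name)
    if match is None:
        return None
    # Underscores in the name part stand in for spaces in the page title.
    title = match.group("name").replace("_", " ")
    return int(match.group("order")), match.group("emoji"), title
```

A check like this can be run over `demo/pages/` to catch files that do not follow the convention before they show up mislabeled in the sidebar.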
| ### Customizing Styles | |
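| Edit the CSS block in `demo/app.py`: | |
| ```python | |
| import streamlit as st | |
| # Replace each `{ ... }` with the rules for that class | |
| st.markdown(""" | |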
| <style> | |
| .main-header { ... } | |
| .evidence-box { ... } | |
| </style> | |
| """, unsafe_allow_html=True) | |
| ``` | |
| ## Troubleshooting | |
| ### "ModuleNotFoundError: No module named 'src'" | |
| Make sure you're running from the project root: | |
| ```bash | |
| cd /path/to/SPARKNET | |
| streamlit run demo/app.py | |
| ``` | |
| ### Ollama Not Connected | |
| 1. Check if Ollama is running: `curl http://localhost:11434/api/tags` | |
| 2. If the request fails, start Ollama: `ollama serve` | |
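The same connectivity check can be scripted. The sketch below requests `/api/tags` with the standard library and returns the model names it finds, or `None` when the server cannot be reached. The function name `list_ollama_models` is hypothetical, and the response shape (a JSON object with a `models` list whose entries carry a `name`) is an assumption about the Ollama API that you should verify against your installed version.

```python
import http.client
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return model names from /api/tags, or None if Ollama is unreachable.

    Hypothetical helper for illustration; not part of SPARKNET.
    """
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; ValueError covers
        # malformed URLs and invalid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    # Assumed shape: {"models": [{"name": "..."}, ...]}
    return [model.get("name", "") for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    print("Ollama not reachable" if models is None else models)
```

An empty list means the server answered but no models are pulled yet, which points to the `ollama pull` step rather than a connection problem.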
| ### ChromaDB Errors | |
| If the error is an import failure, install ChromaDB in the environment that runs the demo: | |
| ```bash | |
| pip install chromadb | |
| ``` | |
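If the error persists after installing, the package may have gone into a different Python environment than the one running Streamlit. The following sketch reports which interpreter is active and which packages it cannot import. The helper `missing_packages` is hypothetical, and the package list in the usage section is only an example.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run this with the same interpreter that launches the demo.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_packages(["chromadb", "streamlit"]))
```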
| ## License | |
| Part of the SPARKNET project. See the main LICENSE file. | |