# SPARKNET Demo Application
An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.
## Features
- **Document Processing**: Upload and process documents with OCR
- **Field Extraction**: Extract structured data with evidence grounding
- **RAG Q&A**: Interactive question answering with citations
- **Classification**: Automatic document type detection
- **Analytics**: Processing statistics and insights
- **Live Processing**: Real-time pipeline visualization
- **Document Comparison**: Compare multiple documents
## Quick Start
### 1. Install Dependencies
```bash
# From project root
pip install -r demo/requirements.txt
# Or install all SPARKNET dependencies
pip install -r requirements.txt
```
### 2. Start Ollama (Optional, for live processing)
```bash
ollama serve
# Pull required models (in a second terminal; `ollama serve` keeps running in the foreground)
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```
### 3. Run the Demo
```bash
# From project root
streamlit run demo/app.py
# Or set the port explicitly (8501 is Streamlit's default)
streamlit run demo/app.py --server.port 8501
```
### 4. Open in Browser
Navigate to http://localhost:8501
## Demo Pages
| Page | Description |
|------|-------------|
| **Home** | Overview and feature cards |
| **Document Processing** | Upload/select documents for OCR processing |
| **Field Extraction** | Extract structured fields with evidence |
| **RAG Q&A** | Ask questions about indexed documents |
| **Classification** | Classify document types |
| **Analytics** | View processing statistics |
| **Live Processing** | Watch the pipeline run in real time |
| **Interactive RAG** | Chat-style document Q&A |
| **Document Comparison** | Compare documents side by side |
## Sample Documents
The demo uses patent pledge documents from the `Dataset/` folder:
- Apple 11.11.2011.pdf
- IBM 11.01.2005.pdf
- Google 08.02.2012.pdf
- And more...
## Screenshots
### Home Page
```
┌─────────────────────────────────────────┐
│                SPARKNET                 │
│ Agentic Document Intelligence Platform  │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘
```
### RAG Q&A
```
┌─────────────────────────────────────────┐
│  Ask a question...                      │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [View Sources]                         │
│  [1] Apple - Page 1: "..."              │
│  [2] IBM - Page 2: "..."                │
└─────────────────────────────────────────┘
```
## Configuration
### Environment Variables
```bash
# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434
# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
```
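As a rough illustration of how these two variables and their documented defaults could be resolved in Python, here is a minimal sketch. The `load_settings` helper and its dictionary keys are hypothetical names for this example, not part of the SPARKNET codebase.

```python
import os

# Defaults as documented above.
DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434"
DEFAULT_CHROMA_PERSIST_DIR = "./data/vectorstore"


def load_settings(env=None):
    """Resolve demo settings from the environment (hypothetical helper).

    `env` may be any mapping; it defaults to os.environ so the function
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so later URL joins do not produce "//".
        "ollama_base_url": env.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_BASE_URL).rstrip("/"),
        "chroma_persist_dir": env.get("CHROMA_PERSIST_DIR", DEFAULT_CHROMA_PERSIST_DIR),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing the mapping in explicitly keeps the lookup easy to test; unset variables fall back to the defaults listed above.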
### Streamlit Config
Create `.streamlit/config.toml`:
```toml
[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"
[server]
maxUploadSize = 50  # upload limit in MB
```
## Development
### Adding New Pages
1. Create a new file in `demo/pages/`:
```
demo/pages/4_{emoji}_New_Feature.py
```
2. Follow the naming convention: `{order}_{emoji}_{name}.py`
3. Import project modules:
```python
import sys
from pathlib import Path
# This file lives in demo/pages/, so the project root is two directories above its folder
PROJECT_ROOT = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(PROJECT_ROOT))
```
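To make the `{order}_{emoji}_{name}.py` convention concrete, the following sketch splits a page filename into its three parts. The `parse_page_filename` helper is hypothetical and only illustrates the convention; it is not how Streamlit itself discovers pages, and the sparkles emoji in the usage line is an arbitrary example.

```python
import re

# {order}_{emoji}_{name}.py: digits, then one underscore-free token, then the rest.
PAGE_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


def parse_page_filename(filename):
    """Split a page filename into (order, emoji, display name).

    Hypothetical helper for illustration; raises ValueError when the
    filename does not follow the convention.
    """
    match = PAGE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a valid page filename: {filename!r}")
    # Underscores in the name part become spaces in the display name.
    return int(match["order"]), match["emoji"], match["name"].replace("_", " ")


if __name__ == "__main__":
    print(parse_page_filename("4_✨_New_Feature.py"))
```

A check like this can be run over `demo/pages/` to catch misnamed files before they show up out of order in the sidebar.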
### Customizing Styles
Edit the CSS in `app.py`:
```python
st.markdown("""
<style>
.main-header { ... }
.evidence-box { ... }
</style>
""", unsafe_allow_html=True)
```
## Troubleshooting
### "ModuleNotFoundError: No module named 'src'"
Make sure you're running from the project root:
```bash
cd /path/to/SPARKNET
streamlit run demo/app.py
```
### Ollama Not Connected
1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
2. Start Ollama: `ollama serve`
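The same check can be scripted. The sketch below queries the `/api/tags` endpoint used in step 1 and reports which of the models from the Quick Start are not yet pulled. It assumes the response is JSON with a `models` list whose entries carry a `name` such as `nomic-embed-text:latest`; the function names are hypothetical.

```python
import json
import urllib.request

# Models pulled in the Quick Start section.
REQUIRED_MODELS = ("llama3.2:3b", "nomic-embed-text")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models absent from an /api/tags payload."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}

    def present(model):
        # Assumption: untagged names are reported with a ":latest" suffix.
        wanted = model if ":" in model else f"{model}:latest"
        return wanted in installed

    return [m for m in required if not present(m)]


def check_ollama(base_url="http://localhost:11434", timeout=3.0):
    """Return (reachable, missing) for the Ollama server at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return False, list(REQUIRED_MODELS)
    return True, missing_models(payload)


if __name__ == "__main__":
    reachable, missing = check_ollama()
    print("Ollama reachable:", reachable)
    print("Missing models:", missing or "none")
```

If the server is reachable but models are missing, run the corresponding `ollama pull` commands from the Quick Start.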
### ChromaDB Errors
Install ChromaDB:
```bash
pip install chromadb
```
## License
Part of the SPARKNET project. See main LICENSE file.