# SPARKNET Demo Application

An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

## Features

- **📄 Document Processing**: Upload and process documents with OCR
- **🔍 Field Extraction**: Extract structured data with evidence grounding
- **💬 RAG Q&A**: Interactive question answering with citations
- **🏷️ Classification**: Automatic document type detection
- **📊 Analytics**: Processing statistics and insights
- **🔬 Live Processing**: Real-time pipeline visualization
- **📊 Document Comparison**: Compare multiple documents

## Quick Start

### 1. Install Dependencies

```bash
# From project root
pip install -r demo/requirements.txt

# Or install all SPARKNET dependencies
pip install -r requirements.txt
```

### 2. Start Ollama (Optional, for live processing)

```bash
ollama serve

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```

### 3. Run the Demo

```bash
# From project root
streamlit run demo/app.py

# Or set the port explicitly (8501 is Streamlit's default)
streamlit run demo/app.py --server.port 8501
```

### 4. Open in Browser

Navigate to http://localhost:8501

## Demo Pages

| Page | Description |
|------|-------------|
| **Home** | Overview and feature cards |
| **Document Processing** | Upload/select documents for OCR processing |
| **Field Extraction** | Extract structured fields with evidence |
| **RAG Q&A** | Ask questions about indexed documents |
| **Classification** | Classify document types |
| **Analytics** | View processing statistics |
| **Live Processing** | Watch the pipeline run in real time |
| **Interactive RAG** | Chat-style document Q&A |
| **Document Comparison** | Compare documents side by side |

## Sample Documents

The demo uses patent pledge documents from the `Dataset/` folder:

- Apple 11.11.2011.pdf
- IBM 11.01.2005.pdf
- Google 08.02.2012.pdf
- And more...

## Screenshots

### Home Page
```
┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘
```

### RAG Q&A
```
┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘
```

## Configuration

### Environment Variables

```bash
# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434

# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
```
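
For reference, the two variables above could be read in Python roughly as follows. This is a minimal sketch, not the demo's actual configuration code: `DemoSettings` and `load_settings` are hypothetical names introduced here, and only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical settings container; not part of the SPARKNET codebase."""
    ollama_base_url: str
    chroma_persist_dir: str


def load_settings(env=None) -> DemoSettings:
    """Read the two documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DemoSettings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/vectorstore"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.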

### Streamlit Config

Create `.streamlit/config.toml`:

```toml
[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"

[server]
maxUploadSize = 50  # in MB
```

## Development

### Adding New Pages

1. Create a new file in `demo/pages/`:
   ```
   demo/pages/4_🆕_New_Feature.py
   ```

2. Follow the naming convention: `{order}_{emoji}_{name}.py`

3. Import project modules:
   ```python
   import sys
   from pathlib import Path

   # demo/pages/<page>.py -> three levels up is the project root
   PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
   sys.path.insert(0, str(PROJECT_ROOT))
   ```
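
To sanity-check a new file name against the `{order}_{emoji}_{name}.py` convention, a small helper along these lines may be useful. It is only a sketch: `PageInfo`, `parse_page_filename`, and the regular expression are assumptions made for illustration, not part of the demo or of Streamlit.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class PageInfo(NamedTuple):
    """Hypothetical record for the three parts of a page file name."""
    order: int
    emoji: str
    name: str


# Assumed pattern for `{order}_{emoji}_{name}`: digits, one emoji token, then the name.
_PAGE_RE = re.compile(r"^(\d+)_([^_]+)_(.+)$")


def parse_page_filename(path: str) -> Optional[PageInfo]:
    """Split a page path into (order, emoji, name); return None if it does not match."""
    match = _PAGE_RE.match(Path(path).stem)
    if match is None:
        return None
    order, emoji, name = match.groups()
    # Underscores in the name part are shown as spaces here for readability.
    return PageInfo(int(order), emoji, name.replace("_", " "))


if __name__ == "__main__":
    print(parse_page_filename("demo/pages/4_🆕_New_Feature.py"))
```

A file that returns `None` here does not follow the convention and is worth renaming.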

### Customizing Styles

Edit the CSS in `app.py`:

```python
st.markdown("""
<style>
    .main-header { ... }
    .evidence-box { ... }
</style>
""", unsafe_allow_html=True)
```

## Troubleshooting

### "ModuleNotFoundError: No module named 'src'"

Make sure you're running from the project root:
```bash
cd /path/to/SPARKNET
streamlit run demo/app.py
```

### Ollama Not Connected

1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
2. Start Ollama: `ollama serve`
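
The same check can be scripted. The sketch below uses only the standard library and requests the `/api/tags` endpoint from the `curl` command above; the function name `ollama_is_running` and the two-second timeout are assumptions, not part of the demo.

```python
import http.client
import json
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200 and a JSON body."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers a malformed URL or a non-JSON body.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start the server with `ollama serve` and try again.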

### ChromaDB Errors

Install ChromaDB:
```bash
pip install chromadb
```
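
To confirm that the package is visible to the interpreter that runs Streamlit, a quick check like the following can help. It is a generic sketch using `importlib`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["chromadb", "streamlit"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run it with the same Python environment used for `streamlit run`, since a package installed in a different environment will still be reported as missing.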

## License

Part of the SPARKNET project. See main LICENSE file.