# SPARKNET Demo Application
An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.
## Features
- **📄 Document Processing**: Upload and process documents with OCR
- **🔍 Field Extraction**: Extract structured data with evidence grounding
- **💬 RAG Q&A**: Interactive question answering with citations
- **🏷️ Classification**: Automatic document type detection
- **📊 Analytics**: Processing statistics and insights
- **🔬 Live Processing**: Real-time pipeline visualization
- **📊 Document Comparison**: Compare multiple documents
## Quick Start
### 1. Install Dependencies
```bash
# From project root
pip install -r demo/requirements.txt
# Or install all SPARKNET dependencies
pip install -r requirements.txt
```
### 2. Start Ollama (Optional, for live processing)
```bash
ollama serve
# Pull the required models (run in a second terminal; `ollama serve` stays in the foreground)
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```
### 3. Run the Demo
```bash
# From project root
streamlit run demo/app.py
# Or set the port explicitly (8501 is Streamlit's default)
streamlit run demo/app.py --server.port 8501
```
### 4. Open in Browser
Navigate to http://localhost:8501
## Demo Pages
| Page | Description |
|------|-------------|
| **Home** | Overview and feature cards |
| **Document Processing** | Upload/select documents for OCR processing |
| **Field Extraction** | Extract structured fields with evidence |
| **RAG Q&A** | Ask questions about indexed documents |
| **Classification** | Classify document types |
| **Analytics** | View processing statistics |
| **Live Processing** | Watch pipeline in real-time |
| **Interactive RAG** | Chat-style document Q&A |
| **Document Comparison** | Compare documents side by side |
## Sample Documents
The demo uses patent pledge documents from the `Dataset/` folder:
- Apple 11.11.2011.pdf
- IBM 11.01.2005.pdf
- Google 08.02.2012.pdf
- And more...
## Screenshots
### Home Page
```
┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘
```
### RAG Q&A
```
┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘
```
## Configuration
### Environment Variables
```bash
# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434
# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
```
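On the Python side, these variables could be read with fallbacks to the documented defaults. The following is a minimal sketch: `load_settings` is a hypothetical helper used only for illustration, and the way SPARKNET actually loads its configuration may differ.

```python
import os
from pathlib import Path

# Defaults mirror the values documented in this README.
DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434"
DEFAULT_CHROMA_PERSIST_DIR = "./data/vectorstore"


def load_settings(env=None):
    """Return demo settings, falling back to the documented defaults.

    Hypothetical helper for illustration; not part of the SPARKNET codebase.
    """
    env = os.environ if env is None else env
    base_url = env.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_BASE_URL)
    persist_dir = env.get("CHROMA_PERSIST_DIR", DEFAULT_CHROMA_PERSIST_DIR)
    return {
        # Strip a trailing slash so endpoint paths can be appended safely.
        "ollama_base_url": base_url.rstrip("/"),
        "chroma_persist_dir": Path(persist_dir),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.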
### Streamlit Config
Create `.streamlit/config.toml`:
```toml
[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"
[server]
maxUploadSize = 50  # in MB
```
## Development
### Adding New Pages
1. Create a new file in `demo/pages/`:
```
demo/pages/4_🆕_New_Feature.py
```
2. Follow the naming convention: `{order}_{emoji}_{name}.py`
3. Import project modules:
```python
import sys
from pathlib import Path

# demo/pages/<page>.py -> three levels up is the project root
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))
```
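The `{order}_{emoji}_{name}.py` convention can be checked or generated programmatically. The sketch below is illustrative only: `page_filename`, `parse_page_filename`, and `PAGE_PATTERN` are hypothetical names, not part of SPARKNET or Streamlit, and the pattern assumes the emoji part contains no underscore.

```python
import re

# Matches "{order}_{emoji}_{name}.py", for example "4_🆕_New_Feature.py".
# Assumption: the emoji segment itself contains no underscore.
PAGE_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


def page_filename(order, emoji, name):
    """Build a page filename that follows the naming convention."""
    return f"{order}_{emoji}_{name.strip().replace(' ', '_')}.py"


def parse_page_filename(filename):
    """Split a page filename into (order, emoji, readable name).

    Returns None when the filename does not follow the convention.
    """
    match = PAGE_PATTERN.match(filename)
    if match is None:
        return None
    return int(match["order"]), match["emoji"], match["name"].replace("_", " ")


if __name__ == "__main__":
    print(page_filename(4, "🆕", "New Feature"))
    print(parse_page_filename("4_🆕_New_Feature.py"))
```

A check like this could run over `demo/pages/` to flag files that would sort or display unexpectedly in the sidebar.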
### Customizing Styles
Edit the CSS in `app.py`:
```python
st.markdown("""
<style>
.main-header { ... }
.evidence-box { ... }
</style>
""", unsafe_allow_html=True)
```
## Troubleshooting
### "ModuleNotFoundError: No module named 'src'"
Make sure you're running from the project root:
```bash
cd /path/to/SPARKNET
streamlit run demo/app.py
```
### Ollama Not Connected
1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
2. Start Ollama: `ollama serve`
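The same connectivity check can be done from Python, which is convenient inside the app or a setup script. This is a minimal sketch using only the standard library: `ollama_models` is a hypothetical helper, and it assumes the `/api/tags` response is a JSON object with a `models` list whose entries carry a `name` field.

```python
import json
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return the names of locally available models, or None if unreachable.

    Hypothetical helper for illustration; not part of the SPARKNET codebase.
    """
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError is a subclass);
        # ValueError covers a reply that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama is not reachable; try `ollama serve`.")
    else:
        print("Available models:", models)
```

If the returned list lacks `llama3.2:3b` or `nomic-embed-text`, pull the missing model with `ollama pull`.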
### ChromaDB Errors
Install ChromaDB:
```bash
pip install chromadb
```
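To confirm which packages are actually missing before reinstalling, a quick import check can help. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and the module names passed to it are examples based on the tools this README mentions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Hypothetical helper for illustration; expects top-level names such as
    "chromadb", not dotted submodule paths.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["streamlit", "chromadb"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Run it with the same Python interpreter that launches Streamlit, since a package installed in a different environment will still show up as missing.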
## License
Part of the SPARKNET project. See the main LICENSE file.