# SPARKNET Demo Application

An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

## Features

- **📄 Document Processing**: Upload and process documents with OCR
- **🔍 Field Extraction**: Extract structured data with evidence grounding
- **💬 RAG Q&A**: Interactive question answering with citations
- **🏷️ Classification**: Automatic document type detection
- **📊 Analytics**: Processing statistics and insights
- **🔬 Live Processing**: Real-time pipeline visualization
- **📊 Document Comparison**: Compare multiple documents

## Quick Start

### 1. Install Dependencies

```bash
# From project root
pip install -r demo/requirements.txt

# Or install all SPARKNET dependencies
pip install -r requirements.txt
```

### 2. Start Ollama (Optional, for live processing)

```bash
# Start the server (runs in the foreground; keep this terminal open)
ollama serve

# In a second terminal, pull the required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```

### 3. Run the Demo

```bash
# From project root
streamlit run demo/app.py

# Or set the port explicitly (8501 is the Streamlit default)
streamlit run demo/app.py --server.port 8501
```

### 4. Open in Browser

Navigate to http://localhost:8501, or to the port you passed via `--server.port`.

## Demo Pages

| Page | Description |
|------|-------------|
| **Home** | Overview and feature cards |
| **Document Processing** | Upload/select documents for OCR processing |
| **Field Extraction** | Extract structured fields with evidence |
| **RAG Q&A** | Ask questions about indexed documents |
| **Classification** | Classify document types |
| **Analytics** | View processing statistics |
| **Live Processing** | Watch the pipeline in real time |
| **Interactive RAG** | Chat-style document Q&A |
| **Document Comparison** | Compare documents side by side |

## Sample Documents

The demo uses patent pledge documents from the `Dataset/` folder:

- Apple 11.11.2011.pdf
- IBM 11.01.2005.pdf
- Google 08.02.2012.pdf
- And more...

## Screenshots

### Home Page
```
┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘
```

### RAG Q&A
```
┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘
```

## Configuration

### Environment Variables

```bash
# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434

# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
```
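
As a rough illustration of how a page might pick up these two variables with the documented defaults, here is a minimal sketch. The `Settings` dataclass and the `load_settings` helper are hypothetical names chosen for this example; they are not taken from SPARKNET's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented settings."""
    ollama_base_url: str
    chroma_persist_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Strip a trailing slash so that paths such as "/api/tags" can be appended safely.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/vectorstore"),
    )
```

Calling `load_settings()` with no argument reads `os.environ`; passing a plain dict instead makes the helper easy to exercise in tests.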

### Streamlit Config

Create `.streamlit/config.toml`:

```toml
[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"

[server]
maxUploadSize = 50
```

## Development

### Adding New Pages

1. Create a new file in `demo/pages/`:
   ```
   demo/pages/4_🆕_New_Feature.py
   ```

2. Follow the naming convention: `{order}_{emoji}_{name}.py`

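3. Make project modules importable by adding the project root to `sys.path`:
   ```python
   import sys
   from pathlib import Path

   # demo/pages/<page>.py -> three levels up is the project root
   PROJECT_ROOT = Path(__file__).parent.parent.parent
   sys.path.insert(0, str(PROJECT_ROOT))
   ```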
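
To check that a new file follows the `{order}_{emoji}_{name}.py` convention before adding it, something like the sketch below could be used. The `parse_page_filename` helper and the `PageInfo` type are hypothetical and exist only for this example; the regular expression assumes that the emoji part contains no underscore.

```python
import re
from typing import NamedTuple, Optional

# {order}_{emoji}_{name}.py, e.g. "4_🆕_New_Feature.py"
PAGE_NAME = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


class PageInfo(NamedTuple):
    order: int
    emoji: str
    title: str


def parse_page_filename(filename: str) -> Optional[PageInfo]:
    """Split a page filename into order, emoji, and title; return None if it does not match."""
    match = PAGE_NAME.match(filename)
    if match is None:
        return None
    # Underscores in the name part become spaces in the title.
    title = match["name"].replace("_", " ")
    return PageInfo(int(match["order"]), match["emoji"], title)
```

For the example file above, `parse_page_filename("4_🆕_New_Feature.py")` yields order `4`, emoji `🆕`, and title `New Feature`, while a name such as `app.py` yields `None`.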

### Customizing Styles

Edit the CSS in `demo/app.py`:

```python
st.markdown("""
<style>
    .main-header { ... }
    .evidence-box { ... }
</style>
""", unsafe_allow_html=True)
```

## Troubleshooting

### "ModuleNotFoundError: No module named 'src'"

Make sure you're running from the project root:
```bash
cd /path/to/SPARKNET
streamlit run demo/app.py
```

### Ollama Not Connected

1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
2. Start Ollama: `ollama serve`
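
The same check can be scripted. The sketch below queries the `/api/tags` endpoint used in step 1 with the standard library only; the `list_ollama_models` helper is a hypothetical name, and the assumption that the response is a JSON object with a `models` list whose entries carry a `name` field should be verified against your Ollama version.

```python
import json
import urllib.request
from typing import List, Optional


def list_ollama_models(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> Optional[List[str]]:
    """Return the names of locally available models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a malformed URL or a body that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    # Assumed response shape: {"models": [{"name": "..."}, ...]}
    return [model.get("name", "") for model in payload.get("models", [])]
```

A return value of `None` suggests the server is not running or not reachable at that URL; an empty list suggests it is running but no models have been pulled yet.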

### ChromaDB Errors

Install ChromaDB:
```bash
pip install chromadb
```
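
Before reinstalling, it can help to confirm whether the package is visible to the interpreter that runs Streamlit. A minimal sketch, using only the standard library; `is_installed` is a hypothetical helper name:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("chromadb installed:", is_installed("chromadb"))
```

If this reports `False` even after `pip install chromadb`, the package was probably installed into a different Python environment than the one running the demo.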

## License

Part of the SPARKNET project. See main LICENSE file.