SPARKNET Demo Application

An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

Features

📄 Document Processing: Upload and process documents with OCR
🔍 Field Extraction: Extract structured data with evidence grounding
💬 RAG Q&A: Interactive question answering with citations
🏷️ Classification: Automatic document type detection
📊 Analytics: Processing statistics and insights
🔬 Live Processing: Real-time pipeline visualization
📊 Document Comparison: Compare multiple documents

Quick Start

1. Install Dependencies

# From project root
pip install -r demo/requirements.txt

# Or install all SPARKNET dependencies
pip install -r requirements.txt

2. Start Ollama (Optional, for live processing)

ollama serve

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text

3. Run the Demo

# From project root
streamlit run demo/app.py

# Or set the port explicitly (8501 is Streamlit's default)
streamlit run demo/app.py --server.port 8501

4. Open in Browser

Navigate to http://localhost:8501

Demo Pages

| Page | Description |
|------|-------------|
| Home | Overview and feature cards |
| Document Processing | Upload/select documents for OCR processing |
| Field Extraction | Extract structured fields with evidence |
| RAG Q&A | Ask questions about indexed documents |
| Classification | Classify document types |
| Analytics | View processing statistics |
| Live Processing | Watch pipeline in real-time |
| Interactive RAG | Chat-style document Q&A |
| Document Comparison | Compare documents side by side |

Sample Documents

The demo uses patent pledge documents from the Dataset/ folder:

Apple 11.11.2011.pdf
IBM 11.01.2005.pdf
Google 08.02.2012.pdf
And more...

Screenshots

Home Page

┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘

RAG Q&A

┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘

Configuration

Environment Variables

# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434

# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
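
If you need these values inside a page or script, they can be read in Python roughly as follows. This is a minimal sketch, not SPARKNET's actual configuration loader: `DemoSettings` and `load_settings` are hypothetical names, and only the variable names and defaults come from the section above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical settings container; not part of SPARKNET itself."""
    ollama_base_url: str
    chroma_persist_dir: str


def load_settings(env=None) -> DemoSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DemoSettings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/vectorstore"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.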

Streamlit Config

Create .streamlit/config.toml:

[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"

[server]
maxUploadSize = 50  # upload limit in MB

Development

Adding New Pages

Create a new file in demo/pages/:
```
demo/pages/4_🆕_New_Feature.py
```
Follow the naming convention: {order}_{emoji}_{name}.py

Import project modules:

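import sys
from pathlib import Path

# demo/pages/<page>.py -> demo/pages -> demo -> project root
PROJECT_ROOT = Path(__file__).resolve().parents[2]
# Streamlit reruns the script on every interaction, so avoid duplicate entries
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))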
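
To check a new page file against the `{order}_{emoji}_{name}.py` convention before adding it, something like the following may help. It is only a sketch: `PageName` and `parse_page_filename` are hypothetical helpers, not part of SPARKNET or Streamlit, and the check simply splits the file stem on the first two underscores.

```python
from pathlib import Path
from typing import NamedTuple


class PageName(NamedTuple):
    order: int
    emoji: str
    name: str


def parse_page_filename(filename: str) -> PageName:
    """Split '{order}_{emoji}_{name}.py' into its three parts."""
    stem = Path(filename).stem
    # Split at most twice: the name part may itself contain underscores.
    parts = stem.split("_", 2)
    if len(parts) != 3 or not all(parts) or not parts[0].isdecimal():
        raise ValueError(
            f"{filename!r} does not match {{order}}_{{emoji}}_{{name}}.py"
        )
    order, emoji, name = parts
    return PageName(int(order), emoji, name)


if __name__ == "__main__":
    print(parse_page_filename("demo/pages/4_🆕_New_Feature.py"))
```

A filename without a numeric prefix, such as `New_Feature.py`, raises `ValueError`.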

Customizing Styles

Edit the CSS in app.py:

st.markdown("""
<style>
    .main-header { ... }
    .evidence-box { ... }
</style>
""", unsafe_allow_html=True)

Troubleshooting

"ModuleNotFoundError: No module named 'src'"

Make sure you're running from the project root:

cd /path/to/SPARKNET
streamlit run demo/app.py

Ollama Not Connected

Check if Ollama is running: curl http://localhost:11434/api/tags
Start Ollama: ollama serve
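
The same check can be done from Python, which also shows whether the models from the Quick Start have been pulled. This is a sketch using only the standard library; `list_ollama_models` is a hypothetical helper, and it assumes the `/api/tags` response is a JSON object with a `models` list whose entries carry a `name` field.

```python
import http.client
import json
import os
import urllib.request


def list_ollama_models(base_url=None, timeout=3.0):
    """Return model names from /api/tags, or None if Ollama is unreachable."""
    base_url = base_url or os.environ.get(
        "OLLAMA_BASE_URL", "http://localhost:11434"
    )
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors and timeouts are OSError subclasses;
        # ValueError covers malformed URLs and invalid JSON.
        return None
    models = payload.get("models", []) if isinstance(payload, dict) else []
    return [m.get("name") for m in models if isinstance(m, dict)]


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable; try `ollama serve`.")
    else:
        print("Available models:", models)
```

An empty list means the server is running but no models have been pulled yet.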

ChromaDB Errors

Install ChromaDB:

pip install chromadb
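
To confirm that the install landed in the same Python environment Streamlit uses, a quick import check can be run from that environment. This is a sketch; `missing_packages` is a hypothetical helper, not part of SPARKNET.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(["chromadb", "streamlit"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```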

License

Part of the SPARKNET project. See main LICENSE file.