SPARKNET Demo Application

An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

Features

  • 📄 Document Processing: Upload and process documents with OCR
  • 🔍 Field Extraction: Extract structured data with evidence grounding
  • 💬 RAG Q&A: Interactive question answering with citations
  • 🏷️ Classification: Automatic document type detection
  • 📊 Analytics: Processing statistics and insights
  • 🔬 Live Processing: Real-time pipeline visualization
  • 📊 Document Comparison: Compare multiple documents

Quick Start

1. Install Dependencies

# From project root
pip install -r demo/requirements.txt

# Or install all SPARKNET dependencies
pip install -r requirements.txt

2. Start Ollama (Optional, for live processing)

ollama serve

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text

3. Run the Demo

# From project root
streamlit run demo/app.py

# Or set the port explicitly (8501 is Streamlit's default)
streamlit run demo/app.py --server.port 8501

4. Open in Browser

Navigate to http://localhost:8501

Demo Pages

| Page | Description |
|------|-------------|
| Home | Overview and feature cards |
| Document Processing | Upload/select documents for OCR processing |
| Field Extraction | Extract structured fields with evidence |
| RAG Q&A | Ask questions about indexed documents |
| Classification | Classify document types |
| Analytics | View processing statistics |
| Live Processing | Watch pipeline in real-time |
| Interactive RAG | Chat-style document Q&A |
| Document Comparison | Compare documents side by side |

Sample Documents

The demo uses patent pledge documents from the Dataset/ folder:

  • Apple 11.11.2011.pdf
  • IBM 11.01.2005.pdf
  • Google 08.02.2012.pdf
  • And more...

Screenshots

Home Page

┌─────────────────────────────────────────┐
│  🔥 SPARKNET                            │
│  Agentic Document Intelligence Platform │
├─────────────────────────────────────────┤
│  [Doc Processing] [Extraction] [RAG]    │
│                                         │
│  Feature cards with gradients...        │
└─────────────────────────────────────────┘

RAG Q&A

┌─────────────────────────────────────────┐
│  💬 Ask a question...                   │
├─────────────────────────────────────────┤
│  User: What patents are covered?        │
│                                         │
│  Assistant: Based on the documents...   │
│  [📚 View Sources]                      │
│    [1] Apple - Page 1: "..."            │
│    [2] IBM - Page 2: "..."              │
└─────────────────────────────────────────┘

Configuration

Environment Variables

# Ollama URL (default: http://localhost:11434)
export OLLAMA_BASE_URL=http://localhost:11434

# ChromaDB path (default: ./data/vectorstore)
export CHROMA_PERSIST_DIR=./data/vectorstore
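
As an illustration of how these two variables could be consumed, here is a minimal Python sketch that reads them and falls back to the defaults listed above. The `DemoSettings` class and `load_settings` function are hypothetical names used for this example only; the demo's actual configuration code may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical container for the two settings documented above."""
    ollama_base_url: str
    chroma_persist_dir: str


def load_settings(env=None) -> DemoSettings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DemoSettings(
        # Strip a trailing slash so paths such as "/api/tags" can be appended safely.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/vectorstore"),
    )
```

Calling `load_settings()` with no arguments reads the real process environment; passing a plain dict makes the function easy to exercise in tests.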

Streamlit Config

Create .streamlit/config.toml in the project root (the directory you run streamlit from):

[theme]
primaryColor = "#FF6B6B"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F2F6"
textColor = "#262730"

[server]
maxUploadSize = 50  # upload limit in megabytes

Development

Adding New Pages

  1. Create a new file in demo/pages/:

    demo/pages/4_🆕_New_Feature.py
    
  2. Follow the naming convention: {order}_{emoji}_{name}.py

  3. Import project modules:

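    import sys
    from pathlib import Path

    # demo/pages/<page>.py -> three levels up is the project root
    PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
    sys.path.insert(0, str(PROJECT_ROOT))  # make project modules importable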
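
The naming convention in step 2 can be checked mechanically. The following is a rough sketch, not part of SPARKNET itself: `PAGE_PATTERN`, `parse_page_filename`, and `list_pages` are hypothetical helpers, and the regular expression assumes the emoji segment never contains an underscore.

```python
import re
from pathlib import Path

# {order}_{emoji}_{name}.py, for example "4_🆕_New_Feature.py"
PAGE_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


def parse_page_filename(filename: str):
    """Return (order, emoji, title) for a conforming filename, or None otherwise."""
    match = PAGE_PATTERN.match(Path(filename).name)
    if match is None:
        return None
    title = match.group("name").replace("_", " ")
    return int(match.group("order")), match.group("emoji"), title


def list_pages(pages_dir: str = "demo/pages"):
    """List parsed pages sorted by their numeric prefix; other files are skipped."""
    parsed = (parse_page_filename(p.name) for p in Path(pages_dir).glob("*.py"))
    return sorted(page for page in parsed if page is not None)
```

Running `list_pages()` from the project root gives a quick way to spot a page whose filename does not follow the convention, since it will be missing from the result.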
    

Customizing Styles

Edit the CSS in app.py:

st.markdown("""
<style>
    .main-header { ... }
    .evidence-box { ... }
</style>
""", unsafe_allow_html=True)
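
Once a class such as .evidence-box is styled, page code needs to emit matching HTML. The sketch below shows one possible way to do that; `evidence_box_html` is a hypothetical helper, and the markup inside the div is an assumption rather than what app.py actually renders.

```python
import html


def evidence_box_html(quote: str, source: str) -> str:
    """Wrap a quoted passage in a div styled by the .evidence-box CSS class."""
    # Escape both values: with unsafe_allow_html=True, Streamlit renders raw HTML,
    # so text extracted from a document must not be interpreted as markup.
    return (
        '<div class="evidence-box">'
        f"<strong>{html.escape(source)}</strong><br>{html.escape(quote)}"
        "</div>"
    )
```

In a page, the result would be passed to `st.markdown(evidence_box_html(quote, source), unsafe_allow_html=True)`.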

Troubleshooting

"ModuleNotFoundError: No module named 'src'"

Make sure you're running from the project root:

cd /path/to/SPARKNET
streamlit run demo/app.py

Ollama Not Connected

  1. Check if Ollama is running: curl http://localhost:11434/api/tags
  2. Start Ollama: ollama serve
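
The same check can be scripted. This is a minimal sketch using only the Python standard library; `missing_models` and `check_ollama` are hypothetical helper names, and the code assumes the /api/tags response is a JSON object with a "models" list whose entries carry a "name" field, which is how Ollama typically lists installed models.

```python
import json
import urllib.error
import urllib.request

REQUIRED_MODELS = ("llama3.2:3b", "nomic-embed-text")  # the models pulled in Quick Start


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    # Models pulled without a tag are usually listed as "<name>:latest",
    # so accept that form as well.
    return [
        name for name in required
        if name not in installed and f"{name}:latest" not in installed
    ]


def check_ollama(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> str:
    """Query /api/tags and describe the result in one line."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return f"Ollama not reachable at {base_url}: {exc}"
    missing = missing_models(payload)
    if missing:
        return "Ollama is running, but these models are missing: " + ", ".join(missing)
    return "Ollama is running and both models are available"
```

Calling `print(check_ollama())` reports whether the server is reachable and whether either model still needs an `ollama pull`.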

ChromaDB Errors

If the error says the chromadb module cannot be found, install ChromaDB:

pip install chromadb

License

Part of the SPARKNET project. See main LICENSE file.