	# SPARKNET Demo Application

	An interactive Streamlit demo showcasing SPARKNET's document intelligence capabilities.

	## Features

	- 📄 Document Processing: Upload and process documents with OCR
	- 🔍 Field Extraction: Extract structured data with evidence grounding
	- 💬 RAG Q&A: Interactive question answering with citations
	- 🏷️ Classification: Automatic document type detection
	- 📊 Analytics: Processing statistics and insights
	- 🔬 Live Processing: Real-time pipeline visualization
	- 📊 Document Comparison: Compare multiple documents

	## Quick Start

	### 1. Install Dependencies

	```bash
	# From project root
	pip install -r demo/requirements.txt

	# Or install all SPARKNET dependencies
	pip install -r requirements.txt
	```

	### 2. Start Ollama (Optional, for live processing)

	```bash
	# Start the server (it keeps running, so use a separate terminal for the pulls below)
	ollama serve

	# Pull required models
	ollama pull llama3.2:3b
	ollama pull nomic-embed-text
	```

	### 3. Run the Demo

	```bash
	# From project root
	streamlit run demo/app.py

	# Or set the port explicitly (8501 is Streamlit's default)
	streamlit run demo/app.py --server.port 8501
	```

	### 4. Open in Browser

	Navigate to http://localhost:8501 (or the port you passed to `--server.port`).

	## Demo Pages

	| Page | Description |
	|------|-------------|
	| Home | Overview and feature cards |
	| Document Processing | Upload/select documents for OCR processing |
	| Field Extraction | Extract structured fields with evidence |
	| RAG Q&A | Ask questions about indexed documents |
	| Classification | Classify document types |
	| Analytics | View processing statistics |
	| Live Processing | Watch pipeline in real-time |
	| Interactive RAG | Chat-style document Q&A |
	| Document Comparison | Compare documents side by side |

	## Sample Documents

	The demo uses patent pledge documents from the `Dataset/` folder:

	- Apple 11.11.2011.pdf
	- IBM 11.01.2005.pdf
	- Google 08.02.2012.pdf
	- And more...
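
The sample filenames listed above appear to follow a `<name> <date>.pdf` pattern. A minimal sketch, assuming that pattern holds for the other files too, splits a filename into its two parts. The helper name `split_sample_name` is hypothetical, and the date is kept as plain text because the README does not state its format.

```python
from pathlib import Path


def split_sample_name(filename: str) -> tuple[str, str]:
    """Split '<name> <date>.pdf' into (name, date_text).

    Hypothetical helper: the date is returned as-is, not parsed.
    """
    stem = Path(filename).stem  # drops only the final '.pdf'
    name, sep, date_text = stem.rpartition(" ")
    if not sep:
        # No space in the stem: treat the whole stem as the name.
        return stem, ""
    return name, date_text


if __name__ == "__main__":
    dataset_dir = Path("Dataset")  # folder named in this README
    if dataset_dir.is_dir():
        for pdf in sorted(dataset_dir.glob("*.pdf")):
            print(split_sample_name(pdf.name))
```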

	## Screenshots

	### Home Page
	```
	┌─────────────────────────────────────────┐
	│ 🔥 SPARKNET │
	│ Agentic Document Intelligence Platform │
	├─────────────────────────────────────────┤
	│ [Doc Processing] [Extraction] [RAG] │
	│ │
	│ Feature cards with gradients... │
	└─────────────────────────────────────────┘
	```

	### RAG Q&A
	```
	┌─────────────────────────────────────────┐
	│ 💬 Ask a question... │
	├─────────────────────────────────────────┤
	│ User: What patents are covered? │
	│ │
	│ Assistant: Based on the documents... │
	│ [📚 View Sources] │
	│ [1] Apple - Page 1: "..." │
	│ [2] IBM - Page 2: "..." │
	└─────────────────────────────────────────┘
	```

	## Configuration

	### Environment Variables

	```bash
	# Ollama URL (default: http://localhost:11434)
	export OLLAMA_BASE_URL=http://localhost:11434

	# ChromaDB path (default: ./data/vectorstore)
	export CHROMA_PERSIST_DIR=./data/vectorstore
	```
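
For reference, here is a minimal sketch of how code could read these two variables with the defaults documented above. `DemoSettings` and `load_settings` are hypothetical names for illustration, not part of SPARKNET, and the demo itself may read its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical container for the two documented settings."""
    ollama_base_url: str
    chroma_persist_dir: str


def load_settings(env=None) -> DemoSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DemoSettings(
        # Strip a trailing slash so paths can be appended consistently.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/vectorstore"),
    )


if __name__ == "__main__":
    print(load_settings())
```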

	### Streamlit Config

	Create `.streamlit/config.toml`:

	```toml
	[theme]
	primaryColor = "#FF6B6B"
	backgroundColor = "#FFFFFF"
	secondaryBackgroundColor = "#F0F2F6"
	textColor = "#262730"

	[server]
	maxUploadSize = 50
	```

	## Development

	### Adding New Pages

	1. Create a new file in `demo/pages/`:
	```
	demo/pages/4_🆕_New_Feature.py
	```

	2. Follow the naming convention: `{order}_{emoji}_{name}.py`

	3. Import project modules:
	```python
	import sys
	from pathlib import Path

	# demo/pages/<page>.py -> three levels up is the project root
	PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
	sys.path.insert(0, str(PROJECT_ROOT))
	```
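
To check a filename against the `{order}_{emoji}_{name}.py` convention before adding it, a small sketch like the following may help. The regular expression and the `parse_page_filename` helper are assumptions made for illustration. They are not how Streamlit itself resolves page names.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: digits, underscore, an emoji with no underscore, underscore, name, ".py"
PAGE_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<emoji>[^_]+)_(?P<name>.+)\.py$")


class PageInfo(NamedTuple):
    order: int
    emoji: str
    title: str


def parse_page_filename(filename: str) -> Optional[PageInfo]:
    """Return the parts of a page filename, or None if it does not match."""
    match = PAGE_PATTERN.match(filename)
    if match is None:
        return None
    # Underscores in the name part are shown as spaces in the title.
    title = match["name"].replace("_", " ")
    return PageInfo(int(match["order"]), match["emoji"], title)


if __name__ == "__main__":
    print(parse_page_filename("4_🆕_New_Feature.py"))
```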

	### Customizing Styles

	Edit the CSS in `app.py`:

	```python
	st.markdown("""
	<style>
	.main-header { ... }
	.evidence-box { ... }
	</style>
	""", unsafe_allow_html=True)
	```

	## Troubleshooting

	### "ModuleNotFoundError: No module named 'src'"

	Make sure you're running from the project root:
	```bash
	cd /path/to/SPARKNET
	streamlit run demo/app.py
	```

	### Ollama Not Connected

	1. Check if Ollama is running: `curl http://localhost:11434/api/tags`
	2. Start Ollama: `ollama serve`
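
The same check can be scripted. The sketch below uses only the standard library and requests the `/api/tags` endpoint used in the `curl` command above. The function name `ollama_is_reachable` and the two-second timeout are illustrative choices, not part of the demo.

```python
import http.client
import json
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/api/tags answers with parseable JSON."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return True
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or an unexpected body.
        return False


if __name__ == "__main__":
    if ollama_is_reachable():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; try `ollama serve`")
```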

	### ChromaDB Errors

	If the error says the `chromadb` package is missing, install it:
	```bash
	pip install chromadb
	```
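
To confirm whether a missing package is the cause before reinstalling, a quick standard-library check can report which modules are not importable in the current environment. The `missing_packages` helper is a hypothetical name, and the sketch only tests importability, not version compatibility.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["chromadb", "streamlit"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All checked packages are importable")
```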

	## License

	Part of the SPARKNET project. See the main LICENSE file.